Add FLAN-T5 #1398

Merged: 3 commits merged into main from yifanmai/flan-t5 on Mar 9, 2023

Conversation

@yifanmai (Collaborator) commented Mar 1, 2023

Fixes #1189

@yifanmai yifanmai requested review from percyliang and teetone March 1, 2023 23:24
@@ -12,6 +13,12 @@
from .client import Client, wrap_request_time, truncate_sequence


MODEL_ALIASES = {
"bloomz": "bloomz-176b-alpa",
Contributor

This is fine, but it's worth noting that the exact Together model will have an impact on efficiency metrics.

Contributor

Actually I do wonder if we want to use the actual Together names just to be completely transparent...

Collaborator Author

I don't like the idea of using Together names because they haven't been particularly stable. Most of the current Together model names are already stale. Also I don't think it makes sense to have implementation details in the name that we show users.

Contributor

The Together names should be (more) stable now. I think it's easier to merge than to separate later. If we ever need to distinguish between different implementations, we need to separate it out (certainly will be different for efficiency, but I think the models might not provide exactly the same predictions).

I agree it'd be nice to have simpler names for users - we can perhaps use aliases on our side which resolve immediately to an implementation.

Collaborator Author

How about:

  • User sees "together/bloomz"
  • Cache key and raw request both use "bloomz-176b-alpa"

Then if the Together implementation changes name, it will automatically invalidate the cache.

I'm thinking about the H3 model which is named "h3-2.7b-h3" (see #1404), which just seems strange to expose to users.

@percyliang (Contributor) commented Mar 4, 2023

That works - the caching of the underlying model makes me feel better about this. And if there is a change, we can always migrate. Ideally, when the user makes a request, it will map together/bloomz to together/bloomz-176b-alpa (or whatever the version du jour is), and the user could also request together/bloomz-176b-alpa to get a particular implementation if they want.
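
For concreteness, a minimal sketch of the alias scheme discussed in this thread: the user-facing engine name resolves to the underlying Together model name, and the resolved name is what goes into the raw request and the cache key. The resolve_alias helper and the raw_request shape are illustrative assumptions, not the exact code in this PR.

MODEL_ALIASES = {
    "bloomz": "bloomz-176b-alpa",
}

def resolve_alias(model_engine: str) -> str:
    # Map a user-facing engine name (e.g. "bloomz") to the underlying
    # Together model name (e.g. "bloomz-176b-alpa"). Unaliased names pass
    # through unchanged, so a user can still request a specific
    # implementation directly.
    return MODEL_ALIASES.get(model_engine, model_engine)

# The resolved name goes into both the raw request and the cache key, so a
# change of implementation on Together's side invalidates cached results.
raw_request = {"model": resolve_alias("bloomz"), "prompt": "Hello"}
assert raw_request["model"] == "bloomz-176b-alpa"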

@@ -62,7 +62,7 @@ def get_window_service(model_name: str, service: TokenizerService) -> WindowServ
window_service = SantaCoderWindowService(service)
elif model_name == "huggingface/gpt2":
window_service = GPT2WindowService(service)
elif model_name == "together/bloom":
elif model_name == "together/bloom" or model_name == "together/bloomz":
window_service = BloomWindowService(service)
Member

I don't think it's the same tokenizer: https://huggingface.co/bigscience/bloomz#cpu

Collaborator Author

Added BLOOMZ tokenizer.

As far as I can tell, it's the exact same tokenizer, just under a different name.
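
One way to sanity-check that claim is to load both tokenizers from Hugging Face and compare them. This is a standalone check, not code from the PR, and it assumes the transformers package is installed.

from transformers import AutoTokenizer

# If BLOOMZ reuses the BLOOM tokenizer under a different name, both
# assertions below should pass.
bloom = AutoTokenizer.from_pretrained("bigscience/bloom")
bloomz = AutoTokenizer.from_pretrained("bigscience/bloomz")

assert bloom.get_vocab() == bloomz.get_vocab()
sample = "The quick brown fox jumps over the lazy dog."
assert bloom.encode(sample) == bloomz.encode(sample)
print("BLOOM and BLOOMZ tokenizers match.")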

@@ -12,6 +13,12 @@
from .client import Client, wrap_request_time, truncate_sequence


MODEL_ALIASES = {
"bloomz": "bloomz-176b-alpa",
Member

-alpha: Will there be future versions?

Collaborator Author

According to Together: "the convention is <model-name>-<size>-<framework>"

There might be alternate implementations.

Member

If there will be alternate implementations, is it okay to cache results as plain bloomz?

Collaborator Author

As discussed in the other thread, I changed this to cache results as the full name including the framework e.g. bloomz-176b-alpa.

@@ -78,7 +78,7 @@ def get_window_service(model_name: str, service: TokenizerService) -> WindowServ
window_service = OPTWindowService(service)
elif model_name == "together/t0pp":
window_service = T0ppWindowService(service)
elif model_name == "together/t5-11b":
elif model_name == "together/t5-11b" or model_name == "together/flan-t5-xxl":
Member

I think this model has its own tokenizer too.

Collaborator Author

Added tokenizer.
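
The added window service presumably follows the same shape as the other window services: its own tokenizer name plus a max_sequence_length property like the excerpts further down. A rough, self-contained sketch; the base class here is a stand-in, and the class and property names are illustrative rather than the repo's exact ones.

from abc import ABC, abstractmethod

class WindowService(ABC):
    # Minimal stand-in for the repo's window service base class.
    @property
    @abstractmethod
    def tokenizer_name(self) -> str: ...

    @property
    @abstractmethod
    def max_sequence_length(self) -> int: ...

class FlanT5WindowService(WindowService):
    @property
    def tokenizer_name(self) -> str:
        # FLAN-T5 ships its own tokenizer on Hugging Face.
        return "google/flan-t5-xxl"

    @property
    def max_sequence_length(self) -> int:
        # T5 models use a maximum sequence length of 512
        # (https://arxiv.org/pdf/1910.10683.pdf).
        return 512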

@@ -323,6 +333,15 @@ def engine(self) -> str:
# Does not support echo=True
tags=[TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, NO_NEWLINES_TAG],
),
Model(
group="together",
Member

Do we have to update schema.yaml and remove the TODO?

Collaborator Author

Done.

@yifanmai yifanmai requested review from teetone and percyliang March 2, 2023 18:57
@teetone (Member) left a comment

Just a few more comments. Could we also add unit tests for the new window services?
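
Such tests could, for example, pin down the tokenizer facts the new window services depend on. A hedged pytest-style sketch, not the PR's actual tests; it assumes the Hugging Face tokenizer config for FLAN-T5 exposes these values.

from transformers import AutoTokenizer

def test_flan_t5_context_length():
    # T5 models use a 512-token maximum sequence length; the tokenizer
    # config is expected to report the same value.
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
    assert tokenizer.model_max_length == 512

def test_flan_t5_eos_token():
    # T5-family tokenizers use "</s>" as the end-of-text token.
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
    assert tokenizer.eos_token == "</s>"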

def max_sequence_length(self) -> int:
"""
The model was trained with a sequence length of 2,048.
Source: https://huggingface.co/bigscience/bloom
Member

Update the link. Also, it's probably correct, but I just want to make sure the sequence length is 2048.

@property
def max_sequence_length(self) -> int:
"""Return the max sequence length."""
# From https://arxiv.org/pdf/1910.10683.pdf, "we use a maximum sequence length of 512".
Member

This comment is for T5. Can we update this?

Collaborator Author

Removed the stale links. I checked with the Hugging Face AutoTokenizer that both of these are correct.


@@ -12,6 +13,12 @@
from .client import Client, wrap_request_time, truncate_sequence


MODEL_ALIASES = {
"bloomz": "bloomz-176b-alpa",
Member

Is this a typo: alpa?

@yifanmai yifanmai requested a review from teetone March 8, 2023 19:23
@yifanmai (Collaborator Author) commented Mar 8, 2023

This is ready for review again. PTAL

@percyliang (Contributor) left a comment

Great!

@yifanmai (Collaborator Author) commented Mar 9, 2023

@teetone could you take another look? This PR is blocked on your review.

@yifanmai (Collaborator Author) commented Mar 9, 2023

Thanks!

@yifanmai yifanmai merged commit 3e55882 into main Mar 9, 2023
@yifanmai yifanmai deleted the yifanmai/flan-t5 branch March 9, 2023 16:54
JoelNiklaus pushed a commit to JoelNiklaus/helm that referenced this pull request Mar 9, 2023
Development

Successfully merging this pull request may close these issues.

Support Flan-T5 (model and window service)