Add FLAN-T5 #1398
Conversation
@@ -12,6 +13,12 @@
from .client import Client, wrap_request_time, truncate_sequence


MODEL_ALIASES = {
    "bloomz": "bloomz-176b-alpa",
This is fine, but it's worth noting that the exact Together model will have an impact on efficiency metrics.
Actually I do wonder if we want to use the actual Together names just to be completely transparent...
I don't like the idea of using Together names because they haven't been particularly stable. Most of the current Together model names are already stale. Also I don't think it makes sense to have implementation details in the name that we show users.
The Together names should be (more) stable now. I think it's easier to merge than to separate later. If we ever need to distinguish between different implementations, we would need to separate them out (efficiency will certainly differ, and the models might not produce exactly the same predictions).
I agree it'd be nice to have simpler names for users - we can perhaps use aliases on our side that resolve immediately to an implementation.
How about:
- User sees "together/bloomz"
- Cache key and raw request both use "bloomz-176b-alpa"
Then if the Together implementation changes its name, the cache will be invalidated automatically.
I'm thinking of the H3 model, which is named "h3-2.7b-h3" (see #1404) - that just seems strange to expose to users.
That works - caching under the underlying model name makes me feel better about this. And if there is a change, we can always migrate. Ideally, when the user makes a request, it will map `together/bloomz` to `together/bloomz-176b-alpa` (or whatever the version du jour is), and the user could also request `together/bloomz-176b-alpa` directly to get a particular implementation if they want.
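For reference, a minimal sketch of how that resolution could look in the Together client, based on the `MODEL_ALIASES` dict from this diff; the helper name and request shape below are illustrative assumptions, not the actual HELM code.

```python
# Sketch only: MODEL_ALIASES is taken from this PR's diff; resolve_model_alias and the
# request shape are illustrative assumptions, not the actual Together client code.
MODEL_ALIASES = {
    "bloomz": "bloomz-176b-alpa",
}


def resolve_model_alias(engine: str) -> str:
    """Map a user-facing engine name (e.g. "bloomz") to the Together implementation name."""
    return MODEL_ALIASES.get(engine, engine)


# The user asks for "together/bloomz"; the raw request (and hence the cache key) uses the
# resolved implementation name, so a renamed Together implementation invalidates the cache.
raw_request = {
    "model": resolve_model_alias("bloomz"),  # -> "bloomz-176b-alpa"
    "prompt": "The quick brown fox",
    "max_tokens": 10,
}
print(raw_request["model"])
```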
@@ -62,7 +62,7 @@ def get_window_service(model_name: str, service: TokenizerService) -> WindowService:
        window_service = SantaCoderWindowService(service)
    elif model_name == "huggingface/gpt2":
        window_service = GPT2WindowService(service)
-   elif model_name == "together/bloom":
+   elif model_name == "together/bloom" or model_name == "together/bloomz":
        window_service = BloomWindowService(service)
I don't think it's the same tokenizer: https://huggingface.co/bigscience/bloomz#cpu
Added BLOOMZ tokenizer.
As far as I can tell, it's the exact same tokenizer, just under a different name.
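For what it's worth, a quick way to check this outside the PR, assuming `bigscience/bloom` and `bigscience/bloomz` are the relevant Hugging Face repos:

```python
# Sanity check (not part of the PR): compare the BLOOM and BLOOMZ tokenizers directly.
from transformers import AutoTokenizer

bloom = AutoTokenizer.from_pretrained("bigscience/bloom")
bloomz = AutoTokenizer.from_pretrained("bigscience/bloomz")

sample = "The quick brown fox jumps over the lazy dog."
# If the vocabularies and a sample encoding agree, the two tokenizers are effectively identical.
print(bloom.get_vocab() == bloomz.get_vocab())
print(bloom.encode(sample) == bloomz.encode(sample))
```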
@@ -12,6 +13,12 @@
from .client import Client, wrap_request_time, truncate_sequence


MODEL_ALIASES = {
    "bloomz": "bloomz-176b-alpa",
`-alpa`: Will there be future versions?
According to Together: "the convention is `<model-name>-<size>-<framework>`". There might be alternate implementations.
If there will be alternate implementations, is it okay to cache results as plain `bloomz`?
As discussed in the other thread, I changed this to cache results under the full name including the framework, e.g. `bloomz-176b-alpa`.
@@ -78,7 +78,7 @@ def get_window_service(model_name: str, service: TokenizerService) -> WindowService:
        window_service = OPTWindowService(service)
    elif model_name == "together/t0pp":
        window_service = T0ppWindowService(service)
-   elif model_name == "together/t5-11b":
+   elif model_name == "together/t5-11b" or model_name == "together/flan-t5-xxl":
I think this model has its own tokenizer too.
Added tokenizer.
@@ -323,6 +333,15 @@ def engine(self) -> str:
        # Does not support echo=True
        tags=[TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, NO_NEWLINES_TAG],
    ),
    Model(
        group="together",
Do we have to update schema.yaml and remove the `TODO`?
Done.
Just a few more comments. Could we also add unit tests for the new window services?
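Something along these lines, perhaps; `get_tokenizer_service()` is a hypothetical fixture here, and the expected lengths are the ones discussed in this PR:

```python
# Rough sketch of a window-service unit test. get_tokenizer_service() is a hypothetical
# test fixture; get_window_service is the factory shown in the diffs above (import omitted).
import pytest


@pytest.mark.parametrize(
    "model_name, expected_max_sequence_length",
    [
        ("together/bloomz", 2048),
        ("together/flan-t5-xxl", 512),
    ],
)
def test_max_sequence_length(model_name: str, expected_max_sequence_length: int):
    service = get_tokenizer_service()  # hypothetical helper returning a TokenizerService
    window_service = get_window_service(model_name, service)
    assert window_service.max_sequence_length == expected_max_sequence_length
```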
    def max_sequence_length(self) -> int:
        """
        The model was trained with a sequence length of 2,048.
        Source: https://huggingface.co/bigscience/bloom
Update the link. Also, it's probably correct, but just want to make sure the sequence length is 2048.
    @property
    def max_sequence_length(self) -> int:
        """Return the max sequence length."""
        # From https://arxiv.org/pdf/1910.10683.pdf, "we use a maximum sequence length of 512".
This comment is for T5. Can we update this?
Removed the stale links. I checked with the Hugging Face AutoTokenizer that both of these are correct.
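Presumably something along these lines, assuming `bigscience/bloomz` and `google/flan-t5-xxl` are the Hugging Face model IDs in question:

```python
# Quick check of the tokenizer-reported maximum lengths (not part of the PR).
from transformers import AutoTokenizer

for model_id in ["bigscience/bloomz", "google/flan-t5-xxl"]:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # model_max_length may be a very large sentinel value if the config leaves it unset.
    print(model_id, tokenizer.model_max_length)
```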
Is this a typo: `alpa`?
No, `alpa` refers to the Alpa framework: https://opt.alpa.ai/
This is ready for review again. PTAL
Great!
@teetone could you take another look? This PR is blocked on your review.
Thanks!
Fixes #1189