Add text_col param that's required for SFTTrainer #66
I feel like my use of __post_init__ to verify the dataclass values could be a code smell (since there's now a decent amount of logic and it no longer feels like a struct). Happy to take suggestions if others feel the same way!
It's not too bad -- Dataclasses are more than simple structs and post init is intended to catch user errors early so what you have is great IMO.
"\n",
"This only needs to be done once!"
"This only needs to be done once! However, you have to wait for the token request to be approved for this to work."
nit: for some reason I don't get any notifications when someone requests a token, could add a note about slacking me or Nikolai to have that approved quickly?
"dataset_text_field"
)
if (
existing_dataset_text_field is not None
nit: consider adding parens around sub-expressions here (so readers don't have to think about and vs not operator precedence)
Done, though it ends up looking a bit weird due to the formatter.
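To illustrate the precedence point, here is a small sketch. The function names and the second operand of the condition are assumptions; only `existing_dataset_text_field is not None` is visible in the diff. In Python, `is not` is a single comparison operator, and comparisons bind tighter than `not`, which in turn binds tighter than `and`, so the parentheses in this kind of check are for readability only.

```python
from typing import Optional


def conflicts_implicit(existing: Optional[str], text_col: str) -> bool:
    # Comparisons bind tighter than `and`, so this already parses as
    # (existing is not None) and (existing != text_col).
    return existing is not None and existing != text_col


def conflicts_explicit(existing: Optional[str], text_col: str) -> bool:
    # Same logic with the grouping spelled out for the reader.
    return (existing is not None) and (existing != text_col)


# A case where parentheses do change the result:
# `not` binds tighter than `and`.
a, b = True, False
grouped_left = not a and b      # parsed as (not a) and b
grouped_right = not (a and b)   # explicit grouping, different meaning
```

The two `conflicts_*` functions agree on every input, while `grouped_left` and `grouped_right` differ, which is the kind of ambiguity the explicit parentheses spare the reader from.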
@@ -58,3 +62,32 @@ def test_custom_train():
    )

    train(config)


def test_pack_train():
To keep pre-commit costs under control, we should probably mark this test optional
IIUC, this can be done as follows: pytest.mark.foo
For example, you can add this: @pytest.mark.e2e
then this test will not run by default, only if you pass e2e parameter to pytest
Then we'll have to figure out how to run e2e tests periodically (outside of precommit flow)
Marking as skip for now since it's way too slow. After I resolve #62 I'll change it to e2e
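One caveat on the e2e idea: a custom marker on its own does not exclude a test from the default run. The test has to be deselected, for example with `pytest -m "not e2e"`, or skipped by a hook. Below is a minimal `conftest.py` sketch modelled on the opt-in flag pattern from the pytest documentation. The `--e2e` flag name and the marker description are assumptions, not something this PR defines.

```python
# conftest.py (sketch; the --e2e flag name is hypothetical)

E2E_MARKER = "e2e"


def pytest_addoption(parser):
    # Opt-in flag: `pytest --e2e` also runs tests marked e2e.
    parser.addoption(
        "--e2e",
        action="store_true",
        default=False,
        help="also run slow end-to-end tests",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "e2e: slow end-to-end test, skipped unless --e2e is given"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--e2e"):
        return  # flag given: leave every collected test as-is
    for item in items:
        if E2E_MARKER in item.keywords:
            # add_marker accepts a marker name; "skip" is pytest's built-in.
            item.add_marker("skip")
```

With this in place, a test decorated with `@pytest.mark.e2e` would be skipped by a plain `pytest` run and executed by `pytest --e2e`, which a periodic job outside the pre-commit flow could call.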
Replace data.trainer_kwargs["dataset_text_field"] with data.text_col, since it's a required field for SFTTrainer.
Make text_col a required field for SFTTrainer, making it not possible to do TrainingConfig() in tests.
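The change described above can be sketched roughly as follows. `DataParams`, its field defaults, and the error message are hypothetical stand-ins rather than the code in this PR. Only the names `text_col` and `trainer_kwargs["dataset_text_field"]` come from the description. The sketch shows why a required field rules out a no-argument constructor, and how a `__post_init__` check of the kind discussed in the review can reject a conflicting value early.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DataParams:
    """Hypothetical stand-in for the config class touched by this PR."""

    # Required: no default, so DataParams() raises TypeError.
    text_col: str
    trainer_kwargs: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject a conflicting value passed through trainer_kwargs, so the
        # mistake surfaces at construction time rather than mid-training.
        existing = self.trainer_kwargs.get("dataset_text_field")
        if (existing is not None) and (existing != self.text_col):
            raise ValueError(
                f"dataset_text_field={existing!r} conflicts with "
                f"text_col={self.text_col!r}"
            )


params = DataParams(text_col="prompt")

try:
    DataParams(text_col="prompt", trainer_kwargs={"dataset_text_field": "body"})
except ValueError as err:
    print(f"rejected early: {err}")
```

Tests that previously built a default config would need to pass `text_col` explicitly, as in `DataParams(text_col="prompt")`.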