As we discussed on Discord, there are many details in this process that could go wrong, especially since you are using your own training framework. Here are some of the usual suspects we checked:
You mentioned your framework works well with other models such as deepseek-coder, llama3, and mistral.
For the nan grad_norm, check that the training precision is set to bf16 (see the sketch after this list).
Quality-check your dataset.
Consider updating bitsandbytes==0.42.0 to a newer version; developers have reported similar issues. Also check other dependencies such as deepspeed.
A very large loss at the beginning might indicate the LR is too large, and you have already tried a smaller value of 1e-5 as well as gradient clipping.
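In case it is useful, here is what those suggestions look like as a minimal Hugging Face TrainingArguments sketch; the output directory, batch size, and logging interval are illustrative assumptions, not values from this thread:

```python
# Sketch only: TrainingArguments reflecting the checklist above
# (bf16 precision, the smaller 1e-5 LR, gradient clipping).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="yi-coder-9b-sft",   # hypothetical path
    bf16=True,                      # fp16's narrower range can overflow to inf/nan
    learning_rate=1e-5,             # the smaller LR already tried
    max_grad_norm=1.0,              # gradient clipping
    per_device_train_batch_size=1,  # assumed; not from this thread
    logging_steps=1,                # log every step to catch the first nan grad_norm
)
```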
Hope there are smarter minds who can answer this.
First, thanks for your great work. I've tried to finetune the Yi-Coder-9B-Chat model on my own dataset, but I've run into the following problems.
Problems
grad_norm becomes nan when I try to finetune the Yi-Coder-9B-Chat model
Detailed Description
At the very first training step, grad_norm becomes nan, and the loss later drops to zero as a consequence of the nan grad_norm.
But when I use the same code and only change the model to CodeLlama-13b-Instruct-hf, everything works as expected.
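One way to narrow this down (a generic PyTorch sketch assuming a standard training loop, not code from my framework) is to check which parameters receive non-finite gradients right after loss.backward():

```python
import torch

def report_bad_grads(model: torch.nn.Module) -> None:
    # Call between loss.backward() and optimizer.step() to see
    # which parameters the nan grad_norm originates from.
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in: {name}")
```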
Code to Reproduce
To make this easier to reproduce, I've replaced my own dataset with the public dataset Genshin_Character_instruction/Genshin_Character_instruction.json, which can be found on Hugging Face.
link: https://huggingface.co/datasets/YanFu0320/Genshin_Character_instruction
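Roughly, the setup looks like this (a simplified sketch, not my full training script; only the model and dataset IDs above are real, the rest is illustrative):

```python
# Simplified sketch of the load step, not the full training framework.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "01-ai/Yi-Coder-9B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16")

# Public dataset standing in for my own data.
data = load_dataset("YanFu0320/Genshin_Character_instruction")
```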
System and Environment Settings
System
Related package versions