Batch size and training steps #3

Open
jhao6 opened this issue Jan 3, 2025 · 1 comment

Comments

@jhao6

jhao6 commented Jan 3, 2025

Hi,

I have two questions regarding your code:

  1. What is the batch size for pretraining? Table 5 of the original paper sets it to 512, but in the code it is 64, with a micro batch size of 16.
  2. How many pretraining steps are used in Table 1, 10k or 50k? With more pretraining steps, such as 50k, how many steps are taken in the first pretraining stage on the randomly selected data, and how many in the last pretraining stage on the selected data?

Hope you can help me figure these out. Thank you very much.
@yuzc19
Contributor

yuzc19 commented Jan 12, 2025

Hi @jhao6,

Thanks for your question. For 1, we use 8 GPUs in our main experiment, as stated here. The Lightning package we use syncs gradients across all GPUs under the default DDP strategy, so the batch size of 64 you noticed in the code is per GPU. The total global batch size is therefore 8 × 64 = 512. If you are using fewer GPUs, you may need to change the 64 to match your configuration.
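
To make the arithmetic concrete, here is a minimal sketch of how the per-GPU value relates to the global batch size under DDP. The function names are hypothetical and not taken from the repository. The gradient-accumulation helper additionally assumes that the micro batch size of 16 mentioned in the question splits the per-GPU batch into accumulation steps, which this thread does not confirm.

```python
def global_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    """Effective batch size per optimizer step when each GPU processes
    its own per-GPU batch and gradients are averaged across GPUs (DDP)."""
    return per_gpu_batch_size * num_gpus


def per_gpu_batch_size_for(target_global: int, num_gpus: int) -> int:
    """Per-GPU batch size needed to reach a target global batch size."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if target_global % num_gpus != 0:
        raise ValueError("target_global must be divisible by num_gpus")
    return target_global // num_gpus


def grad_accumulation_steps(per_gpu_batch_size: int, micro_batch_size: int) -> int:
    """ASSUMPTION: the per-GPU batch is built from micro batches via
    gradient accumulation; the thread does not state this explicitly."""
    if per_gpu_batch_size % micro_batch_size != 0:
        raise ValueError("per_gpu_batch_size must be divisible by micro_batch_size")
    return per_gpu_batch_size // micro_batch_size


if __name__ == "__main__":
    # Setup described in the reply: 64 per GPU on 8 GPUs.
    print(global_batch_size(64, 8))
    # Keeping the global batch size at 512 with only 4 GPUs.
    print(per_gpu_batch_size_for(512, 4))
    # Micro batch size of 16 from the question (see assumption above).
    print(grad_accumulation_steps(64, 16))
```

With fewer GPUs, the second helper gives the value that would replace 64 to keep the global batch size at 512, memory permitting.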

For 2, in Table 1, we train for 50k total steps for all methods. The first pretraining stage runs for 10k steps, and each following model-aware pretraining stage also runs for 10k steps, so there are 4 model-aware stages in total.
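
The stage layout described here can be sketched as a simple schedule. This is only an illustration of the step counts given in the reply. The function name and the `"random"` / `"model-aware"` labels are hypothetical and do not come from the repository.

```python
from typing import List, Tuple


def stage_schedule(total_steps: int, steps_per_stage: int) -> List[Tuple[str, int, int]]:
    """Split training into equal-length stages.

    Returns (kind, start_step, end_step) tuples, where the first stage
    uses randomly selected data and every later stage is model-aware.
    """
    if steps_per_stage <= 0 or total_steps % steps_per_stage != 0:
        raise ValueError("total_steps must be a positive multiple of steps_per_stage")
    stages = []
    for index, start in enumerate(range(0, total_steps, steps_per_stage)):
        kind = "random" if index == 0 else "model-aware"
        stages.append((kind, start, start + steps_per_stage))
    return stages


if __name__ == "__main__":
    # Numbers from the reply: 50k total steps, 10k steps per stage.
    for kind, start, end in stage_schedule(50_000, 10_000):
        print(f"{kind:12s} steps {start:>6d} to {end:>6d}")
```

Under these numbers the schedule has one random-data stage followed by four model-aware stages, matching the count in the reply.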
