
SLTrain (NeurIPS 2024)

This repository contains a beta implementation of SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining, accepted at NeurIPS 2024. The preprint is available at http://arxiv.org/abs/2406.02214.

Modeling for pretraining

The main idea is to re-parameterize each linear layer's weight with low-rank and sparse factors for improved parameter and memory efficiency:

W = BA + S,

where B and A model the low-rank component and S models the sparse component. The sparsity pattern of S is random and fixed.
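
A minimal sketch of this reparameterization in PyTorch (illustrative only; the class name SparseLowRankLinear and the initialization choices are assumptions, not the repository's actual implementation):

import torch
import torch.nn as nn

class SparseLowRankLinear(nn.Module):
    # Sketch of W = BA + S: trainable low-rank factors B, A plus a sparse
    # term S whose random support is fixed at initialization (assumed here).
    def __init__(self, in_features, out_features, rank, sp_ratio):
        super().__init__()
        self.B = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        nnz = int(sp_ratio * out_features * in_features)
        # Fixed random sparsity pattern, stored as flat indices (not trained).
        self.register_buffer("s_idx", torch.randperm(out_features * in_features)[:nnz])
        self.s_val = nn.Parameter(torch.zeros(nnz))  # trainable sparse values

    def forward(self, x):
        # Low-rank path computed as (x A^T) B^T, never materializing BA.
        y = (x @ self.A.t()) @ self.B.t()
        # Sparse path: scatter the trained values into a dense W_s on the fly.
        w_s = x.new_zeros(self.B.size(0) * self.A.size(1))
        w_s[self.s_idx] = self.s_val.to(x.dtype)
        return y + x @ w_s.view(self.B.size(0), self.A.size(1)).t()

The trainable parameter count is rank * (in_features + out_features) for the factors plus nnz for the sparse values, which is far below the dense in_features * out_features for the rank and sparsity levels used in the paper.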

Motivation

Below, we show how the learned weight W = L + S (with L = BA) enlarges the singular value spectrum. In particular, the L component primarily learns the head of the spectrum, while the S component primarily learns the tail.

Figure: contribution of the L and S components to the singular values of the learned W (with a zoomed view).
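
As a toy numerical illustration of this head/tail split (not an experiment from the paper; the dimensions and density below are arbitrary), the singular values of a random low-rank plus sparse sum behave the same way:

import torch

torch.manual_seed(0)
d, r = 256, 8
L = (torch.randn(d, r) @ torch.randn(r, d)) / d ** 0.5  # rank-8 component
S = torch.randn(d, d) * (torch.rand(d, d) < 0.03)       # ~3%-dense sparse component
W = L + S

# L contributes only r large (head) singular values; S supplies the long tail.
for name, M in [("L", L), ("S", S), ("W = L + S", W)]:
    sv = torch.linalg.svdvals(M)
    print(name, "head:", sv[:3].tolist(), "tail:", sv[-3:].tolist())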

Results

Figure: result comparisons.

Figure: SLTrain memory usage.

Installation

Build the C++ extensions via:

cd ./sparse-lora
pip install .

Usage

Run the scripts in scripts/llm_pretrain/. Here, --rank sets the rank of the low-rank factors and --sp_ratio sets the sparsity ratio (delta) of S. Typical usage:

torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_60m.json \
    --lr 0.003 \
    --peft_model sltrain \
    --optimizer adamw \
    --rank 128 \
    --sp_ratio 0.03 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 11000 \
    --warmup_steps 1100 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --lora_alpha 32
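
The relationship between --batch_size and --total_batch_size is presumably gradient accumulation (an assumption here, following the common convention of similar pretraining scripts rather than anything stated in this README):

# Hypothetical helper: accumulation steps implied by the flags above, assuming
# total_batch_size = batch_size * num_gpus * accumulation_steps.
def grad_accumulation_steps(total_batch_size, batch_size, world_size=1):
    assert total_batch_size % (batch_size * world_size) == 0
    return total_batch_size // (batch_size * world_size)

print(grad_accumulation_steps(512, 256, 1))  # -> 2 with the settings above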

Citation

@inproceedings{han2024sltrain,
  title={{SLTrain}: a sparse plus low-rank approach for parameter and memory efficient pretraining},
  author={Han, Andi and Li, Jiaxiang and Huang, Wei and Hong, Mingyi and Takeda, Akiko and Jawanpuria, Pratik and Mishra, Bamdev},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {37},
  year={2024}
}
