MoM: Mixture-of-Memories


Welcome to MoM! This repository provides the implementation of MoM: Linear Sequence Modeling with Mixture-of-Memories on the Hugging Face ecosystem. MoM is compatible with all kinds of linear sequence modeling methods, such as linear attention, SSMs, and linear RNNs. An introductory article about MoM (in Chinese) is available on Zhihu.

Figure 1: MoM Architecture

Installation

The following requirements should be satisfied:

Install the package from source:

pip install -e .
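
Once the package is installed, a minimal generation sanity check on the Hugging Face side might look like the sketch below. The hub id is a placeholder (point it at whichever released MoM checkpoint you actually use), and it assumes the installed package exposes the MoM model classes to Transformers through the usual Auto classes.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint id; substitute the released MoM weights you downloaded.
name = "OpenSparseLLMs/MoM-340M"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

inputs = tokenizer("Mixture-of-Memories is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))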

Getting Started

Data Preparation

Before training, make sure to preprocess your data by following the steps outlined in training/README.md.
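
As a rough illustration of what that preprocessing produces (the exact recipe is in training/README.md), the sketch below tokenizes raw text with the training tokenizer and packs the token ids into fixed-length contexts. The dataset slice and output path here are illustrative only.

from datasets import load_dataset
from transformers import AutoTokenizer

CONTEXT = 2048  # matches context=2048 in train.sh
tokenizer = AutoTokenizer.from_pretrained("fla-hub/gla-1.3B-100B")
raw = load_dataset("cerebras/SlimPajama-627B", split="train[:1%]")  # small slice, illustration only

def tokenize(batch):
    return tokenizer(batch["text"])

def pack(batch):
    # Concatenate all token ids, then cut them into CONTEXT-sized blocks (drop the tail).
    ids = [tok for seq in batch["input_ids"] for tok in seq]
    total = (len(ids) // CONTEXT) * CONTEXT
    return {"input_ids": [ids[i:i + CONTEXT] for i in range(0, total, CONTEXT)]}

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
packed = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)
packed.save_to_disk("data/chunk1/train")  # the cache= path passed to train.sh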

Training From Scratch

To start training with the default setup, simply run:

cd training

bash train.sh \
  nodes=4 \
  gpus=8 \
  type=mom \
  lr=3e-4 \
  steps=30720 \
  batch=8 \
  update=1 \
  warmup=1024 \
  context=2048 \
  path=SlimPajama/mom-15B \
  project=SlimPajama \
  model=configs/mom_340M.json \
  tokenizer=fla-hub/gla-1.3B-100B \
  data=SlimPajama-627B \
  cache=data/chunk1/train
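
Under this configuration the run covers roughly nodes × gpus × batch × update × context × steps = 4 × 8 × 8 × 1 × 2048 × 30720 ≈ 16B tokens (assuming the usual data-parallel token accounting), which is consistent with the 15B label in the output path.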

You can also

Evaluation

To evaluate model checkpoints on commonsense reasoning benchmarks, we recommend running:

MODEL_PATH=training/SlimPajama/mom-15B/checkpoint-30720

accelerate launch --multi_gpu evals/harness.py --model hf \
    --model_args pretrained=$MODEL_PATH,dtype=bfloat16 \
    --tasks arc_easy,arc_challenge,hellaswag,lambada_standard,piqa,winogrande,wikitext \
    --output_path eval_results \
    --batch_size 32 \
    --device cuda
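
For a quick summary of the numbers afterwards, a small helper like the one below can scan the lm-eval output; the exact file layout under eval_results and the metric keys depend on the lm-eval version, so treat them as assumptions.

import glob
import json

# Scan every results JSON that lm-eval wrote and print task accuracies.
for path in sorted(glob.glob("eval_results/**/*.json", recursive=True)):
    with open(path) as f:
        results = json.load(f).get("results", {})
    for task, metrics in results.items():
        acc = metrics.get("acc,none", metrics.get("acc"))
        if acc is not None:
            print(f"{task}: acc = {acc:.4f}")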

To evaluate model checkpoints on recall-intensive tasks, we recommend running:

1. Install lm_eval:

cd lm-eval-harness
pip install -e .

2. Run the script:
MODEL_PATH=../training/SlimPajama/mom-15B/checkpoint-30720

CUDA_VISIBLE_DEVICES=0,1,2,3,4 python launch_local.py \
    --batch-size 32 \
    -t based_squad \
    -t based_swde \
    -t based_fda \
    -t based_drop \
    -t based_triviaqa \
    -t based_nq_2048 \
    -m $MODEL_PATH \
    --context_length 2048 \
    --answer_length 48 \
    --cutting_context \
    --limit -1 \
    -p
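
The based_* tasks come from the recall-intensive evaluation suite used by prefix-linear-attention (SQuAD, SWDE, FDA, DROP, TriviaQA, and NQ), and the 2048-token context length matches the training context above. We read --limit -1 as evaluating on all available examples; this is an assumption about the launcher's semantics, so check launch_local.py if in doubt.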

Acknowledgement

This repo builds upon the open-source flash-linear-attention and the evaluation code is based on prefix-linear-attention. Happy experimenting! 🔥🚀🔥

Citation

If you find this repo useful, please consider citing our paper:

@article{du2025mom,
  title={MoM: Linear Sequence Modeling with Mixture-of-Memories},
  author={Du, Jusen and Sun, Weigao and Lan, Disen and Hu, Jiaxi and Cheng, Yu},
  journal={arXiv preprint arXiv:2502.13685},
  year={2025}
}
