Welcome to MoM! This repository provides the implementation of MoM: Linear Sequence Modeling with Mixture-of-Memories, built on the Hugging Face ecosystem. MoM is compatible with various linear sequence modeling methods, such as linear attention, SSMs, and linear RNNs. An introductory article about MoM (in Chinese) is available on Zhihu.
The following requirements should be satisfied:
- PyTorch >= 2.5
- Triton >= 3.0
- einops
- transformers >= 4.45.0
- datasets >= 3.3.0
- causal-conv1d >= 1.4.0
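If any of these are missing, they can be installed up front; a minimal sketch (the version pins simply mirror the lower bounds listed above):
# install the runtime dependencies; adjust pins to your CUDA/toolchain setup
pip install "torch>=2.5" "triton>=3.0" einops "transformers>=4.45.0" "datasets>=3.3.0" "causal-conv1d>=1.4.0"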
Install the package from source:
pip install -e .
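To quickly confirm that your environment matches the requirements above, a simple sanity check (not part of the official setup) is:
# print the installed versions of the key dependencies
python -c "import torch, triton, transformers, datasets, einops; print(torch.__version__, triton.__version__, transformers.__version__, datasets.__version__)"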
Before training, make sure to preprocess your data by following the steps outlined in training/README.md.
To start training with the default setup, simply run:
cd training
bash train.sh \
nodes=4 \
gpus=8 \
type=mom \
lr=3e-4 \
steps=30720 \
batch=8 \
update=1 \
warmup=1024 \
context=2048 \
path=SlimPajama/mom-15B \
project=SlimPajama \
model=configs/mom_340M.json \
tokenizer=fla-hub/gla-1.3B-100B \
data=SlimPajama-627B \
cache=data/chunk1/train
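Before launching the full run, you may want a quick single-node smoke test. A sketch with scaled-down values (illustrative only; it assumes train.sh accepts the same arguments, and the output path name is just an example):
bash train.sh \
nodes=1 \
gpus=1 \
type=mom \
lr=3e-4 \
steps=1000 \
batch=8 \
update=1 \
warmup=100 \
context=2048 \
path=SlimPajama/mom-smoke-test \
project=SlimPajama \
model=configs/mom_340M.json \
tokenizer=fla-hub/gla-1.3B-100B \
data=SlimPajama-627B \
cache=data/chunk1/train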
You can also:
- Modify the script to adjust the modeling and training settings; e.g., edit examples/configs/mom_340M.json to change the MoM model structure (see the example below).
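For example, to experiment with a different model structure without touching the default config (the copy's filename is just an example):
# keep the default config intact and edit a copy instead
cp examples/configs/mom_340M.json examples/configs/mom_340M_custom.json
# then point the model= argument of train.sh at the new file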
To evaluate model checkpoints on commonsense reasoning benchmarks, we recommend running:
MODEL_PATH=training/SlimPajama/mom-15B/checkpoint-30720
accelerate launch --multi_gpu evals/harness.py --model hf \
--model_args pretrained=$MODEL_PATH,dtype=bfloat16 \
--tasks arc_easy,arc_challenge,hellaswag,lambada_standard,piqa,winogrande,wikitext \
--output_path eval_results \
--batch_size 32 \
--device cuda
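On a single GPU, the same evaluation can usually be launched without the --multi_gpu flag; a sketch (assuming evals/harness.py accepts the same arguments as above, with a smaller task subset for a faster run):
MODEL_PATH=training/SlimPajama/mom-15B/checkpoint-30720
accelerate launch evals/harness.py --model hf \
--model_args pretrained=$MODEL_PATH,dtype=bfloat16 \
--tasks piqa,hellaswag \
--output_path eval_results \
--batch_size 32 \
--device cuda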
To evaluate model checkpoints on recall-intensive tasks, we recommend the following:
- Install lm_eval:
cd lm-eval-harness
pip install -e .
- Run the script:
MODEL_PATH=../training/SlimPajama/mom-15B/checkpoint-30720
CUDA_VISIBLE_DEVICES=0,1,2,3,4 python launch_local.py \
--batch-size 32 \
-t based_squad \
-t based_swde \
-t based_fda \
-t based_drop \
-t based_triviaqa \
-t based_nq_2048 \
-m $MODEL_PATH \
--context_length 2048 \
--answer_length 48 \
--cutting_context \
--limit -1 \
-p
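To run only a subset of the recall tasks, keep just the -t flags you need; for example, with the same arguments as above and a single task:
MODEL_PATH=../training/SlimPajama/mom-15B/checkpoint-30720
CUDA_VISIBLE_DEVICES=0 python launch_local.py \
--batch-size 32 \
-t based_squad \
-m $MODEL_PATH \
--context_length 2048 \
--answer_length 48 \
--cutting_context \
--limit -1 \
-p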
This repo builds on the open-source flash-linear-attention, and the evaluation code is based on prefix-linear-attention. Happy experimenting! 🔥🚀🔥
If you find this repo useful, please consider citing our paper:
@article{du2025mom,
  title={MoM: Linear Sequence Modeling with Mixture-of-Memories},
  author={Du, Jusen and Sun, Weigao and Lan, Disen and Hu, Jiaxi and Cheng, Yu},
  journal={arXiv preprint arXiv:2502.13685},
  year={2025}
}