- Cambridge, MA
- http://people.csail.mit.edu/clai24/
- @jefflai108
Stars
[ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
Generative models for conditional audio generation
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
SALMONN: Speech Audio Language Music Open Neural Network
Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Python module for syllabifying English ARPABET transcriptions
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Phoneme segmentation using pre-trained speech models
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
A curated list of awesome adversarial reprogramming and input prompting methods for neural networks since 2022
Word Discovery in Visually Grounded, Self-Supervised Speech Models
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch
Robust Speech Recognition via Large-Scale Weak Supervision
Python implementation of simple GMM and HMM models for isolated digit recognition.
X (weighted / probabilistic) Context-Free Grammars
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files