View jefflai108's full-sized avatar
🍄
venture to a bigger world


[ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".

Python 14 1 Updated Mar 10, 2025

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 149 10 Updated Sep 19, 2024

Generative models for conditional audio generation

Python 2,975 295 Updated Mar 21, 2025

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

380 22 Updated Mar 8, 2025

ConMamba for Automatic Speech Recognition

Python 62 5 Updated Aug 12, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,382 77 Updated Sep 27, 2024

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,191 96 Updated Mar 4, 2025

Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jupyter Notebook 72 12 Updated Jun 12, 2024

Python module for syllabifying English ARPABET transcriptions

Python 66 16 Updated Feb 15, 2019

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,856 692 Updated Mar 3, 2025

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 169 12 Updated Jul 12, 2024

This is the code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on

Python 537 49 Updated Jun 9, 2024

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.

Python 755 52 Updated May 10, 2022

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Python 1,704 278 Updated Feb 15, 2023

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,841 380 Updated Mar 14, 2024

Phoneme segmentation using pre-trained speech models

Python 55 10 Updated Nov 4, 2022

Multilingual speech aligner

Python 72 5 Updated Nov 19, 2023

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

Python 54 3 Updated Apr 20, 2023

Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Jupyter Notebook 36 8 Updated Mar 4, 2024

A curated list of awesome adversarial reprogramming and input prompting methods for neural networks since 2022

Python 36 Updated Nov 30, 2023

Word Discovery in Visually Grounded, Self-Supervised Speech Models

Jupyter Notebook 26 7 Updated Dec 4, 2023

Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch

Python 73 1 Updated Dec 4, 2022

Robust Speech Recognition via Large-Scale Weak Supervision

Python 78,963 9,475 Updated Jan 4, 2025

Python implementation of simple GMM and HMM models for isolated digit recognition.

Python 63 22 Updated Feb 7, 2021

X (weighted / probabilistic) Context-Free Grammars

Python 25 2 Updated Jan 30, 2024

Speech2Vec Reality Check

Python 82 4 Updated Feb 21, 2023

Fast and Modularized CFG-focused Models

Python 23 1 Updated Nov 8, 2023

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files

Go 49,527 4,413 Updated Mar 26, 2025

Google Drive CLI Client

Go 8,975 1,184 Updated Apr 19, 2023