
MSST-WebUI

A webUI for Music Source Separation Training inference; we also bundled UVR into it!

Introduction

This is a webUI for Music-Source-Separation-Training, a music source separation training framework. You can use it to run inference with MSST models and UVR VR Models (the inference code comes from python-audio-separator, with some changes of ours), and the preset process page lets you customize the processing flow yourself. Models can be installed from the "Install Models" interface. If you have already downloaded Ultimate Vocal Remover, you do not need to download the VR Models again: go to the "Settings" page and select your UVR5 model folder directly. Finally, the webUI also provides four convenient tools.

Usage

You can go to Releases to download the installer.

Users in China can download it from the link below:

Download link: https://www.123pan.com/s/1bmETd-AefWh.html (extraction code: 1145)
Bilibili tutorial video: https://www.bilibili.com/video/BV18m42137rm

Run from source

  1. Clone this repository.

  2. Create a Python environment and install the requirements.

    conda create -n msst python=3.10 -y
    conda activate msst
    pip install -r requirements.txt

Note

  1. You may run into problems when using UVR separation; they come from the dependency librosa. The issue occurs around line 2000 of 'librosa/util/utils.py', at 'np.dtype(np.float).type'. You can manually change it to 'float32' or 'float64' to resolve the issue; see the workaround sketch after these steps. (Do not try to install an older version of NumPy instead: older versions of NumPy do not support Python 3.10, and using a Python version other than 3.10 may prevent other modules from being installed.)
  2. While installing the requirements, you may see a dependency conflict between huggingface-hub and gradio. You can safely ignore it; it does not affect the webUI.
  3. Run the webUI with the following command.

    python webUI.py
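
If you prefer not to edit librosa's source directly, a minimal workaround sketch is shown below. It assumes the error comes only from the NumPy alias that librosa still references; restoring that alias before librosa is imported has the same effect as editing 'librosa/util/utils.py' as described in note 1 above.

    # workaround sketch: restore the alias removed in NumPy >= 1.24 before
    # librosa is imported, so np.dtype(np.float).type resolves again
    import numpy as np

    if not hasattr(np, "float"):
        np.float = np.float64  # or np.float32, as note 1 above suggests

    import librosa  # import after patching so the alias is available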

Command Line

MSST Inference

Use inference.py

usage: inference.py [-h] [--model_type MODEL_TYPE] [--config_path CONFIG_PATH] [--start_check_point START_CHECK_POINT] [--input_folder INPUT_FOLDER] [--store_dir STORE_DIR]
                    [--device_ids DEVICE_IDS [DEVICE_IDS ...]] [--extract_instrumental] [--force_cpu]

options:
  -h, --help            show this help message and exit
  --model_type MODEL_TYPE
                        One of mdx23c, htdemucs, segm_models, mel_band_roformer, bs_roformer, swin_upernet, bandit
  --config_path CONFIG_PATH
                        path to config file
  --start_check_point START_CHECK_POINT
                        Initial checkpoint to valid weights
  --input_folder INPUT_FOLDER
                        folder with mixtures to process
  --store_dir STORE_DIR
                        path to store results as wav file
  --device_ids DEVICE_IDS [DEVICE_IDS ...]
                        list of gpu ids
  --extract_instrumental
                        invert vocals to get instrumental if provided
  --force_cpu           Force the use of CPU even if CUDA is available
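
For example, a typical invocation might look like the following (the config and checkpoint paths are placeholders; point them at the model files you actually downloaded):

    python inference.py --model_type bs_roformer --config_path configs/your_model.yaml --start_check_point checkpoints/your_model.ckpt --input_folder input --store_dir results --extract_instrumental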

UVR Inference

Use uvr_inference.py

Note

Only VR Models can be used for UVR inference. Other models, such as MDX23C and HTDemucs models, can be used with MSST inference instead. You can now also pass a folder path to UVR inference, not just a single file.

usage: uvr_inference.py [-h] [-d] [-e] [-l] [--log_level LOG_LEVEL] [-m MODEL_FILENAME] 
                        [--output_format OUTPUT_FORMAT] [--output_dir OUTPUT_DIR] [--model_file_dir MODEL_FILE_DIR] 
                        [--invert_spect] [--normalization NORMALIZATION] [--single_stem SINGLE_STEM] [--sample_rate SAMPLE_RATE] [--use_cpu]
                        [--vr_batch_size VR_BATCH_SIZE] [--vr_window_size VR_WINDOW_SIZE] [--vr_aggression VR_AGGRESSION] [--vr_enable_tta] [--vr_high_end_process] [--vr_enable_post_process]
                        [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] 
                        [audio_file]

Separate audio file into different stems.

positional arguments:
  audio_file                                             The audio file path to separate, in any common format. You can input file path or file folder path

options:
  -h, --help                                             show this help message and exit

Info and Debugging:
  -d, --debug                                            Enable debug logging, equivalent to --log_level=debug.
  -e, --env_info                                         Print environment information and exit.
  -l, --list_models                                      List all supported models and exit.
  --log_level LOG_LEVEL                                  Log level, e.g. info, debug, warning (default: info).

Separation I/O Params:
  -m MODEL_FILENAME, --model_filename MODEL_FILENAME     model to use for separation (default: model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt). Example: -m 2_HP-UVR.pth
  --output_format OUTPUT_FORMAT                          output format for separated files, any common format (default: FLAC). Example: --output_format=MP3
  --output_dir OUTPUT_DIR                                directory to write output files (default: <current dir>). Example: --output_dir=/app/separated
  --model_file_dir MODEL_FILE_DIR                        model files directory (default: /tmp/audio-separator-models/). Example: --model_file_dir=/app/models

Common Separation Parameters:
  --invert_spect                                         invert secondary stem using spectrogram (default: False). Example: --invert_spect
  --normalization NORMALIZATION                          max peak amplitude to normalize input and output audio to (default: 0.9). Example: --normalization=0.7
  --single_stem SINGLE_STEM                              output only single stem, e.g. Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other. Example: --single_stem=Instrumental
  --sample_rate SAMPLE_RATE                              modify the sample rate of the output audio (default: 44100). Example: --sample_rate=44100
  --use_cpu                                              use CPU instead of GPU for inference

VR Architecture Parameters:
  --vr_batch_size VR_BATCH_SIZE                          number of batches to process at a time. higher = more RAM, slightly faster processing (default: 4). Example: --vr_batch_size=16        
  --vr_window_size VR_WINDOW_SIZE                        balance quality and speed. 1024 = fast but lower, 320 = slower but better quality. (default: 512). Example: --vr_window_size=320       
  --vr_aggression VR_AGGRESSION                          intensity of primary stem extraction, -100 - 100. typically 5 for vocals & instrumentals (default: 5). Example: --vr_aggression=2      
  --vr_enable_tta                                        enable Test-Time-Augmentation; slow but improves quality (default: False). Example: --vr_enable_tta
  --vr_high_end_process                                  mirror the missing frequency range of the output (default: False). Example: --vr_high_end_process
  --vr_enable_post_process                               identify leftover artifacts within vocal output; may improve separation for some songs (default: False). Example: --vr_enable_post_process
  --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD  threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1
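
A typical invocation might look like the following (the model filename is the example from the help text above; the input path is a placeholder and can be a single file or a folder):

    python uvr_inference.py -m 2_HP-UVR.pth --output_format=wav --output_dir=results input/your_song.mp3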

Train MSST

Use train.py

usage: train.py [-h] [--model_type MODEL_TYPE] [--config_path CONFIG_PATH] [--start_check_point START_CHECK_POINT] [--results_path RESULTS_PATH] [--data_path DATA_PATH [DATA_PATH ...]]
                [--dataset_type DATASET_TYPE] [--valid_path VALID_PATH [VALID_PATH ...]] [--num_workers NUM_WORKERS] [--pin_memory PIN_MEMORY] [--seed SEED]
                [--device_ids DEVICE_IDS [DEVICE_IDS ...]] [--use_multistft_loss] [--use_mse_loss] [--use_l1_loss]

options:
  -h, --help            show this help message and exit
  --model_type MODEL_TYPE
                        One of mdx23c, htdemucs, segm_models, mel_band_roformer, bs_roformer, swin_upernet, bandit
  --config_path CONFIG_PATH
                        path to config file
  --start_check_point START_CHECK_POINT
                        Initial checkpoint to start training
  --results_path RESULTS_PATH
                        path to folder where results will be stored (weights, metadata)
  --data_path DATA_PATH [DATA_PATH ...]
                        Dataset data paths. You can provide several folders.
  --dataset_type DATASET_TYPE
                        Dataset type. Must be one of: 1, 2, 3 or 4. Details here: https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/dataset_types.md
  --valid_path VALID_PATH [VALID_PATH ...]
                        validation data paths. You can provide several folders.
  --num_workers NUM_WORKERS
                        dataloader num_workers
  --pin_memory PIN_MEMORY
                        dataloader pin_memory
  --seed SEED           random seed
  --device_ids DEVICE_IDS [DEVICE_IDS ...]
                        list of gpu ids
  --use_multistft_loss  Use MultiSTFT Loss (from auraloss package)
  --use_mse_loss        Use default MSE loss
  --use_l1_loss         Use L1 loss
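
A typical training invocation might look like the following (the config and dataset paths are placeholders, and --dataset_type must match how your data is organized; see the dataset types link above):

    python train.py --model_type mel_band_roformer --config_path configs/your_model.yaml --results_path results --data_path datasets/train --valid_path datasets/valid --dataset_type 1 --device_ids 0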

Thanks
