Source separation models
This page lists the source separation models supported in sherpa-onnx.
We describe only some of the models here. You can find all models at the following address:
Spleeter
It is from https://github.com/deezer/spleeter.
We only support the 2-stem model at present.
Hint
For those who want to learn how to convert the PyTorch checkpoint to the model format supported in sherpa-onnx, please see the scripts at the following address:
The variants of the 2-stem model are given below:
Model | Comment
---|---
 | No quantization
 |
 |
 |
 |
We describe how to use the fp16 quantized model. The steps below are also applicable to the other variants.
Download the model
Please use the following commands to download it:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/source-separation-models/sherpa-onnx-spleeter-2stems-fp16.tar.bz2
tar xvf sherpa-onnx-spleeter-2stems-fp16.tar.bz2
rm sherpa-onnx-spleeter-2stems-fp16.tar.bz2
ls -lh sherpa-onnx-spleeter-2stems-fp16
You should see the following output:
$ ls -lh sherpa-onnx-spleeter-2stems-fp16/
total 76880
-rw-r--r-- 1 fangjun staff 19M May 23 15:27 accompaniment.fp16.onnx
-rw-r--r-- 1 fangjun staff 19M May 23 15:27 vocals.fp16.onnx
Download test files
We use the following two test wave files:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/source-separation-models/qi-feng-le-zh.wav
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/source-separation-models/audio_example.wav
ls -lh audio_example.wav qi-feng-le-zh.wav
-rw-r--r--@ 1 fangjun staff 1.8M May 23 15:59 audio_example.wav
-rw-r--r--@ 1 fangjun staff 4.4M May 23 22:06 qi-feng-le-zh.wav
Hint

To make things easier, we support only *.wav files. If you have other formats, e.g., *.mp3, *.mp4, or *.mov, you can use

ffmpeg -i your.mp3 -vn -acodec pcm_s16le -ar 44100 -ac 2 your.wav
ffmpeg -i your.mp4 -vn -acodec pcm_s16le -ar 44100 -ac 2 your.wav
ffmpeg -i your.mov -vn -acodec pcm_s16le -ar 44100 -ac 2 your.wav

to convert them to *.wav files.
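If you have many files to convert, the ffmpeg invocation can be scripted. Below is a small sketch; the filenames are the placeholders from above, and it only prints the commands (remove the `echo` to actually run them):

```shell
# Print the ffmpeg conversion command for each input file.
# The filenames are placeholders; remove "echo" to run the commands.
for f in your.mp3 your.mp4 your.mov; do
  echo ffmpeg -i "$f" -vn -acodec pcm_s16le -ar 44100 -ac 2 "${f%.*}.wav"
done
```

The `${f%.*}` expansion strips the original extension so the output keeps the same base name.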
The downloaded test files are given below.
Wave filename | Content
---|---
qi-feng-le-zh.wav |
audio_example.wav |
Example 1/2 with qi-feng-le-zh.wav
./build/bin/sherpa-onnx-offline-source-separation \
--spleeter-vocals=sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx \
--spleeter-accompaniment=sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx \
--num-threads=1 \
--input-wav=./qi-feng-le-zh.wav \
--output-vocals-wav=spleeter_qi_feng_le_vocals.wav \
--output-accompaniment-wav=spleeter_qi_feng_le_non_vocals.wav
Output logs are given below:
OfflineSourceSeparationConfig(model=OfflineSourceSeparationModelConfig(spleeter=OfflineSourceSeparationSpleeterModelConfig(vocals="sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx", accompaniment="sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx"), uvr=OfflineSourceSeparationUvrModelConfig(model=""), num_threads=1, debug=False, provider="cpu"))
Started
Done
Saved to write to 'spleeter_qi_feng_le_vocals.wav' and 'spleeter_qi_feng_le_non_vocals.wav'
num threads: 1
Elapsed seconds: 2.052 s
Real time factor (RTF): 2.052 / 26.102 = 0.079
Hint
Pay special attention to its RTF. It is super fast, on CPU, with only 1 thread!
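As a sanity check, the RTF in the log is simply the elapsed time divided by the audio duration. With the numbers from the log above:

```shell
# RTF = processing time / audio duration (numbers from the log above)
awk 'BEGIN { printf "RTF: %.3f\n", 2.052 / 26.102 }'
# -> RTF: 0.079
```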
Wave filename | Content
---|---
qi-feng-le-zh.wav |
spleeter_qi_feng_le_vocals.wav |
spleeter_qi_feng_le_non_vocals.wav |
Example 2/2 with audio_example.wav
./build/bin/sherpa-onnx-offline-source-separation \
--spleeter-vocals=sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx \
--spleeter-accompaniment=sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx \
--num-threads=1 \
--input-wav=./audio_example.wav \
--output-vocals-wav=spleeter_audio_example_vocals.wav \
--output-accompaniment-wav=spleeter_audio_example_non_vocals.wav
Output logs are given below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline-source-separation --spleeter-vocals=sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx --spleeter-accompaniment=sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx --num-threads=1 --input-wav=./audio_example.wav --output-vocals-wav=spleeter_audio_example_vocals.wav --output-accompaniment-wav=spleeter_audio_example_non_vocals.wav
OfflineSourceSeparationConfig(model=OfflineSourceSeparationModelConfig(spleeter=OfflineSourceSeparationSpleeterModelConfig(vocals="sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx", accompaniment="sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx"), uvr=OfflineSourceSeparationUvrModelConfig(model=""), num_threads=1, debug=False, provider="cpu"))
Started
Done
Saved to write to 'spleeter_audio_example_vocals.wav' and 'spleeter_audio_example_non_vocals.wav'
num threads: 1
Elapsed seconds: 0.787 s
Real time factor (RTF): 0.787 / 10.919 = 0.072
Hint
Pay special attention to its RTF
. It is super fast, on CPU, with only 1 thread!
Wave filename | Content
---|---
audio_example.wav |
spleeter_audio_example_vocals.wav |
spleeter_audio_example_non_vocals.wav |
RTF on RK3588
We use the following commands to test the RTF of Spleeter on RK3588 with the Cortex A76 CPU.
# 1 thread
taskset 0x80 ./build/bin/sherpa-onnx-offline-source-separation \
--num-threads=1 \
--spleeter-vocals=sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx \
--spleeter-accompaniment=sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx \
--input-wav=./qi-feng-le-zh.wav \
--output-vocals-wav=spleeter_qi_feng_le_vocals.wav \
--output-accompaniment-wav=spleeter_qi_feng_le_non_vocals.wav
# 2 threads
taskset 0xc0 ./build/bin/sherpa-onnx-offline-source-separation \
--num-threads=2 \
--spleeter-vocals=sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx \
--spleeter-accompaniment=sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx \
--input-wav=./qi-feng-le-zh.wav \
--output-vocals-wav=spleeter_qi_feng_le_vocals.wav \
--output-accompaniment-wav=spleeter_qi_feng_le_non_vocals.wav
# 3 threads
taskset 0xe0 ./build/bin/sherpa-onnx-offline-source-separation \
--num-threads=3 \
--spleeter-vocals=sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx \
--spleeter-accompaniment=sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx \
--input-wav=./qi-feng-le-zh.wav \
--output-vocals-wav=spleeter_qi_feng_le_vocals.wav \
--output-accompaniment-wav=spleeter_qi_feng_le_non_vocals.wav
# 4 threads
taskset 0xf0 ./build/bin/sherpa-onnx-offline-source-separation \
--num-threads=4 \
--spleeter-vocals=sherpa-onnx-spleeter-2stems-fp16/vocals.fp16.onnx \
--spleeter-accompaniment=sherpa-onnx-spleeter-2stems-fp16/accompaniment.fp16.onnx \
--input-wav=./qi-feng-le-zh.wav \
--output-vocals-wav=spleeter_qi_feng_le_vocals.wav \
--output-accompaniment-wav=spleeter_qi_feng_le_non_vocals.wav
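The taskset masks above are CPU affinity bitmasks: bit n selects CPU n. On RK3588 the four Cortex A76 cores are typically CPUs 4-7 (this core numbering is an assumption about the common kernel layout), so 0x80 pins the process to CPU 7, 0xc0 to CPUs 6-7, and so on. A small sketch that decodes the masks:

```shell
# Decode each affinity mask into the list of selected CPUs
for mask in 0x80 0xc0 0xe0 0xf0; do
  cpus=""
  cpu=0
  while [ "$cpu" -le 7 ]; do
    # Test whether bit "cpu" is set in the mask
    if [ $(( (mask >> cpu) & 1 )) -eq 1 ]; then cpus="$cpus $cpu"; fi
    cpu=$((cpu + 1))
  done
  echo "$mask ->$cpus"
done
# -> 0x80 -> 7
#    0xc0 -> 6 7
#    0xe0 -> 5 6 7
#    0xf0 -> 4 5 6 7
```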
The results are given below:
num_threads | 1 | 2 | 3 | 4
---|---|---|---|---
RTF on Cortex A76 CPU | 0.258 | 0.176 | 0.138 | 0.127
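The results show diminishing returns from extra threads; the speedup relative to the 1-thread run can be computed directly from the RTF values:

```shell
# Speedup over the 1-thread run, from the RTF values in the table above
for rtf in 0.258 0.176 0.138 0.127; do
  awk -v r="$rtf" 'BEGIN { printf "RTF %.3f -> %.2fx speedup\n", r, 0.258 / r }'
done
# The last line is: RTF 0.127 -> 2.03x speedup
```

With 4 threads the speedup is only about 2x, so using more than 2 or 3 threads brings limited benefit here.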
Python example
Please see
UVR
It is from https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models.
Hint
For those who want to learn how to add metadata to the original ONNX models, please see the scripts at the following address:
Warning
Please download UVR models from https://github.com/k2-fsa/sherpa-onnx/releases/tag/source-separation-models

Please don't download UVR models from https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models
We support the following UVR models for source separation.
Model | File size (MB)
---|---
 | 63.7
 | 63.7
 | 63.7
 | 63.7
 | 63.7
 | 63.7
 | 56.3
 | 56.3
 | 50.3
 | 63.7
 | 56.3
 | 28.3
 | 28.3
 | 28.3
 | 28.3
 | 28.3
 | 50.3
 | 63.7
In the following, we show how to use the model UVR_MDXNET_9482.onnx.
Download the model
Please use the following commands to download it:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/source-separation-models/UVR_MDXNET_9482.onnx
ls -lh UVR_MDXNET_9482.onnx
You should see the following output:
ls -lh UVR_MDXNET_9482.onnx
-rw-r--r-- 1 fangjun staff 28M May 31 13:33 UVR_MDXNET_9482.onnx
Download test files
We use the following two test wave files:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/source-separation-models/qi-feng-le-zh.wav
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/source-separation-models/audio_example.wav
ls -lh audio_example.wav qi-feng-le-zh.wav
-rw-r--r--@ 1 fangjun staff 1.8M May 23 15:59 audio_example.wav
-rw-r--r--@ 1 fangjun staff 4.4M May 23 22:06 qi-feng-le-zh.wav
Hint

To make things easier, we support only *.wav files. If you have other formats, e.g., *.mp3, *.mp4, or *.mov, you can use

ffmpeg -i your.mp3 -vn -acodec pcm_s16le -ar 44100 -ac 2 your.wav
ffmpeg -i your.mp4 -vn -acodec pcm_s16le -ar 44100 -ac 2 your.wav
ffmpeg -i your.mov -vn -acodec pcm_s16le -ar 44100 -ac 2 your.wav

to convert them to *.wav files.
The downloaded test files are given below.
Wave filename | Content
---|---
qi-feng-le-zh.wav |
audio_example.wav |
Example 1/2 with qi-feng-le-zh.wav
./build/bin/sherpa-onnx-offline-source-separation \
--num-threads=1 \
--uvr-model=./UVR_MDXNET_9482.onnx \
--input-wav=./qi-feng-le-zh.wav \
--output-vocals-wav=uvr_qi_feng_le_vocals.wav \
--output-accompaniment-wav=uvr_qi_feng_le_non_vocals.wav
Output logs are given below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline-source-separation --num-threads=1 --uvr-model=./UVR_MDXNET_9482.onnx --input-wav=./qi-feng-le-zh.wav --output-vocals-wav=uvr_qi_feng_le_vocals.wav --output-accompaniment-wav=uvr_qi_feng_le_non_vocals.wav
OfflineSourceSeparationConfig(model=OfflineSourceSeparationModelConfig(spleeter=OfflineSourceSeparationSpleeterModelConfig(vocals="", accompaniment=""), uvr=OfflineSourceSeparationUvrModelConfig(model="./UVR_MDXNET_9482.onnx"), num_threads=1, debug=False, provider="cpu"))
Started
Done
Saved to write to 'uvr_qi_feng_le_vocals.wav' and 'uvr_qi_feng_le_non_vocals.wav'
num threads: 1
Elapsed seconds: 19.110 s
Real time factor (RTF): 19.110 / 26.102 = 0.732
Hint
It is about 10x slower than Spleeter! Also, we have selected a small model here; if you select a model with more parameters, it is even slower.
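The roughly 10x figure comes from comparing the two runs on the same input file: RTF 0.732 for UVR vs. 0.079 for Spleeter on qi-feng-le-zh.wav.

```shell
# Ratio of the UVR RTF to the Spleeter RTF on qi-feng-le-zh.wav
awk 'BEGIN { printf "UVR / Spleeter RTF ratio: %.1f\n", 0.732 / 0.079 }'
# -> UVR / Spleeter RTF ratio: 9.3
```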
Wave filename | Content |
Wave filename | Content
---|---
qi-feng-le-zh.wav |
uvr_qi_feng_le_vocals.wav |
uvr_qi_feng_le_non_vocals.wav |
Example 2/2 with audio_example.wav
./build/bin/sherpa-onnx-offline-source-separation \
--num-threads=1 \
--uvr-model=./UVR_MDXNET_9482.onnx \
--input-wav=./audio_example.wav \
--output-vocals-wav=uvr_audio_example_vocals.wav \
--output-accompaniment-wav=uvr_audio_example_non_vocals.wav
Output logs are given below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline-source-separation --num-threads=1 --uvr-model=./UVR_MDXNET_9482.onnx --input-wav=./audio_example.wav --output-vocals-wav=uvr_audio_example_vocals.wav --output-accompaniment-wav=uvr_audio_example_non_vocals.wav
OfflineSourceSeparationConfig(model=OfflineSourceSeparationModelConfig(spleeter=OfflineSourceSeparationSpleeterModelConfig(vocals="", accompaniment=""), uvr=OfflineSourceSeparationUvrModelConfig(model="./UVR_MDXNET_9482.onnx"), num_threads=1, debug=False, provider="cpu"))
Started
Done
Saved to write to 'uvr_audio_example_vocals.wav' and 'uvr_audio_example_non_vocals.wav'
num threads: 1
Elapsed seconds: 6.420 s
Real time factor (RTF): 6.420 / 10.919 = 0.588
Wave filename | Content
---|---
audio_example.wav |
uvr_audio_example_vocals.wav |
uvr_audio_example_non_vocals.wav |
Python example
Please see