TTS: Pocket (Voice Cloning)

Generate speech with the Pocket TTS model using voice cloning. Pocket uses a reference audio clip to clone the speaker’s voice for the generated speech.

Source file

nodejs-addon-examples/test_tts_non_streaming_pocket_en.js

Code

// Copyright (c)  2026  Xiaomi Corporation
//
// Text-to-speech with the Pocket TTS model (voice cloning).
// Uses a reference audio to clone the speaker's voice.
//
// Usage:
//   node tts_pocket_sync.js
//
const sherpa_onnx = require('sherpa-onnx-node');

function createOfflineTts() {
  const config = {
    model: {
      pocket: {
        lmFlow: './sherpa-onnx-pocket-tts-int8-2026-01-26/lm_flow.int8.onnx',
        lmMain: './sherpa-onnx-pocket-tts-int8-2026-01-26/lm_main.int8.onnx',
        encoder: './sherpa-onnx-pocket-tts-int8-2026-01-26/encoder.onnx',
        decoder: './sherpa-onnx-pocket-tts-int8-2026-01-26/decoder.int8.onnx',
        textConditioner:
            './sherpa-onnx-pocket-tts-int8-2026-01-26/text_conditioner.onnx',
        vocabJson: './sherpa-onnx-pocket-tts-int8-2026-01-26/vocab.json',
        tokenScoresJson:
            './sherpa-onnx-pocket-tts-int8-2026-01-26/token_scores.json',
        voiceEmbeddingCacheCapacity: 50,
      },
      debug: true,
      numThreads: 2,
      provider: 'cpu',
    },
    maxNumSentences: 1,
  };
  return new sherpa_onnx.OfflineTts(config);
}

const tts = createOfflineTts();

const text =
    'Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar.';

// Pocket TTS uses reference audio for voice cloning.
const referenceAudioFilename =
    './sherpa-onnx-pocket-tts-int8-2026-01-26/test_wavs/bria.wav';
const referenceWave = sherpa_onnx.readWave(referenceAudioFilename);

const generationConfig = new sherpa_onnx.GenerationConfig({
  speed: 1.0,
  referenceAudio: referenceWave.samples,
  referenceSampleRate: referenceWave.sampleRate,
  numSteps: 5,
  extra: {max_reference_audio_len: 12, seed: 42}
});

// Time the synthesis to report the real-time factor (RTF).
const start = Date.now();
const audio = tts.generate({text, generationConfig});
const stop = Date.now();
const elapsed_seconds = (stop - start) / 1000;
const duration = audio.samples.length / audio.sampleRate;
const real_time_factor = elapsed_seconds / duration;
console.log('Wave duration', duration.toFixed(3), 'seconds');
console.log('Elapsed', elapsed_seconds.toFixed(3), 'seconds');
console.log(
    `RTF = ${elapsed_seconds.toFixed(3)}/${duration.toFixed(3)} =`,
    real_time_factor.toFixed(3));

const filename = 'test-pocket-bria.wav';
sherpa_onnx.writeWave(
    filename, {samples: audio.samples, sampleRate: audio.sampleRate});

console.log(`Saved to ${filename}`);
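
The example reports a real-time factor (RTF): elapsed wall-clock time divided by the duration of the generated audio, so values below 1 mean synthesis ran faster than playback. The sketch below isolates that arithmetic in a small function. The name realTimeFactor and the sample numbers are illustrative only and are not part of sherpa-onnx-node.

```javascript
// Hypothetical helper (not part of sherpa-onnx-node): computes the
// real-time factor from elapsed wall-clock time and the generated audio size.
function realTimeFactor(elapsedSeconds, numSamples, sampleRate) {
  if (numSamples <= 0 || sampleRate <= 0) {
    throw new RangeError('numSamples and sampleRate must be positive');
  }
  const durationSeconds = numSamples / sampleRate;
  return elapsedSeconds / durationSeconds;
}

// Illustrative numbers: 2 s of compute for 192000 samples at 24000 Hz (8 s).
const rtf = realTimeFactor(2.0, 192000, 24000);
console.log(rtf.toFixed(3));  // prints "0.250"
```

With the real example, the same function could be called as realTimeFactor(elapsed_seconds, audio.samples.length, audio.sampleRate).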

How to run

  1. Install the package:

    npm install sherpa-onnx-node
    
  2. Download the model:

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-pocket-tts-int8-2026-01-26.tar.bz2
    tar xf sherpa-onnx-pocket-tts-int8-2026-01-26.tar.bz2
    rm sherpa-onnx-pocket-tts-int8-2026-01-26.tar.bz2
    
  3. Set the library path and run:

    # macOS
    export DYLD_LIBRARY_PATH=$(npm root)/sherpa-onnx-node/lib:$DYLD_LIBRARY_PATH
    
    # Linux
    export LD_LIBRARY_PATH=$(npm root)/sherpa-onnx-node/lib:$LD_LIBRARY_PATH
    
    node tts_pocket_sync.js
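
Before running the script, it can help to confirm that the archive unpacked where the example expects it. The following is a minimal pre-flight sketch that uses only Node's built-in fs and path modules. The file names are taken from the config in the code above; the helper name findMissingModelFiles is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// File names as referenced by the example's config and reference audio path.
const POCKET_MODEL_FILES = [
  'lm_flow.int8.onnx',
  'lm_main.int8.onnx',
  'encoder.onnx',
  'decoder.int8.onnx',
  'text_conditioner.onnx',
  'vocab.json',
  'token_scores.json',
  path.join('test_wavs', 'bria.wav'),
];

// Hypothetical helper: returns the entries of `files` not found in modelDir.
function findMissingModelFiles(modelDir, files = POCKET_MODEL_FILES) {
  return files.filter((name) => !fs.existsSync(path.join(modelDir, name)));
}

const modelDir = './sherpa-onnx-pocket-tts-int8-2026-01-26';
const missing = findMissingModelFiles(modelDir);
if (missing.length > 0) {
  console.error(`Missing files in ${modelDir}: ${missing.join(', ')}`);
} else {
  console.log('All Pocket TTS model files found.');
}
```

Running this check from the same directory as the example script gives a clearer message than a failed model load.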
    

Notes

  • Pocket TTS uses voice cloning via referenceAudio in the GenerationConfig. Provide a WAV file of the target speaker.

  • The config key is pocket with fields: lmFlow, lmMain, encoder, decoder, textConditioner, vocabJson, tokenScoresJson, voiceEmbeddingCacheCapacity.

  • GenerationConfig fields for Pocket:

      - referenceAudio: Float32Array of the reference audio samples.
      - referenceSampleRate: Sample rate of the reference audio.
      - numSteps: Number of diffusion steps (e.g., 5).
      - extra.max_reference_audio_len: Maximum reference audio length in seconds.
      - extra.seed: Random seed for reproducibility.

  • Pocket also supports async generation with createAsync() and generateAsync(). See the async example and play async example.
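
As a sketch of how the GenerationConfig fields listed in these notes fit together, the helper below assembles the plain options object that the example passes to new sherpa_onnx.GenerationConfig(...). The helper name buildPocketGenerationOptions, its validation rules, and the stand-in wave object are assumptions for illustration. The defaults mirror the values used in the example code and are not documented library defaults.

```javascript
// Hypothetical helper (not part of sherpa-onnx-node): builds the options
// object for GenerationConfig from a wave shaped like {samples, sampleRate}.
function buildPocketGenerationOptions(wave, overrides = {}) {
  if (!(wave.samples instanceof Float32Array) || wave.samples.length === 0) {
    throw new TypeError('wave.samples must be a non-empty Float32Array');
  }
  if (!Number.isInteger(wave.sampleRate) || wave.sampleRate <= 0) {
    throw new RangeError('wave.sampleRate must be a positive integer');
  }
  const {extra = {}, ...rest} = overrides;
  return {
    // Values copied from the example above; override as needed.
    speed: 1.0,
    referenceAudio: wave.samples,
    referenceSampleRate: wave.sampleRate,
    numSteps: 5,
    ...rest,
    extra: {max_reference_audio_len: 12, seed: 42, ...extra},
  };
}

// Stand-in for the result of sherpa_onnx.readWave(); the values are made up.
const fakeWave = {samples: new Float32Array(16000), sampleRate: 16000};
const options = buildPocketGenerationOptions(fakeWave, {extra: {seed: 7}});
console.log(options.numSteps, options.extra);
```

In the real script, the returned object would be passed to new sherpa_onnx.GenerationConfig(options), with referenceWave from readWave() in place of fakeWave.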