Triton

NVIDIA Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs.

The following sections describe how to deploy ASR models trained with icefall using Triton.

Environment Preparation

Installation
- Build Triton Image
- Launch an inference container

Triton Server

Triton-server
- Deploy streaming ASR models with ONNX
- Deploy offline ASR models with TorchScript

Triton Client

Triton-client
- Send requests using the client
- Decode manifests

Benchmark with Perf Analyzer

Perf Analyzer
- Generate Input Data from Audio Files
- Test Throughput using Perf Analyzer

TensorRT acceleration

TensorRT acceleration