Financial Expressions
Currency symbols, codes, and abbreviations (M, K, $)
The startup secured $5.2M in venture capital,
a huge leap from their initial $450K seed round.
Lightning Fast, On-Device TTS.
Incredibly lightweight and blazingly fast, running natively in your environment via ONNX.
Supertonic can process over 12,000 characters per second on high-end GPUs,
and up to about 2,500 characters per second on consumer laptops.
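To put those throughput figures in perspective, here is a minimal sketch that estimates synthesis time for a piece of text. It treats the quoted numbers as sustained characters-per-second rates, which is an idealization; the function name and the 6,000-character article length are hypothetical and chosen only for illustration.

```python
def estimated_synthesis_seconds(num_chars: int, chars_per_second: float) -> float:
    """Rough wall-clock estimate: characters divided by throughput."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return num_chars / chars_per_second


# Figures quoted on this page, read as characters per second (assumption).
GPU_CHARS_PER_SECOND = 12_000
LAPTOP_CHARS_PER_SECOND = 2_500

article_chars = 6_000  # hypothetical article length

gpu_time = estimated_synthesis_seconds(article_chars, GPU_CHARS_PER_SECOND)
laptop_time = estimated_synthesis_seconds(article_chars, LAPTOP_CHARS_PER_SECOND)
print(f"High-end GPU:    {gpu_time:.2f} s")
print(f"Consumer laptop: {laptop_time:.2f} s")
```

Real timings will also depend on model load time, batch size, and the number of inference steps, none of which this estimate accounts for.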
Watch Supertonic running on a Raspberry Pi, demonstrating on-device, real-time text-to-speech synthesis
Turns any webpage into audio in under one second using on-device text-to-speech with zero network dependency: free, private, and effortless.
Experience Supertonic on an Onyx Boox Go 6 e-reader in airplane mode, achieving an average real-time factor (RTF) of 0.3× with zero network dependency
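The real-time factor (RTF) quoted above is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean speech is generated faster than it plays back. A small sketch of that calculation follows; the function name and the sample timings are illustrative, not measurements from the demo.

```python
def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return compute_seconds / audio_seconds


# Illustrative numbers only: 3 s of compute for 10 s of speech.
rtf = real_time_factor(compute_seconds=3.0, audio_seconds=10.0)
print(f"RTF = {rtf:.1f}x")
print("faster than real time" if rtf < 1.0 else "slower than real time")
```

At an RTF of 0.3, one minute of speech would take roughly 18 seconds to generate.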
Supertonic is now fully supported in Transformers.js 🤗!
It uses a quantized ONNX version of Supertonic for faster inference.
Time and date formats, abbreviated weekdays/months
The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance.
Area codes, hyphens, extensions (ext.)
You can reach the hotel front desk at (212) 555-0142 ext. 402 anytime.
Numbers with units, abbreviated technical notations
Our drone battery lasts 2.3h when flying at 30kph with full camera payload.
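For contrast, many TTS pipelines run a separate text-normalization pass that rewrites inputs like the ones above into plain words before synthesis. The sketch below shows what such a pass might look like for the currency example only. It is a hypothetical illustration of the step this page says Supertonic does not require, not part of Supertonic itself; the regex, the suffix table, and `expand_money` are all assumptions made for the example.

```python
import re

# Hypothetical suffix table for a toy normalizer (not from Supertonic).
_SCALE_WORDS = {"K": "thousand", "M": "million", "B": "billion"}

# Matches amounts such as "$450K", "$5.2M", or "$20".
_MONEY_PATTERN = re.compile(r"\$(\d+(?:\.\d+)?)([KMB])?\b")


def expand_money(text: str) -> str:
    """Rewrite abbreviated dollar amounts into words a naive TTS could read."""

    def _replace(match: re.Match) -> str:
        amount, suffix = match.group(1), match.group(2)
        scale = f" {_SCALE_WORDS[suffix]}" if suffix else ""
        return f"{amount}{scale} dollars"

    return _MONEY_PATTERN.sub(_replace, text)


print(expand_money("The startup secured $5.2M in venture capital,"))
print(expand_money("a huge leap from their initial $450K seed round."))
```

A rule-based pass like this needs a new pattern for every format (dates, phone extensions, units), which is the maintenance burden that handling raw text directly avoids.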
Optimized ONNX Runtime inference delivers speech synthesis at unprecedented speeds. No more waiting.
Minimal footprint means it runs smoothly on any device - from servers to embedded systems.
Complete privacy and no network latency. All processing happens locally, with no cloud dependencies.
Seamlessly processes numbers, dates, currency, abbreviations, and complex expressions without pre-processing.
Adjust inference steps, batch processing, and other parameters to match your specific needs.
Deploy seamlessly across servers, browsers, and edge devices with multiple runtime backends.
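The "inference steps" setting mentioned above presumably refers to the number of solver steps used when sampling from a flow-matching model (an assumption; the page does not spell this out). In flow matching, sampling generally means integrating an ODE dx/dt = v(x, t) from t = 0 to t = 1, and fewer steps trade accuracy for speed. The sketch below shows that trade-off with a fixed-step Euler solver and a toy velocity field that stands in for a trained network; it is not Supertonic's model or its actual solver.

```python
import math
from typing import Callable


def euler_sample(
    velocity: Callable[[float, float], float], x0: float, n_steps: int
) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x += dt * velocity(x, t)
    return x


def toy_velocity(x: float, t: float) -> float:
    """Toy stand-in for a learned velocity field; exact solution is x0 * e."""
    return x


exact = math.e  # exact value of x(1) when x(0) = 1
for n_steps in (2, 8, 32):
    approx = euler_sample(toy_velocity, x0=1.0, n_steps=n_steps)
    print(f"{n_steps:>2} steps: error = {abs(approx - exact):.4f}")
```

Each step costs one evaluation of the velocity field, so with a real model the step count scales inference time roughly linearly while the error shrinks as steps are added.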
Presents SupertonicTTS, a highly efficient TTS framework built on flow-matching and ConvNeXt blocks. Features context-sharing batch expansion, character-level processing, and cross-attention alignment without the need for external G2P modules or aligners.
Introduces LARoPE, an improved position embedding method that enables faster convergence, more accurate alignment, and better stability in extended speech generation up to 30 seconds. Achieves state-of-the-art word error rate on zero-shot TTS benchmarks.
Proposes Self-Purifying Flow Matching (SPFM), a principled approach to handle noisy training data. Identifies unreliable samples during training without pretrained models, ensuring accurate conditioning even with label contamination.