Supertonic

Lightning Fast, On-Device TTS.

Incredibly lightweight and blazingly fast, running natively in your environment via ONNX.


Parameters

66M

On-Device

100%

Characters per second

Supertonic can process over 12,000 characters per second on high-end GPUs and up to about 2,500 characters per second on consumer laptops.
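
As a rough illustration of what these throughput figures imply, the sketch below estimates how long a text of a given length would take to process. The function name and the 300,000-character input are illustrative assumptions; the two rates are the figures quoted on this page, and real throughput will vary with hardware and settings.

```python
def estimated_synthesis_seconds(num_chars: int, chars_per_second: float) -> float:
    """Estimate processing time from a characters-per-second throughput figure."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return num_chars / chars_per_second


# Throughput figures quoted on this page.
GPU_CPS = 12_000
LAPTOP_CPS = 2_500

# Hypothetical input: a text of 300,000 characters.
book_chars = 300_000
gpu_seconds = estimated_synthesis_seconds(book_chars, GPU_CPS)
laptop_seconds = estimated_synthesis_seconds(book_chars, LAPTOP_CPS)
print(f"High-end GPU: {gpu_seconds:.0f} s, consumer laptop: {laptop_seconds:.0f} s")
```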

See It In Action

Experience the speed and quality

Raspberry Pi

Watch Supertonic running on a Raspberry Pi, demonstrating on-device, real-time text-to-speech synthesis

Browser Extension ↗

Turns any webpage into audio in under one second, using on-device text-to-speech with zero network dependency. It is free, private, and effortless.

E-Reader

Experience Supertonic on an Onyx Boox Go 6 e-reader in airplane mode, achieving an average real-time factor (RTF) of 0.3× with zero network dependency.
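
RTF (real-time factor) is commonly defined as synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. Here is a minimal sketch of that arithmetic, assuming this common definition is the one used here. The timings are made-up illustrative numbers, not measurements.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Illustrative numbers only: 3 s of compute for a 10 s clip.
rtf = real_time_factor(synthesis_seconds=3.0, audio_seconds=10.0)
print(f"RTF = {rtf:.1f}x")

# Conversely, at a given RTF, compute time is RTF times the audio duration.
seconds_for_one_minute = rtf * 60.0
print(f"One minute of speech takes about {seconds_for_one_minute:.0f} s to generate")
```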

Try It Yourself

Generate speech directly in your browser

The demo takes the text to synthesize, a voice (Female or Male), and two settings: Quality (Steps), set to 5 (higher = better quality, slower inference), and Speech Length, set to 1 (higher = longer speech duration). API keys are optional, for performance comparison.


Supertonic 2 now supports 5 languages and 10 voices.

Try Supertonic 2 ↗

Supertonic is now fully supported in Transformers.js 🤗!
It uses a quantized ONNX version of Supertonic for faster inference.

Try it out ↗

Text handling

Handles text like humans

Financial Expressions

Currency symbols, codes, and abbreviations (M, K, $)

The startup secured $5.2M in venture capital, a huge leap from their initial $450K seed round.

Time and Date

Time and date formats, abbreviated weekdays/months

The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance.

Phone Numbers

Area codes, hyphens, extensions (ext.)

You can reach the hotel front desk at (212) 555-0142 ext. 402 anytime.

Technical Units

Numbers with units, abbreviated technical notations

Our drone battery lasts 2.3h when flying at 30kph with full camera payload.
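
For contrast, a conventional TTS front end typically rewrites such expressions with hand-written rules before synthesis. The sketch below shows one such rule, applied to the financial example earlier in this section. `expand_dollar_amounts` is a hypothetical helper and not part of Supertonic, which this page describes as handling these inputs without pre-processing.

```python
import re

# Hypothetical rule table, not part of Supertonic: maps magnitude suffixes to words.
_SCALE_WORDS = {"K": "thousand", "M": "million", "B": "billion"}

# Matches amounts such as "$5.2M" or "$450K".
_AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?)([KMB])\b")


def expand_dollar_amounts(text: str) -> str:
    """Rewrite amounts such as "$5.2M" as "5.2 million dollars"."""
    def _replace(match):
        number, suffix = match.group(1), match.group(2)
        return f"{number} {_SCALE_WORDS[suffix]} dollars"

    return _AMOUNT.sub(_replace, text)


sentence = (
    "The startup secured $5.2M in venture capital, "
    "a huge leap from their initial $450K seed round."
)
print(expand_dollar_amounts(sentence))
```

Each expression category (dates, phone numbers, units) would need its own rules in a pipeline like this, which is the maintenance burden that handling raw text directly is meant to avoid.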

Why Supertonic?

Performance by design

Blazingly Fast

Optimized ONNX Runtime inference delivers speech synthesis at unprecedented speeds. No more waiting.

Ultra Lightweight

Minimal footprint means it runs smoothly on any device - from servers to embedded systems.

On-Device Capable

Complete privacy and no network round-trips. All processing happens locally, with no cloud dependencies.

Natural Text Handling

Seamlessly processes numbers, dates, currency, abbreviations, and complex expressions without pre-processing.

Highly Configurable

Adjust inference steps, batch processing, and other parameters to match your specific needs.

Flexible Deployment

Deploy seamlessly across servers, browsers, and edge devices with multiple runtime backends.
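
The research summary on this page describes SupertonicTTS as built on flow matching. One plausible reading of the "inference steps" setting is therefore the number of solver steps used to integrate a learned velocity field, where more steps cost more compute but track the trajectory more closely. The sketch below illustrates only that general trade-off, using fixed-step Euler integration on a hand-written toy velocity field. It is not Supertonic's sampler, and `toy_velocity` and `euler_sample` are hypothetical names.

```python
import math


def toy_velocity(x: float, y: float, t: float) -> tuple[float, float]:
    """Hand-written stand-in for a learned velocity field: a quarter-turn rotation."""
    omega = math.pi / 2
    return (-omega * y, omega * x)


def euler_sample(x: float, y: float, num_steps: int) -> tuple[float, float]:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        vx, vy = toy_velocity(x, y, t)
        x, y = x + dt * vx, y + dt * vy
    return x, y


def endpoint_error(num_steps: int) -> float:
    """Distance from the Euler endpoint to the exact endpoint (0, 1), starting at (1, 0)."""
    x, y = euler_sample(1.0, 0.0, num_steps)
    return math.hypot(x - 0.0, y - 1.0)


# More steps mean more velocity evaluations but a smaller integration error.
for steps in (2, 5, 20, 80):
    print(f"{steps:>3} steps -> error {endpoint_error(steps):.4f}")
```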

Programming Languages

Fast setup, instant execution.

Experience Supertonic in your favorite language.

Research & Innovation

Built on our cutting-edge research

SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System

Hyeongju Kim, Jinhyeok Yang, Yechan Yu, Seunghun Ji, Jacob Morton, Frederik Bous, Joon Byun, Juheon Lee

Presents SupertonicTTS, a highly efficient TTS framework built on flow-matching and ConvNeXt blocks. Features context-sharing batch expansion, character-level processing, and cross-attention alignment without the need for external G2P modules or aligners.

Read Paper↗

Length-Aware Rotary Position Embedding for Text-Speech Alignment

Hyeongju Kim, Juheon Lee, Jinhyeok Yang, Jacob Morton

Introduces LARoPE, an improved position embedding method that enables faster convergence, more accurate alignment, and better stability in extended speech generation up to 30 seconds. Achieves state-of-the-art word error rate on zero-shot TTS benchmarks.

Read Paper↗

Training Flow Matching Models with Reliable Labels via Self-Purification

Hyeongju Kim, Yechan Yu, June Young Yi, Juheon Lee

Proposes Self-Purifying Flow Matching (SPFM), a principled approach to handle noisy training data. Identifies unreliable samples during training without pretrained models, ensuring accurate conditioning even with label contamination.

Read Paper↗
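
The LARoPE summary above does not spell out how the embedding is computed. As a loudly hedged illustration, the sketch below assumes that "length-aware" means rotary positions are normalized by sequence length before rotation, so that a text token and a speech frame at the same relative position receive the same angle. `normalized_position`, `gamma`, and the sequence lengths are hypothetical, and this may not match the paper's formulation.

```python
import math


def rotate(vec: tuple[float, float], angle: float) -> tuple[float, float]:
    """Rotate a 2-D vector by `angle` radians (a single rotary frequency pair)."""
    c, s = math.cos(angle), math.sin(angle)
    return (vec[0] * c - vec[1] * s, vec[0] * s + vec[1] * c)


def score(q_pos: float, k_pos: float) -> float:
    """Dot product of two identical unit vectors after rotary encoding."""
    q = rotate((1.0, 0.0), q_pos)
    k = rotate((1.0, 0.0), k_pos)
    return q[0] * k[0] + q[1] * k[1]  # equals cos(q_pos - k_pos)


def normalized_position(index: int, length: int, gamma: float = 10.0) -> float:
    """ASSUMPTION: scale the index by sequence length; gamma is an illustrative constant."""
    return gamma * index / length


text_len, speech_len = 10, 100  # hypothetical sequence lengths
text_index = 5                  # a token halfway through the text

# Score every speech frame (query) against the chosen text token (key).
key_pos = normalized_position(text_index, text_len)
scores = [score(normalized_position(j, speech_len), key_pos) for j in range(speech_len)]
best_frame = max(range(speech_len), key=scores.__getitem__)
print(best_frame)
```

Under this assumption the highest score falls on the speech frame at the same relative position as the text token, which is the kind of diagonal bias that would help text-speech alignment.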

On-Device SDK Solutions for Your Business

Looking to integrate Supertonic into your product?
We offer customized on-device SDK solutions tailored to your business needs. Our lightweight, high-performance TTS technology can be seamlessly integrated into mobile apps, IoT devices, automotive systems, and more.

Contact ↗
