Supertonic: Lightning-Fast On-Device Multilingual TTS with ONNX
Supertonic is an open-source, on-device text-to-speech engine developed by Supertone Inc. It runs entirely locally via ONNX Runtime: no cloud service, no API calls, and no network latency. Built in Swift, with bindings for Go, Rust, Python, Java, C#, and Node.js, it delivers high-quality multilingual speech synthesis directly on consumer hardware.
- Truly On-Device: All inference runs locally via ONNX Runtime. No audio ever leaves the device, making it well suited to privacy-sensitive applications, embedded systems, and offline-first products.
- Multilingual Out of the Box: Supports English (US/UK/AU), Spanish, French, German, Italian, Portuguese, Polish, Hindi, Japanese, Korean, and Chinese, with voices tailored to regional accents.
- Cross-Platform SDK: Native Swift library with production-ready bindings for Go, Rust, Python, Java, C#, and Node.js. Zero external dependencies at runtime beyond ONNX Runtime.
- Fast Inference: Optimized ONNX graph delivers real-time or faster-than-real-time synthesis on mid-range hardware, suitable for interactive applications.
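"Real-time or faster-than-real-time" is usually quantified as a real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where a value below 1.0 means faster than real time. The sketch below shows one way to measure it in Python. It does not use Supertonic's API: `fake_synthesize` is a hypothetical stand-in for a real engine call, and the 24 kHz sample rate is an assumption chosen only for illustration.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return synthesis wall-clock time divided by the duration of the audio produced."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    if len(samples) == 0:
        raise ValueError("synthesizer returned no audio")
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> Sequence[float]:
    # Hypothetical stand-in for a real TTS call:
    # returns 0.1 s of silence per input character at an assumed 24 kHz.
    return [0.0] * (len(text) * 2400)


rtf = real_time_factor(fake_synthesize, "Hello, world.", sample_rate=24000)
verdict = "faster" if rtf < 1.0 else "slower"
print(f"RTF = {rtf:.6f} ({verdict} than real time)")
```

To measure a real engine, pass its synthesis function in place of `fake_synthesize` and average over several sentences, since a single short utterance can be dominated by warm-up cost.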
User zachswift615 reported that words are skipped intermittently and not reproducibly:
"I can't identify a pattern yet, but in my tests, every 2 or 3 sentences a word or 2 get totally skipped. It doesn't seem to be specific words because if I process the same sentence multiple times, it'll be fine sometimes and some words will be missing other times. I've tried with 5x and 15x iterations on the diffusion."
This spawned a 10-comment thread with users sharing workarounds involving iteration count tuning and text preprocessing. The issue remains open, and the team has acknowledged it as a diffusion refinement artifact.
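The thread does not spell out which preprocessing steps users applied, so the following is only a minimal sketch of one plausible approach: normalize the text and feed the engine one sentence at a time, which also makes it easier to spot and retry a sentence that lost a word. The function names are hypothetical, the sentence splitter is a simple punctuation heuristic, and nothing here is confirmed to resolve the issue.

```python
import re
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Fold Unicode compatibility forms (NFKC) and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split normalized text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", normalize(text))
    return [p for p in parts if p]


chunks = split_sentences("First  sentence.\nSecond one!  Is this the third?")
for i, chunk in enumerate(chunks, start=1):
    # Each chunk would be passed to the TTS engine separately.
    print(i, chunk)
```

The heuristic will split incorrectly on abbreviations such as "Dr. Smith", so a production pipeline would likely need a more careful segmenter.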
User BradDDi asked about voice customization:
"None of the default voices provided are suitable for my use. I'd like to create a different custom voice. Is there a way to generate custom voice profiles or adjust speech style parameters?"
The team pointed to their JSON-based voice descriptor format, but full training and fine-tuning tooling is not yet public. Alternatives like NeuTTS NANO and Soprano offer fine-tuning, though neither matches Supertonic's on-device performance.
User MattyMrozKun requested Polish support:
"Please, I must have the Polish version of this"
This became a popular thread with 10 comments from users requesting Korean, Arabic, Czech, and other languages. The maintainers indicated language support depends on available training data and ONNX optimization for the target phoneme set.
Supertonic is a compelling choice for developers building privacy-first, offline-capable speech products. Its ONNX-based architecture balances quality and portability well, and the multi-language support covers most major markets. The active GitHub community, with 80 open issues and regular engagement, signals a project with momentum. The main areas to watch are the audio completeness edge cases (intermittently skipped words) and the roadmap for training and custom voice tooling. For production use today, it fits best in deployments where low latency, privacy, and offline operation are non-negotiable.
🔗 Project: github.com/supertone-inc/supertonic
⭐ Stars: 7,181 | Forks: 616
📅 Created: November 2025 | Last Push: May 2026
🏷️ Topics: TTS, on-device, ONNX, multilingual, Swift, Rust, Python