AutoSubs
Local-first AI subtitles. No cloud, no subscription, no data leaving your machine.
Use it as a standalone app, or connect it to DaVinci Resolve, Adobe Premiere Pro, and After Effects.
- 🎙️ Transcription: Whisper, Moonshine, Parakeet, SenseVoice, Canary, and Cohere models via whisper-rs and ONNX Runtime
- 👥 Speaker Diarization: Identifies and labels different speakers in the transcript, enabling per-speaker styling
- 🌍 100+ Languages: Transcription and translation across a wide range of languages
- 💻 Cross-Platform: macOS (Apple Silicon/Intel), Windows (Vulkan/DirectML), Linux
Download
| Platform | Installer |
|---|---|
| 🪟 Windows | AutoSubs-windows-x86_64.exe |
| 🍎 macOS (Apple Silicon) | AutoSubs-Mac-ARM.pkg |
| 🍎 macOS (Intel) | AutoSubs-Mac-Intel.pkg |
| 🐧 Linux (Debian/Ubuntu) | AutoSubs-linux-x86_64.deb |
| 🐧 Linux (Fedora/openSUSE) | AutoSubs-linux-x86_64.rpm |
macOS Homebrew
macOS users can also install AutoSubs with Homebrew:
brew install --cask auto-subs
Linux install
Debian/Ubuntu (.deb):
wget https://github.com/tmoroney/auto-subs/releases/latest/download/AutoSubs-linux-x86_64.deb
sudo apt install ./AutoSubs-linux-x86_64.deb
Fedora/openSUSE (.rpm): Download AutoSubs-linux-x86_64.rpm and open it with your package manager.
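For Fedora or openSUSE users who prefer the terminal, the same download-and-install flow might look like the sketch below. The download URL is an assumption: it reuses the pattern of the .deb link with the .rpm filename from the Download table, so check the releases page if it does not resolve. The helper names `first_available` and `install_autosubs_rpm` are hypothetical and not part of AutoSubs.

```shell
# Hypothetical helper: print the first argument that exists as a command on PATH.
first_available() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: download the .rpm and install it with dnf or zypper.
install_autosubs_rpm() {
  rpm_file="AutoSubs-linux-x86_64.rpm"
  # Assumed URL: same pattern as the .deb link, with the .rpm filename.
  url="https://github.com/tmoroney/auto-subs/releases/latest/download/$rpm_file"
  pm=$(first_available dnf zypper) || {
    echo "Neither dnf nor zypper was found" >&2
    return 1
  }
  wget "$url" && sudo "$pm" install "./$rpm_file"
}
```

Nothing is downloaded or installed until you call `install_autosubs_rpm` yourself.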
Quick Start
Standalone Mode
- Launch AutoSubs and select an audio or video file.
- Pick your model and language/translation options.
- Click Transcribe. Edit speakers and subtitles as needed.
- Export as SRT or plain text, or copy to the clipboard.
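A quick way to sanity-check an exported file is to count its cues. The sketch below assumes the export follows the standard SubRip layout (a numeric index, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, then the subtitle text). The function name `count_srt_cues` and the sample content are hypothetical and not part of AutoSubs.

```shell
# Count subtitle cues in an SRT file by counting timing lines.
# Each cue has exactly one line starting with "HH:MM:SS,mmm --> ".
count_srt_cues() {
  grep -cE '^[0-9]{2,}:[0-9]{2}:[0-9]{2},[0-9]{3} --> ' "$1"
}

# Hypothetical two-cue sample, written to a temp file for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:00:02,500 --> 00:00:05,000
Let's get started.
EOF

count_srt_cues "$sample"   # prints 2
```

Anchoring the pattern to the start of the line avoids miscounting subtitle text that happens to contain an arrow.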
DaVinci Resolve Mode
- Open DaVinci Resolve → Workspace → Scripts → AutoSubs.
- Select your timeline/audio source and settings.
- Click Transcribe. Edit speakers and subtitles as needed.
- Send styled subtitles back to Resolve.
[!WARNING] The Mac App Store version of DaVinci Resolve is not supported. Download DaVinci Resolve from blackmagicdesign.com instead.
Adobe Premiere Pro / After Effects Mode
- Launch AutoSubs and open Premiere Pro or After Effects (the CEP extension loads automatically).
- Select the Adobe integration from AutoSubs to export timeline audio for transcription, or import generated subtitles into your project.
- In Premiere Pro, subtitles are imported as caption tracks; in After Effects, SRT entries are created as text layers.
Command Line Interface
For command-line usage, see the CLI Guide with complete reference, examples, and troubleshooting.
Documentation
- CLI Guide - Command-line interface reference
- Contributing Guide - Development setup and contribution workflow
- AutoSubs-App README - Technical architecture and code organization
- Resolve Integration - DaVinci Resolve integration architecture and development
- Adobe Extension - Adobe Premiere Pro/After Effects integration details
[!TIP] I highly recommend checking out DeepWiki for asking questions and understanding the codebase.
Integrations
AutoSubs can run as a standalone subtitle generator, connect directly to DaVinci Resolve, or communicate with Adobe Premiere Pro and After Effects through the bundled CEP extension.
[Screenshots: "Select a Preset Style" and "Or create your own"]
What's New in v3.5
Transcription: Voice Activity Detection, multiple models (Whisper/Parakeet/Moonshine), improved speaker diarization, and built-in translation.
Editing & UI: Free-text subtitle editing with auto-timing, transcript history, 6 new UI languages, and a custom title bar.
DaVinci Resolve: Animated caption macro with per-word highlighting, preset system, marker-based word timing, and instant conflict detection.
Bug Fixes (v3.5.1): Formatting improvements, Resolve export corrections, Model Manager recovery, and Linux stability fixes.
Supported Models
AutoSubs ships with several local transcription model families. All run fully on-device — nothing is sent to the cloud. Models are downloaded on demand from the in-app Model Manager.
Accuracy is a relative 1–4 rating within AutoSubs (higher is better). Sizes and RAM figures are approximate.
Whisper
OpenAI's Whisper, via whisper-rs (GGML). Each size is available in a multilingual variant and an .en English-only variant (the .en models are slightly more accurate on English audio).
| Model | Size | RAM | Languages | Accuracy |
|---|---|---|---|---|
| tiny / tiny.en | 80 MB | 1 GB | Multilingual / English | ★ |
| base / base.en | 150 MB | 1 GB | Multilingual / English | ★ |
| small / small.en | 480 MB | 2 GB | Multilingual / English | ★★ |
| medium / medium.en | 1.5 GB | 5 GB | Multilingual / English | ★★★ |
| large-v3-turbo | 1.6 GB | 6 GB | Multilingual | ★★★ |
| large-v3 | 3.1 GB | 10 GB | Multilingual | ★★★★ |
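As a rough way to read the table above, the sketch below suggests the largest Whisper model whose RAM figure fits a given amount of memory. It treats the RAM column as a minimum in whole gigabytes, which is an assumption, and `pick_whisper_model` is a hypothetical helper, not an AutoSubs command.

```shell
# Hypothetical helper: suggest the largest Whisper model whose RAM figure
# (taken from the table above) fits within the given whole number of gigabytes.
pick_whisper_model() {
  ram_gb=$1
  if   [ "$ram_gb" -ge 10 ]; then echo "large-v3"
  elif [ "$ram_gb" -ge 6 ];  then echo "large-v3-turbo"
  elif [ "$ram_gb" -ge 5 ];  then echo "medium"
  elif [ "$ram_gb" -ge 2 ];  then echo "small"
  elif [ "$ram_gb" -ge 1 ];  then echo "base"
  else
    echo "Not enough RAM for any Whisper model" >&2
    return 1
  fi
}

pick_whisper_model 8   # prints large-v3-turbo
```

Since the figures are approximate, leave some headroom for the rest of the system and for your editing software.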
Moonshine
Useful Sensors' Moonshine, via ONNX Runtime. The tiny English model is quantized; the language-specific tiny variants and the base model are float-precision.
| Model | Size | RAM | Language | Accuracy |
|---|---|---|---|---|
| moonshine-tiny | 60 MB | 1 GB | English | ★ |
| moonshine-tiny-ar | 120 MB | 1 GB | Arabic | ★★★ |
| moonshine-tiny-zh | 120 MB | 1 GB | Chinese | ★★★ |
| moonshine-tiny-ja | 120 MB | 1 GB | Japanese | ★★★ |
| moonshine-tiny-ko | 120 MB | 1 GB | Korean | ★★★ |
| moonshine-tiny-uk | 120 MB | 1 GB | Ukrainian | ★★ |
| moonshine-tiny-vi | 120 MB | 1 GB | Vietnamese | ★★★ |
| moonshine-base | 200 MB | 1 GB | English | ★★ |
Parakeet
NVIDIA's Parakeet-TDT-0.6B-v3 (int8 ONNX). Fast and accurate, with support for 25 European languages, including Russian and Ukrainian.
| Model | Size | RAM | Languages | Accuracy |
|---|---|---|---|---|
| parakeet | 700 MB | 2 GB | 25 languages (EU + RU + UK) | ★★★★ |
SenseVoice
Alibaba's SenseVoice (int8 ONNX). Compact and well-suited to CJK audio.
| Model | Size | RAM | Languages | Accuracy |
|---|---|---|---|---|
| sense-voice | 230 MB | 1 GB | Chinese, English, Japanese, Korean, Cantonese | ★★★ |
Canary
NVIDIA's Canary-1B-v2 (int8 ONNX). A multilingual encoder-decoder model that also supports native translation.
| Model | Size | RAM | Languages | Accuracy |
|---|---|---|---|---|
| canary | 1 GB | 3 GB | 25 languages (EU + RU + UK) | ★★★★ |
Cohere
Cohere Transcribe (int4 ONNX). The highest-accuracy option for a focused set of 14 widely spoken languages.
| Model | Size | RAM | Languages | Accuracy |
|---|---|---|---|---|
| cohere | 2 GB | 4 GB | Arabic, German, Greek, English, Spanish, French, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Vietnamese, Chinese | ★★★★ |
Diarization & VAD
In addition to transcription models, AutoSubs downloads a speaker diarization model (~40 MB, user-selectable from the Model Manager) and a Silero VAD model (auto-downloaded for voice activity detection during transcription).
Contributing
PRs are welcome! See CONTRIBUTING.md for how to get started, including the dev setup and a full codebase walkthrough via AutoSubs DeepWiki.
For detailed information about the DaVinci Resolve integration architecture, Lua server, Fusion macro system, and development workflow, see Resolve-Integration/README.md.
Acknowledgments
AutoSubs is built on top of excellent open-source projects:
- whisper-rs - Rust bindings for the whisper.cpp library
- transcribe-rs - ONNX Runtime transcription with Moonshine and Parakeet models
- pyannote-rs - Rust implementation of Pyannote for speaker diarization (integrated into app code for improvements)
