# TryVoice
Hands-free voice runtime for AI agents. Talk to your AI coding assistant without touching the keyboard.
TryVoice wraps AI agents (like OpenClaw and Claude Code) into a voice interface with wake word activation, push-to-talk, and real-time streaming — all running in your browser.
> **Early Preview (v0.1.0-alpha)** — actively developed, expect rough edges.
## What It Does
- Wake word activation — say a keyword to start talking, no hands needed (powered by OpenWakeWord)
- Push-to-talk — hold a button to speak, release to send
- Real-time streaming — hear the AI respond as it generates, with interruptible playback
- Multi-bot slots — run multiple independent agent sessions side by side
- Mobile-ready — PWA support, works on phone browsers
- Pluggable adapters — connect any AI agent via the Adapter SDK
## Prerequisites
TryVoice is a voice layer on top of existing AI agents. You need at least one of:
- Claude Code — installed on the same machine (`claude` CLI available in PATH)
- OpenClaw — running with a gateway endpoint
More agent adapters coming soon. See Building an Adapter to connect your own agent.
## Quick Start
### Option A: Install from PyPI (recommended)
```bash
pip install tryvoice
tryvoice  # Start the server and open browser
# First launch shows Setup Wizard in browser — configure adapter, TTS, etc.
# If "command not found", try: python3 -m backend.cli
```
### Option B: Install from source
```bash
git clone https://github.com/AaronZ021/tryvoice-oss.git
cd tryvoice
bash scripts/setup.sh        # Creates venv, installs packages, builds frontend
source .venv/bin/activate
tryvoice                     # Start the server and open browser
# First launch shows Setup Wizard in browser
```
## Configure
On first launch, the browser opens a Setup Wizard that walks you through:
- API Keys (optional but recommended) — enter a Groq API key for faster speech-to-text (lower latency than local Whisper), and an Azure Speech key for high-quality text-to-speech
- Adapter — choose Claude Code or OpenClaw and enter connection details
- Wake word — pick a keyword (e.g., "jarvis", "americano") for hands-free voice activation
All settings can be changed later from the in-app settings panel.
## Docker
```bash
git clone https://github.com/AaronZ021/tryvoice-oss.git
cd tryvoice
docker compose up
# Open http://localhost:7860 — Setup Wizard runs on first launch
```
## Architecture
```
┌─────────────┐      WebSocket     ┌──────────────────┐
│ Browser UI  │◄──────────────────►│     TryVoice     │
│   (PWA)     │                    │     Runtime      │
│             │                    │                  │
│  Wake Word  │                    │ ┌────────────┐   │
│  STT / TTS  │                    │ │  Adapter   │   │──► Claude Code
│  Audio I/O  │                    │ │  Registry  │   │──► OpenClaw
│             │                    │ │  (plugin)  │   │──► Your adapter
└─────────────┘                    └─┴────────────┴───┘
```
Voice flow: Wake word / PTT → STT (browser Web Speech API or Groq Whisper) → Adapter → Agent → Streaming text → TTS (Edge TTS) → Audio playback
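The flow above can be sketched as a chain of async stages. This is an illustrative sketch only, not the actual runtime code: the stage functions (`transcribe`, `agent_stream`, `synthesize`) are hypothetical stand-ins for the real STT, adapter, and TTS providers.

```python
import asyncio
from typing import AsyncIterator

async def transcribe(audio: bytes) -> str:
    """STT stage: browser Web Speech API or Groq Whisper in the real runtime."""
    return "hello agent"  # stand-in transcript

async def agent_stream(text: str) -> AsyncIterator[str]:
    """Adapter stage: forwards the transcript to the agent, yields streamed tokens."""
    for token in ["Hi", " there", "!"]:
        yield token

async def synthesize(text: str) -> bytes:
    """TTS stage: Edge TTS in the real runtime."""
    return text.encode()  # stand-in audio bytes

async def voice_turn(audio: bytes) -> list[bytes]:
    """One voice turn: STT -> adapter -> streaming text -> TTS chunks."""
    text = await transcribe(audio)
    chunks = []
    async for token in agent_stream(text):
        # Synthesizing per streamed chunk is what makes playback interruptible:
        # the player can stop between chunks instead of waiting for a full reply.
        chunks.append(await synthesize(token))
    return chunks

print(asyncio.run(voice_turn(b"...")))  # prints [b'Hi', b' there', b'!']
```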
## Configuration
| Variable | Default | Description |
|---|---|---|
| `TRYVOICE_ACTIVE_ADAPTER` | `echo` | Active adapter (`claude-code`, `openclaw`, or custom) |
| `GROQ_API_KEY` | — | Groq API key for server-side STT (optional; falls back to browser STT) |
| `EDGE_TTS_VOICE` | `zh-CN-XiaoxiaoNeural` | Edge TTS voice (300+ voices available) |
| `PORT` | `7860` | Server port |
See `.env.example` for all options, or run `tryvoice --setup` for an interactive wizard.
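Putting the variables from the table together, a `.env` might look like the following. The key value is a placeholder, and the English voice name is an example choice rather than the default:

```shell
# .env — example values; replace the API key with your own
TRYVOICE_ACTIVE_ADAPTER=claude-code
GROQ_API_KEY=gsk_your_key_here
EDGE_TTS_VOICE=en-US-AriaNeural
PORT=7860
```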
## Built-in Adapters
| Adapter | Use Case |
|---|---|
| `claude-code` | Voice control for Claude Code terminal sessions |
| `openclaw` | Voice interface to OpenClaw agent gateway |
| `echo` | Testing and demo (echoes your speech back) |
## Building an Adapter
Connect TryVoice to any AI agent by implementing the Adapter protocol:
```python
from backend.adapter_sdk import AdapterCapabilities, AdapterEvent

class MyAdapter:
    def report_capabilities(self) -> AdapterCapabilities:
        return AdapterCapabilities(supports_stream=True, ...)

    async def stream_user_turn(self, session_key, text, ...):
        # Call your agent, yield AdapterEvent chunks
        yield AdapterEvent(kind="token", text="Hello!")
        yield AdapterEvent(kind="turn_end")
```
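Since the real SDK types aren't reproduced here, the self-contained sketch below uses stand-in dataclasses (assumed shapes, not the actual `backend.adapter_sdk` definitions) to show how a runtime might drain an adapter's event stream:

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator

# Stand-ins for the SDK types — assumed shapes, not the real definitions.
@dataclass
class AdapterCapabilities:
    supports_stream: bool

@dataclass
class AdapterEvent:
    kind: str           # "token" or "turn_end"
    text: str = ""

class EchoAdapter:
    """Minimal adapter in the spirit of the built-in `echo` adapter."""

    def report_capabilities(self) -> AdapterCapabilities:
        return AdapterCapabilities(supports_stream=True)

    async def stream_user_turn(self, session_key: str, text: str) -> AsyncIterator[AdapterEvent]:
        yield AdapterEvent(kind="token", text=text)  # echo the user's speech back
        yield AdapterEvent(kind="turn_end")

async def collect_reply(adapter: EchoAdapter, text: str) -> str:
    """How a runtime might turn an adapter's event stream into reply text."""
    reply = []
    async for ev in adapter.stream_user_turn("session-1", text):
        if ev.kind == "token":
            reply.append(ev.text)
        elif ev.kind == "turn_end":
            break
    return "".join(reply)

print(asyncio.run(collect_reply(EchoAdapter(), "hello")))  # prints "hello"
```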
Register via entry point in `pyproject.toml`:

```toml
[project.entry-points."tryvoice.adapters"]
my-agent = "my_package.adapter:MyAdapter"
```
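On the consuming side, entry-point plugins are typically discovered with the standard library's `importlib.metadata`. A sketch of what a registry could do — the group name comes from the entry above, but the function itself is illustrative, not the actual registry code:

```python
from importlib.metadata import entry_points

def discover_adapters(group: str = "tryvoice.adapters") -> dict:
    """Map adapter names to adapter classes from installed entry points."""
    eps = entry_points()
    # Python 3.10+ returns a selectable EntryPoints object; 3.9 returns a dict of groups.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}

# With the pyproject.toml entry above installed:
#   discover_adapters()["my-agent"] would resolve to MyAdapter.
```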
## Development
### Prerequisites
- Python 3.9+ (3.11 recommended)
- Node.js 20+ (for frontend build)
### Setup
```bash
git clone https://github.com/AaronZ021/tryvoice-oss.git
cd tryvoice
bash scripts/setup.sh
source .venv/bin/activate
tryvoice
```
### Project structure
```
tryvoice/
├── apps/
│   ├── host-runtime/     # Python FastAPI backend (adapter layer, session FSM, voice providers)
│   └── client-web/       # TypeScript frontend (Vite, state machine, wake word, audio)
├── scripts/              # Setup and build scripts
├── pyproject.toml        # Python package config
├── Dockerfile            # Multi-stage build (Node + Python)
└── docker-compose.yml    # Single-command deployment
```
## License
Apache License 2.0 — see LICENSE.