{ "cells": [ { "cell_type": "markdown", "id": "b2758080", "metadata": { "id": "b2758080" }, "source": [ "##### Copyright 2026 Google LLC." ] }, { "cell_type": "code", "execution_count": 1, "id": "551d0e5d", "metadata": { "cellView": "form", "id": "551d0e5d" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "id": "7ebb7d02", "metadata": { "id": "7ebb7d02" }, "source": [ "# Multimodal Live API - Translation Quickstart\n", "\n", "\n", "\n", "**Preview**: The Live API is in preview.\n", "\n", "This notebook demonstrates usage of the Gemini Live API for real-time audio translation. For an overview of new capabilities refer to the [Gemini Live API docs](https://ai.google.dev/gemini-api/docs/live-api/capabilities).\n", "\n", "Some features of the API (such as low-latency bidirectional voice and video streaming using the local microphone and camera) are not supported in a standard Colab environment due to its headless cloud VM nature. To try full local hardware streaming, check out the CLI examples in the [Cookbook repository](https://github.com/google-gemini/cookbook/tree/main/quickstarts).\n", "\n", "In this notebook, you will learn how to **stream and translate audio from a URL** in real-time using the Live Translation API, displaying live transcripts and playing the translated audio output." 
] }, { "cell_type": "markdown", "id": "c84b5646", "metadata": { "id": "c84b5646" }, "source": [ "## Setup\n", "\n", "### Install SDK and Dependencies\n", "\n", "The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini.\n", "\n", "> **Note**: This notebook also uses `ffmpeg` to process the audio stream. `ffmpeg` is pre-installed in Google Colab environments." ] }, { "cell_type": "code", "execution_count": 2, "id": "bb4d3878", "metadata": { "id": "bb4d3878" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.0/53.0 kB\u001b[0m \u001b[31m1.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m832.5/832.5 kB\u001b[0m \u001b[31m21.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m246.1/246.1 kB\u001b[0m \u001b[31m9.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", "google-colab 1.0.0 requires google-auth==2.47.0, but you have google-auth 2.53.0 which is incompatible.\n", "google-adk 1.29.0 requires google-genai<2.0.0,>=1.64.0, but you have google-genai 2.8.0 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0m" ] } ], "source": [ "%pip install -U -q google-genai" ] }, { "cell_type": "markdown", "id": "15cde47c", "metadata": { "id": "15cde47c" }, "source": [ "### Set up your API key\n", "\n", "To run the following cell, your API key must be stored in a Colab Secret named `GEMINI_API_KEY`. 
If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for details." ] }, { "cell_type": "code", "execution_count": 3, "id": "b5acd6e2", "metadata": { "id": "b5acd6e2" }, "outputs": [], "source": [ "from google.colab import userdata\n", "import os\n", "\n", "os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')" ] }, { "cell_type": "markdown", "id": "0fa034f1", "metadata": { "id": "0fa034f1" }, "source": [ "### Initialize SDK client\n", "\n", "The client will pick up your API key from the environment variable." ] }, { "cell_type": "code", "execution_count": 4, "id": "0f60a3e2", "metadata": { "id": "0f60a3e2" }, "outputs": [], "source": [ "from google import genai\n", "from google.genai import types\n", "\n", "client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])" ] }, { "cell_type": "markdown", "id": "351e67b1", "metadata": { "id": "351e67b1" }, "source": [ "### Select a model\n", "\n", "The Live Translation API uses the translation-capable Live models." ] }, { "cell_type": "code", "execution_count": 5, "id": "e1be5ab8", "metadata": { "id": "e1be5ab8" }, "outputs": [], "source": [ "MODEL = 'gemini-3.5-live-translate-preview' # @param ['gemini-3.5-live-translate-preview'] {allow-input: true, isTemplate: true}" ] }, { "cell_type": "markdown", "id": "e637eae2", "metadata": { "id": "e637eae2" }, "source": [ "### Import Modules\n", "\n", "Import the necessary packages for handling async events, audio format streams, and file writing." 
] }, { "cell_type": "code", "execution_count": 6, "id": "02944a4f", "metadata": { "id": "02944a4f" }, "outputs": [], "source": [ "import array\n", "import asyncio\n", "import contextlib\n", "import wave\n", "\n", "from IPython.display import display, Audio\n", "\n", "from google import genai\n", "from google.genai import types" ] }, { "cell_type": "markdown", "id": "c139c078", "metadata": { "id": "c139c078" }, "source": [ "### Helper to Write WAV Files\n", "\n", "Let's define a helper context manager to write received audio chunks to a `.wav` file for playback in the notebook:" ] }, { "cell_type": "code", "execution_count": 7, "id": "fabf537a", "metadata": { "id": "fabf537a" }, "outputs": [], "source": [ "@contextlib.contextmanager\n", "def wave_file(filename, channels=1, rate=24000, sample_width=2):\n", "    with wave.open(filename, \"wb\") as wf:\n", "        wf.setnchannels(channels)\n", "        wf.setsampwidth(sample_width)\n", "        wf.setframerate(rate)\n", "        yield wf" ] }, { "cell_type": "markdown", "id": "87cce836", "metadata": { "id": "87cce836" }, "source": [ "## Audio URL Streaming & Translation\n", "\n", "You can stream audio in real-time to the Live API, and receive translated audio back. Here, we'll stream audio from a public audio URL in 100ms chunks to mimic real-time audio input, and stream translation responses back.\n", "\n", "### Helper for Streaming Audio URL\n", "\n", "We define a helper function to stream audio from an HTTP URL and use `ffmpeg` to transcode it to raw PCM 16kHz mono audio." 
] }, { "cell_type": "code", "execution_count": 8, "id": "5c493b94", "metadata": { "id": "5c493b94" }, "outputs": [], "source": [ "async def stream_audio_url(url: str, audio_queue: asyncio.Queue, sample_rate: int = 16000, channels: int = 1, chunk_size: int = 1600):\n", "    \"\"\"Streams audio from an HTTP URL, decoding it via ffmpeg and putting raw PCM bytes into the audio_queue.\"\"\"\n", "    print(f\"\\n[Info] Starting audio stream via ffmpeg from: {url}\")\n", "    # Spawn ffmpeg to decode stream to raw PCM 16kHz mono 16-bit\n", "    process = await asyncio.create_subprocess_exec(\n", "        'ffmpeg',\n", "        '-i', url,\n", "        '-f', 's16le',\n", "        '-acodec', 'pcm_s16le',\n", "        '-ar', str(sample_rate),\n", "        '-ac', str(channels),\n", "        '-',\n", "        stdout=asyncio.subprocess.PIPE,\n", "        stderr=asyncio.subprocess.DEVNULL\n", "    )\n", "\n", "    # 1600 samples * 2 bytes/sample (16-bit) = 3200 bytes per chunk\n", "    chunk_size_bytes = chunk_size * 2\n", "    bytes_per_second = sample_rate * 2\n", "    start_time = asyncio.get_event_loop().time()\n", "    bytes_sent = 0\n", "\n", "    try:\n", "        while True:\n", "            data = await process.stdout.read(chunk_size_bytes)\n", "            if not data:\n", "                break\n", "\n", "            await audio_queue.put(data)\n", "            bytes_sent += len(data)\n", "\n", "            # Rate limit to real-time speed (1.0x) so we simulate real mic streaming\n", "            expected_elapsed = bytes_sent / bytes_per_second\n", "            actual_elapsed = asyncio.get_event_loop().time() - start_time\n", "            sleep_time = expected_elapsed - actual_elapsed\n", "            if sleep_time > 0:\n", "                await asyncio.sleep(sleep_time)\n", "    except asyncio.CancelledError:\n", "        pass\n", "    finally:\n", "        if process.returncode is None:\n", "            try:\n", "                process.terminate()\n", "                await process.wait()\n", "            except Exception:\n", "                pass\n", "        print(\"\\n[Info] Audio stream finished.\")" ] }, { "cell_type": "markdown", "id": "fd486189", "metadata": { "id": "fd486189" }, "source": [ "### Helper for Sending Audio & Receiving Translated Responses\n", "\n", "Next, we 
define a function that pushes audio chunks from the queue to the Live session. The receiver that prints the source and translation transcripts is defined inside the runner in the next section." ] }, { "cell_type": "code", "execution_count": 9, "id": "a5a6cf08", "metadata": { "id": "a5a6cf08" }, "outputs": [], "source": [ "async def send_realtime(session, audio_queue: asyncio.Queue, sample_rate: int = 16000):\n", "    \"\"\"Sends audio from the input queue to the GenAI session.\"\"\"\n", "    try:\n", "        while True:\n", "            chunk = await audio_queue.get()\n", "            await session.send_realtime_input(\n", "                audio=types.Blob(\n", "                    data=chunk,\n", "                    mime_type=f\"audio/pcm;rate={sample_rate}\"\n", "                )\n", "            )\n", "            audio_queue.task_done()\n", "    except asyncio.CancelledError:\n", "        pass" ] }, { "cell_type": "markdown", "id": "842b0d86", "metadata": { "id": "842b0d86" }, "source": [ "### Run Audio URL Translation\n", "\n", "Now, we set up our main translation runner. We'll use a static audio URL: `https://storage.googleapis.com/generativeai-downloads/gemini-cookbook/audio/gemini-live-translate-sample.wav`.\n", "The code runs the URL reader, the uploader, and the receiver concurrently in an `asyncio.TaskGroup` (available in Python 3.11 and later). The translated Spanish audio is written to a WAV file and played back." ] }, { "cell_type": "code", "execution_count": 10, "id": "968a5af7", "metadata": { "id": "968a5af7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Info] Connecting to Gemini Live (gemini-3.5-live-translate-preview)...\n", "[Info] Connected successfully. 
Starting stream...\n", "\n", "[Info] Starting audio stream via ffmpeg from: https://storage.googleapis.com/generativeai-downloads/gemini-cookbook/audio/gemini-live-translate-sample.wav\n", "\n", "[Source (en)] 20 years ago,\n", "[Translation (es)] Hace 20 años,\n", "\n", "[Source (en)] translation at Google\n", "[Translation (es)] la traducción\n", "\n", "[Source (en)] began as one of our\n", "[Translation (es)] en Google comenzó\n", "\n", "[Source (en)] pioneering\n", "[Translation (es)] como una de nuestras\n", "\n", "[Source (en)] machine learning\n", "[Translation (es)] máquinas de aprendizaje\n", "\n", "[Source (en)] experiments to turn\n", "[Translation (es)] automático\n", "\n", "[Source (en)] the science of\n", "[Translation (es)] para convertir la\n", "\n", "[Source (en)] language into the\n", "[Translation (es)] ciencia del lenguaje\n", "\n", "[Source (en)] magic of\n", "[Translation (es)] en la magia\n", "\n", "[Source (en)] human connection.\n", "[Translation (es)] de la conexión humana.\n", "\n", "[Source (en)] That\n", "[Translation (es)] Ese\n", "\n", "[Source (en)] experiment has come a\n", "[Translation (es)] experimento ha recorrido\n", "\n", "[Source (en)] long way, with\n", "[Translation (es)] un largo camino,\n", "\n", "[Source (en)] over a\n", "[Translation (es)] con más de\n", "\n", "[Source (en)] trillion words being\n", "[Translation (es)] un billón de palabras\n", "\n", "[Source (en)] translated for\n", "[Translation (es)] traducidas\n", "\n", "[Source (en)] billions of users\n", "[Translation (es)] para miles de millones\n", "\n", "[Source (en)] across our\n", "[Translation (es)] de usuarios en\n", "\n", "[Source (en)] products every\n", "[Translation (es)] nuestros productos cada\n", "\n", "[Source (en)] month.\n", "[Translation (es)] mes.\n", "\n", "[Source (en)] Today, we're\n", "[Translation (es)] Hoy, estamos\n", "\n", "[Source (en)] taking our next\n", "[Translation (es)] dando nuestro siguiente\n", "\n", "[Source (en)] step with 
the release\n", "[Translation (es)] paso con el lanzamiento\n", "\n", "[Source (en)] of Gemini\n", "[Translation (es)] de Gemini\n", "\n", "[Source (en)] 3.5\n", "[Translation (es)] 3.5\n", "\n", "[Source (en)] Live Translate,\n", "[Translation (es)] Live Translate,\n", "\n", "[Source (en)] our latest\n", "[Translation (es)] nuestra última\n", "\n", "[Source (en)] audio model for\n", "[Translation (es)] modelo de audio\n", "\n", "[Source (en)] live speech-to-\n", "[Translation (es)] para traducción\n", "\n", "[Source (en)] speech translation.\n", "[Translation (es)] de voz en tiempo real.\n", "\n", "[Source (en)] The model\n", "[Translation (es)] El modelo\n", "\n", "[Source (en)] automatically\n", "[Translation (es)] detecta automáticamente\n", "\n", "[Source (en)] detects 70\n", "[Translation (es)] más de 70\n", "\n", "[Source (en)] plus languages\n", "[Translation (es)] idiomas\n", "\n", "[Source (en)] and generates\n", "[Translation (es)] y genera una\n", "\n", "[Source (en)] smooth, natural\n", "[Translation (es)] voz fluida y natural\n", "\n", "[Source (en)] sounding translated\n", "[Translation (es)] en el idioma traducido.\n", "\n", "[Source (en)] speech that\n", "[Translation (es)] que preserva\n", "\n", "[Source (en)] preserves the speaker's\n", "[Translation (es)] la entonación\n", "\n", "[Source (en)] intonation,\n", "[Translation (es)] y el ritmo del\n", "\n", "[Source (en)] pacing, and pitch,\n", "[Translation (es)] hablante, y\n", "\n", "[Source (en)] unlike\n", "[Translation (es)] a diferencia de\n", "\n", "[Source (en)] turn-by-turn\n", "[Translation (es)] los sistemas de turno\n", "\n", "[Source (en)] systems that wait for the\n", "[Translation (es)] por turno que esperan\n", "\n", "[Source (en)] speaker to finish speaking\n", "[Translation (es)] a que el hablante termine\n", "\n", "[Source (en)] before responding,\n", "[Translation (es)] de hablar para responder,\n", "\n", "[Source (en)] 3.5\n", "[Translation (es)] 3.5\n", "\n", "[Source (en)] 
Live Translate\n", "[Translation (es)] Live Translate\n", "\n", "[Source (en)] generates speech\n", "[Translation (es)] genera voz\n", "\n", "[Source (en)] continuously.\n", "[Translation (es)] continuamente.\n", "\n", "[Source (en)] Balancing the\n", "[Translation (es)] Equilibrando\n", "\n", "[Source (en)] tradeoff between waiting for\n", "[Translation (es)] la compensación entre\n", "\n", "[Source (en)] context to improve\n", "[Translation (es)] esperar el contexto\n", "\n", "[Source (en)] quality and\n", "[Translation (es)] para mejorar la calidad\n", "\n", "[Source (en)] translating\n", "[Translation (es)] y traducir\n", "\n", "[Source (en)] immediately to stay in\n", "[Translation (es)] inmediatamente para\n", "\n", "[Source (en)] sync with the speaker,\n", "[Translation (es)] mantenerse sincronizado\n", "\n", "[Source (en)] it delivers\n", "[Translation (es)] con el hablante,\n", "\n", "[Source (en)] fluid audio\n", "[Translation (es)] ofrece audio\n", "\n", "[Source (en)] without awkward\n", "[Translation (es)] fluido sin pausas\n", "\n", "[Source (en)] pauses and stays\n", "[Translation (es)] incómodas y\n", "\n", "[Source (en)] just a few seconds\n", "[Translation (es)] se mantiene a solo\n", "\n", "[Source (en)] behind the speaker\n", "[Translation (es)] unos segundos detrás\n", "\n", "[Source (en)] throughout the session.\n", "[Translation (es)] del hablante durante toda\n", "\n", "[Source (en)] Gemini\n", "[Translation (es)] la sesión.\n", "\n", "[Source (en)] 3.5\n", "[Translation (es)] Gemini 3.5\n", "\n", "[Source (en)] Live Translate is\n", "[Translation (es)] Live Translate\n", "\n", "[Source (en)] rolling out starting\n", "[Translation (es)] se está implementando\n", "\n", "[Source (en)] today\n", "[Translation (es)] a partir de hoy\n", "\n", "[Source (en)] across Google products.\n", "[Translation (es)] en los productos de Google.\n", "\n", "[Source (en)] For\n", "[Translation (es)] Para\n", "\n", "[Source (en)] developers in public\n", 
"[Translation (es)] los desarrolladores en\n", "\n", "[Source (en)] preview via the Gemini\n", "[Translation (es)] vista previa pública\n", "\n", "[Source (en)] Live API\n", "[Translation (es)] a través de la API\n", "\n", "[Source (en)] and Google AI\n", "[Translation (es)] de Gemini Live\n", "\n", "[Source (en)] Studio, for\n", "[Translation (es)] y Google AI Studio,\n", "\n", "[Source (en)] enterprises in\n", "[Translation (es)] para empresas\n", "\n", "[Source (en)] private preview starting\n", "[Translation (es)] en vista previa privada\n", "\n", "[Source (en)] this month in\n", "[Translation (es)] a partir de este mes\n", "\n", "[Source (en)] Google Meet,\n", "[Translation (es)] en Google Meet,\n", "\n", "[Source (en)] for everyone via\n", "[Translation (es)] para todos\n", "\n", "[Source (en)] Google Translate\n", "[Translation (es)] vía Google Translate\n", "\n", "[Info] Audio stream finished.\n", "\n", "[Source (en)] on Android and\n", "[Translation (es)] en Android e\n", "\n", "Translation complete!\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "async def run_audio_translation(url: str, target_lang: str):\n", "    # Configure live connection with translation settings\n", "    config = types.LiveConnectConfig(\n", "        response_modalities=[types.Modality.AUDIO],\n", "        translation_config=types.TranslationConfig(\n", "            echo_target_language=True,\n", "            target_language_code=target_lang,\n", "        ),\n", "        input_audio_transcription=types.AudioTranscriptionConfig(),\n", "        output_audio_transcription=types.AudioTranscriptionConfig(),\n", "    )\n", "\n", "    audio_queue_input = asyncio.Queue(maxsize=10)\n", "    file_name = 'audio_translation.wav'\n", "\n", "    print(f\"[Info] Connecting to Gemini Live ({MODEL})...\")\n", "\n", "    try:\n", "        async with client.aio.live.connect(model=MODEL, config=config) as session:\n", "            print(\"[Info] Connected successfully. 
Starting stream...\")\n", "\n", "            with wave_file(file_name) as wav:\n", "                async def receive_responses():\n", "                    try:\n", "                        async for response in session.receive():\n", "                            server_content = response.server_content\n", "                            if server_content:\n", "                                # Write translated audio chunks to WAV file\n", "                                if server_content.model_turn:\n", "                                    for part in server_content.model_turn.parts:\n", "                                        if part.inline_data and isinstance(part.inline_data.data, bytes):\n", "                                            wav.writeframes(part.inline_data.data)\n", "\n", "                                # Print input (source) transcript\n", "                                if server_content.input_transcription and server_content.input_transcription.text:\n", "                                    lang = f\" ({server_content.input_transcription.language_code})\" if server_content.input_transcription.language_code else \"\"\n", "                                    print(f\"\\n[Source{lang}] {server_content.input_transcription.text}\", flush=True)\n", "\n", "                                # Print output (translated) transcript\n", "                                if server_content.output_transcription and server_content.output_transcription.text:\n", "                                    lang = f\" ({server_content.output_transcription.language_code})\" if server_content.output_transcription.language_code else \"\"\n", "                                    print(f\"[Translation{lang}] {server_content.output_transcription.text}\", flush=True)\n", "                    except asyncio.CancelledError:\n", "                        pass\n", "                    except Exception as e:\n", "                        print(f\"\\n[Error] Receiving loop encountered error: {e}\")\n", "\n", "                async with asyncio.TaskGroup() as tg:\n", "                    # Task 1: Stream original audio from URL to a queue\n", "                    stream_task = tg.create_task(\n", "                        stream_audio_url(url, audio_queue_input)\n", "                    )\n", "                    # Task 2: Upload audio queue to Gemini Live Translate API\n", "                    send_task = tg.create_task(send_realtime(session, audio_queue_input))\n", "                    # Task 3: Receive transcripts and translated audio response chunks\n", "                    receive_task = tg.create_task(receive_responses())\n", "\n", "                    # Wait for the audio stream to finish reading\n", "                    await stream_task\n", "\n", "                    # Wait for all buffered input chunks to be sent to Gemini\n", "                    await 
audio_queue_input.join()\n", "                    send_task.cancel()\n", "\n", "                    # Give Gemini a few seconds to finish translating the final chunks\n", "                    await asyncio.sleep(4.0)\n", "                    receive_task.cancel()\n", "\n", "    except Exception as e:\n", "        print(f\"[Error] Live session error: {e}\")\n", "\n", "    print(\"\\nTranslation complete!\")\n", "    display(Audio(filename=file_name, autoplay=True))\n", "\n", "#@title Run Translation { run: \"auto\" }\n", "# Audio: Gemini Live Translate sample (or enter your own public audio URL)\n", "audio_url = \"https://storage.googleapis.com/generativeai-downloads/gemini-cookbook/audio/gemini-live-translate-sample.wav\" #@param {type:\"string\"}\n", "target_lang = \"es\" #@param {type:\"string\"}\n", "\n", "await run_audio_translation(\n", "    url=audio_url,\n", "    target_lang=target_lang\n", ")" ] }, { "cell_type": "markdown", "id": "f51f387e", "metadata": { "id": "f51f387e" }, "source": [ "## Next steps\n", "\n", "This tutorial shows basic audio translation capabilities using the Multimodal Live API.\n", "\n", "- Try it out in [Google AI Studio](https://aistudio.google.com/live?model=gemini-3.5-live-translate-preview)\n", "- Read the [docs](https://ai.google.dev/gemini-api/docs/live-api/live-translate)\n", "- Clone the [Live API examples from GitHub](https://github.com/google-gemini/gemini-live-api-examples)\n" ] } ], "metadata": { "colab": { "name": "Get_started_LiveTranslate.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }