{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "0fgOxpmGrOvn" }, "source": [ "##### Copyright 2026 Google LLC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "zxdx4xJxrTfP" }, "outputs": [], "source": [ "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "Qw6ttkOtrQ_D" }, "source": [ "# Gemini API: Gemini Text-to-speech", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "id": "y7f4kFby0E6j" }, "source": [ "The Gemini API can transform text input into single-speaker or multi-speaker audio (a podcast-like experience, as in [NotebookLM](https://notebooklm.google.com/)). 
This notebook provides an example of how to control the *Text-to-speech* (TTS) capability of the Gemini model and guide its style, accent, pace, and tone.\n", "\n", "Before diving into the code, you should try this capability on [AI Studio](https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-tts).\n", "\n", "**Note that the TTS model can only do TTS; it does not have the reasoning capabilities of the Gemini models, so you can ask things like \"say this in that style\", but not \"tell me why the sky is blue\".** If that's what you want, you should use the [Live API](./Get_started_LiveAPI.ipynb) instead.\n", "\n", "The [documentation](https://ai.google.dev/gemini-api/docs/audio-generation) is also a good place to start discovering the TTS capability." ] }, { "cell_type": "markdown", "metadata": { "id": "fzgIhXhB4KSR" }, "source": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " 🪧\n", " \n", "

Audio-out is a preview feature. It is free to use for now with quota limitations, but is subject to change.

\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "Mfk6YY3G5kqp" }, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": { "id": "CTIfnvCn9HvH" }, "source": [ "### Setup your API key\n", "\n", "To run the following cell, your API key must be stored it in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication ![image](https://storage.googleapis.com/generativeai-downloads/images/colab_icon16.png)](../quickstarts/Authentication.ipynb) for an example." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "A1pkoyZb9Jm3" }, "outputs": [], "source": [ "from google.colab import userdata\n", "\n", "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')" ] }, { "cell_type": "markdown", "metadata": { "id": "d5027929de8f" }, "source": [ "### Install and initialize the SDK\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "46zEFO2a9FFd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/196.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m196.3/196.3 kB\u001b[0m \u001b[31m5.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25h" ] } ], "source": [ "!pip install -U -q \"google-genai>=1.16.0\" # 1.16 is needed for multi-speaker audio\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "HghvVpbU0Uap" }, "outputs": [], "source": [ "from google import genai\n", "from google.genai import types\n", "\n", "client = genai.Client(api_key=GOOGLE_API_KEY)" ] }, { "cell_type": "markdown", "metadata": { "id": "QOov6dpG99rY" }, "source": [ "### Select a model\n", "\n", "Audio-out is only supported by the \"`tts`\" models, `gemini-2.5-flash-preview-tts` and `gemini-2.5-pro-preview-tts`.\n", "\n", "For more information about 
all Gemini models, check the [documentation](https://ai.google.dev/gemini-api/docs/models/gemini).\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "27Fikag0xSaB" }, "outputs": [], "source": [ "MODEL_ID = \"gemini-2.5-flash-preview-tts\" # @param [\"gemini-2.5-flash-preview-tts\",\"gemini-2.5-pro-preview-tts\"] {\"allow-input\":true, isTemplate: true}" ] }, { "cell_type": "markdown", "metadata": { "id": "D_XlihP2FZeg" }, "source": [ "Next, create a helper function to prompt the model and play back the audio in the notebook:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "cellView": "form", "id": "uwY8N8_YBrQn" }, "outputs": [], "source": [ "# @title Helper functions (just run this cell)\n", "\n", "import contextlib\n", "import wave\n", "from IPython.display import Audio\n", "\n", "file_index = 0\n", "\n", "@contextlib.contextmanager\n", "def wave_file(filename, channels=1, rate=24000, sample_width=2):\n", " with wave.open(filename, \"wb\") as wf:\n", " wf.setnchannels(channels)\n", " wf.setsampwidth(sample_width)\n", " wf.setframerate(rate)\n", " yield wf\n", "\n", "def play_audio_blob(blob):\n", " global file_index\n", " file_index += 1\n", "\n", " fname = f'audio_{file_index}.wav'\n", " with wave_file(fname) as wav:\n", " wav.writeframes(blob.data)\n", "\n", " return Audio(fname, autoplay=True)\n", "\n", "def play_audio(response):\n", " return play_audio_blob(response.candidates[0].content.parts[0].inline_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "5Th7TDK4J1ot" }, "source": [ "## Generate a simple audio output\n", "\n", "Let's start with something simple:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IeuFiGTwrmna" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"Say 'hello, my name is Gemini!'\",\n", " config={\"response_modalities\": ['Audio']},\n", ")" ] }, { "cell_type": 
"markdown", "metadata": { "id": "IcEpMs6QARw3" }, "source": [ "The generated output is in the response `inline_data` and, as you can see, it's indeed audio data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "SM2keMYxAPxm" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "audio/L16;codec=pcm;rate=24000\n" ] } ], "source": [ "blob = response.candidates[0].content.parts[0].inline_data\n", "print(blob.mime_type)" ] }, { "cell_type": "markdown", "metadata": { "id": "aqfGze37BIRs" }, "source": [ "To listen to the generated audio in Colab, use the helper function to write the output to a file and play it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iv3AEtM6BH_k" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "play_audio_blob(blob)" ] }, { "cell_type": "markdown", "metadata": { "id": "MRWi6kgQQB7l" }, "source": [ "Note that the model can only do TTS, so you should always tell it to \"say\", \"read\", or \"TTS\" something; otherwise it won't do anything." ] }, { "cell_type": "markdown", "metadata": { "id": "8okc98mP4Fjw" }, "source": [ "## Control how the model speaks\n", "\n", "There are 30 different built-in voices you can use and 24 supported languages, which gives you plenty of combinations to try." ] }, { "cell_type": "markdown", "metadata": { "id": "aOAHrb6wCxy7" }, "source": [ "### Choose a voice\n", "\n", "Choose a voice among the 30 different ones. You can find their characteristics in the [documentation](https://ai.google.dev/gemini-api/docs/speech-generation#voices)." 
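Before picking a voice, one practical aside on the raw bytes: the `mime_type` printed earlier (`audio/L16;codec=pcm;rate=24000`) describes headerless 16-bit mono PCM, which is why the helper wraps it in a WAV container at 24 kHz. As a small sketch (the byte count here is a made-up stand-in for `len(blob.data)`, not real model output), you can parse the rate out of that string and compute a clip's duration:

```python
# Sketch: recover playback parameters from a TTS mime type string.
# num_bytes is a made-up stand-in for len(blob.data).
mime_type = "audio/L16;codec=pcm;rate=24000"

rate = 24000  # documented default, used if no rate parameter is present
for param in mime_type.split(";"):
    if param.strip().startswith("rate="):
        rate = int(param.split("=", 1)[1])

# Mono 16-bit PCM: duration = bytes / (channels * bytes_per_sample * rate).
num_bytes = 96_000
duration_s = num_bytes / (1 * 2 * rate)
print(rate, duration_s)  # prints: 24000 2.0
```

For mono 16-bit PCM, each second of audio is `2 * rate` bytes, so a 96,000-byte blob at 24 kHz is exactly two seconds long.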
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Z32vkVi0F14F" }, "outputs": [], "source": [ "voice_name = \"Sadaltager\" # @param [\"Zephyr\", \"Puck\", \"Charon\", \"Kore\", \"Fenrir\", \"Leda\", \"Orus\", \"Aoede\", \"Callirhoe\", \"Autonoe\", \"Enceladus\", \"Iapetus\", \"Umbriel\", \"Algieba\", \"Despina\", \"Erinome\", \"Algenib\", \"Rasalgethi\", \"Laomedeia\", \"Achernar\", \"Alnilam\", \"Schedar\", \"Gacrux\", \"Pulcherrima\", \"Achird\", \"Zubenelgenubi\", \"Vindemiatrix\", \"Sadachbia\", \"Sadaltager\", \"Sulafat\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WnzJIGTf4WKH" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"\"\"Say \"I am a very knowledgeable model, especially when using grounding\", wait 5 seconds then say \"Don't you think?\".\"\"\",\n", " config={\n", " \"response_modalities\": ['Audio'],\n", " \"speech_config\": {\n", " \"voice_config\": {\n", " \"prebuilt_voice_config\": {\n", " \"voice_name\": voice_name\n", " }\n", " }\n", " }\n", " },\n", ")\n", "\n", "play_audio(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "pyb8MZRM69su" }, "source": [ "### Change the language\n", "\n", "Just tell the model to speak in a certain language and it will. The [documentation](https://ai.google.dev/gemini-api/docs/speech-generation#languages) lists all the supported ones." 
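Whatever language or voice you pick, the helper saves every clip the same way, using only the standard-library `wave` module. As a self-contained sketch of that same round trip (it writes one second of silence instead of calling the API, so it costs no quota; `silence.wav` is a throwaway filename for illustration):

```python
import wave

# Sketch: the same WAV round trip the helper performs, but with one
# second of silence instead of model output, so it costs no quota.
RATE = 24000       # samples per second, matching audio/L16;rate=24000
SAMPLE_WIDTH = 2   # 16-bit PCM is 2 bytes per sample
CHANNELS = 1       # the TTS stream is mono

with wave.open("silence.wav", "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"\x00" * RATE * SAMPLE_WIDTH)  # 1 s of silence

# Read it back and confirm the duration.
with wave.open("silence.wav", "rb") as wf:
    duration = wf.getnframes() / wf.getframerate()
print(duration)  # prints: 1.0
```

Reading the file back with the same module is a quick way to sanity-check any `audio_*.wav` the helper writes.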
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NlF5ZyabJ8bV" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"\"\"\n", " Read this in French:\n", "\n", " Les chaussettes de l'archiduchesse sont-elles sèches ? Archi-sèches ?\n", " Un chasseur sachant chasser doit savoir chasser sans son chien.\n", " \"\"\",\n", " config={\"response_modalities\": ['Audio']},\n", ")\n", "\n", "play_audio(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "iSlQQx8oFcP3" }, "source": [ "### Prompt the model to speak in certain ways\n", "\n", "You can control style, tone, accent, and pace using natural language prompts, for example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xGv7CzCTFfB6" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"\"\"\n", " Say in a spooky whisper:\n", " \"By the pricking of my thumbs...\n", " Something wicked this way comes!\"\n", " \"\"\",\n", " config={\"response_modalities\": ['Audio']},\n", ")\n", "\n", "play_audio(response)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sHpL2YK4F_AZ" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"\"\"\n", " Read this disclaimer in as fast a voice as possible while remaining intelligible:\n", "\n", " [The author] assumes no responsibility or liability for any errors or omissions in the content of this site.\n", 
" The information contained in this site is provided on an 'as is' basis with no guarantees of completeness, accuracy, usefulness or timeliness.\n", " \"\"\",\n", " config={\"response_modalities\": ['Audio']},\n", ")\n", "\n", "play_audio(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "VfJOUILY9AdS" }, "source": [ "## Multi-speakers\n", "\n", "The TTS model can also read discussions between two speakers (like the [NotebookLM](https://notebooklm.google.com) podcast feature). You just need to tell it that there are two speakers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IctZ59WSIDyv" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"\"\"\n", " Make Speaker1 sound tired and bored, and Speaker2 sound excited and happy:\n", "\n", " Speaker1: So... what's on the agenda today?\n", " Speaker2: You're never going to guess!\n", " \"\"\",\n", " config={\"response_modalities\": ['Audio']},\n", ")\n", "\n", "play_audio(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "XDenr3leML_F" }, "source": [ "You can also select the voices for each participant and pass their names to the model.\n", "\n", "But first, let's generate a discussion between two scientists:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "4BdZB-bM91_0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "**(Intro music fades slightly)**\n", "\n", "**Dr. Claire:** Welcome back to 'Scales & Tales'! Claire here, buzzing after our fieldwork today.\n", "\n", "**Aurora:** Buzzing is an understatement, Dr. Claire! That *frog*! I still can't believe it.\n", "\n", "**Dr. Claire:** Oh, Aurora, the vibrancy of that *Agalychnis callidryas*… the red-eyed tree frog, for our listeners! 
Such an incredible find.\n", "\n", "**Aurora:** The *eyes*! And those orange feet! It just popped out of nowhere. Seriously, my heart was pounding!\n", "\n", "**Dr. Claire:** Absolutely. Seeing that flash of iridescent green against the leaf litter… pure magic. A perfect specimen, thriving in its habitat. It’s why we do this, right?\n", "\n", "**Aurora:** Exactly! Best. Day. Ever!\n" ] } ], "source": [ "transcript = client.models.generate_content(\n", " model='gemini-2.5-flash',\n", " contents=\"\"\"\n", " Hi, please generate a short (like 100 words) transcript that reads like\n", " it was clipped from a podcast by excited herpetologists, Dr. Claire and\n", " her assistant, the young Aurora.\n", " \"\"\"\n", " ).text\n", "\n", "print(transcript)" ] }, { "cell_type": "markdown", "metadata": { "id": "Vk6_K4k7MbXz" }, "source": [ "Then let's have the TTS model render the conversation using the voices you want." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "0fE7lyew-yJB" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "config = types.GenerateContentConfig(\n", " response_modalities=[\"AUDIO\"],\n", " speech_config=types.SpeechConfig(\n", " multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(\n", " speaker_voice_configs=[\n", " types.SpeakerVoiceConfig(\n", " speaker='Dr. Claire',\n", " voice_config=types.VoiceConfig(\n", " prebuilt_voice_config=types.PrebuiltVoiceConfig(\n", " voice_name='sulafat',\n", " )\n", " )\n", " ),\n", " types.SpeakerVoiceConfig(\n", " speaker='Aurora',\n", " voice_config=types.VoiceConfig(\n", " prebuilt_voice_config=types.PrebuiltVoiceConfig(\n", " voice_name='Leda',\n", " )\n", " )\n", " ),\n", " ]\n", " )\n", " )\n", ")\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"TTS the following conversation between a very excited Dr. 
Claire and her assistant, the young Aurora: \"+transcript,\n", " config=config,\n", ")\n", "\n", "play_audio(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "PXRw4IpZkkbN" }, "source": [ "## What's next?\n", "\n", "Now that you know how to generate multi-speaker conversations, here are other cool things to try:\n", "* Instead of speech, learn how to generate music using [Lyria RealTime](./Get_started_LyriaRealTime.ipynb),\n", "* Discover how to generate [images](./Get_started_imagen.ipynb) or [videos](./Get_started_Veo.ipynb),\n", "* Instead of generating music or audio, find out how Gemini can [understand audio files](./Audio.ipynb),\n", "* Have a real-time conversation with Gemini using the [Live API](./Get_started_LiveAPI.ipynb)." ] } ], "metadata": { "colab": { "collapsed_sections": [ "0fgOxpmGrOvn" ], "name": "Get_started_TTS.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }