{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "0fgOxpmGrOvn" }, "source": [ "##### Copyright 2026 Google LLC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "zxdx4xJxrTfP" }, "outputs": [], "source": [ "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "Qw6ttkOtrQ_D" }, "source": [ "# Get started with Music generation using Lyria RealTime" ] }, { "cell_type": "markdown", "metadata": { "id": "d4f919f05306" }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "id": "uX1mTOHNO2gz" }, "source": [ "[Lyria RealTime](https://deepmind.google/technologies/lyria/),\n", "provides access to a state-of-the-art, real-time, streaming music\n", "generation model. It allows developers to build applications where users\n", "can interactively create, continuously steer, and perform instrumental\n", "music using text prompts.\n", "\n", "Lyria RealTime main characteristics are:\n", "* **Highest quality text-to-audio model**: Lyria RealTime generates high-quality instrumental music (no voice) using the latest models produced by DeepMind.\n", "* **Non-stopping music**: Using websockets, Lyria RealTime continuously generates music in real time.\n", "* **Mix and match influences**: Prompt the model to describe musical idea, genre, instrument, mood, or characteristic. 
The prompts can be mixed to blend\n", "influences and create unique compositions.\n", "* **Creative control**: Set the `guidance`, the `bpm`, the `density` of musical notes/sounds, the `brightness` and the `scale` in real time. The model will smoothly transition based on the new input.\n", "\n", "Check Lyria RealTime's [documentation](https://ai.google.dev/gemini-api/docs/realtime-music-generation) for more details." ] }, { "cell_type": "markdown", "metadata": { "id": "e87HPlxJ7Gih" }, "source": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " 🪧\n", " \n", "

Lyria RealTime is a preview feature. It is free to use for now with quota limitations, but is subject to change.

\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "j2-tpadpzS14" }, "source": [ "**Also note that due to Colab limitation, you won't be able to experience the real time capabilities of Lyria RealTime but only limited audio output. Use the [Python script](./Get_started_LyriaRealTime.py) or the AI studio's apps, [Prompt DJ](https://aistudio.google.com/apps/bundled/promptdj) and\n", "[MIDI DJ](https://aistudio.google.com/apps/bundled/promptdj-midi) to fully experience Lyria RealTime**" ] }, { "cell_type": "markdown", "metadata": { "id": "s0EUKDT5776Z" }, "source": [ "# Setup" ] }, { "cell_type": "markdown", "metadata": { "id": "IMLUXP3e8FUy" }, "source": [ "## Install the SDK\n", "Even if this notebook won't use the SDK, it will still use python and colab function to manage the websockets and the audio output." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RtmzUezfdVZA" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/196.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[91m╸\u001b[0m \u001b[32m194.6/196.3 kB\u001b[0m \u001b[31m9.4 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m196.3/196.3 kB\u001b[0m \u001b[31m4.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25h" ] } ], "source": [ "%pip install -U -q \"google-genai>=1.16.0\" # 1.16 is needed for the Lyria RealTime support" ] }, { "cell_type": "markdown", "metadata": { "id": "3ATH_PbR6Vcz" }, "source": [ "## API key" ] }, { "cell_type": "markdown", "metadata": { "id": "T_C_11Lu8KjK" }, "source": [ "To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. 
If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "O3GSbPL99z0d" }, "outputs": [], "source": [ "from google.colab import userdata\n", "import os\n", "\n", "GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')" ] }, { "cell_type": "markdown", "metadata": { "id": "fv1EcvfpmHjX" }, "source": [ "## Selecting the model and initializing the SDK client\n", "\n", "The Lyria RealTime API is a new capability introduced with the Lyria RealTime model, so it only works with the `lyria-realtime-exp` model.\n", "\n", "As it's an experimental feature, you also need to use the `v1alpha` client version.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QNxC25Pg4Hfr" }, "outputs": [], "source": [ "from google import genai\n", "from google.genai import types\n", "\n", "client = genai.Client(\n", "    api_key=GOOGLE_API_KEY,\n", "    http_options={'api_version': 'v1alpha'},  # v1alpha since Lyria RealTime is experimental\n", ")\n", "\n", "MODEL_ID = 'models/lyria-realtime-exp'" ] }, { "cell_type": "markdown", "metadata": { "id": "UG-etYk47aP4" }, "source": [ "## Helpers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "30dIpYsUesm2" }, "outputs": [], "source": [ "# @title Logging\n", "# To show how Lyria RealTime works, all logs are displayed; feel free to\n", "# comment out these lines if that's too much output for you.\n", "\n", "import logging\n", "\n", "logger = logging.getLogger('Bidi')\n", "logger.setLevel('DEBUG')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "ZEbwH45fEBbc" }, "outputs": [], "source": [ "# @title Wave file writer\n", "\n", "import contextlib\n", "import wave\n", "\n",
"@contextlib.contextmanager\n", "def wave_file(filename, channels=2, rate=48000, sample_width=2):\n", " with wave.open(filename, \"wb\") as wf:\n", " wf.setnchannels(channels)\n", " wf.setsampwidth(sample_width)\n", " wf.setframerate(rate)\n", " yield wf" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "l6D62oLe5ENt" }, "outputs": [], "source": [ "# @title Text to prompt parser\n", "\n", "def parse_input(input_text):\n", " if \":\" in input_text:\n", " parsed_prompts = []\n", " segments = input_text.split(',')\n", " malformed_segment_exists = False # Tracks if any segment had a parsing error\n", "\n", " for segment_str_raw in segments:\n", " segment_str = segment_str_raw.strip()\n", " if not segment_str: # Skip empty segments (e.g., from \"text1:1, , text2:2\")\n", " continue\n", "\n", " # Split on the first colon only, in case prompt text itself contains colons\n", " parts = segment_str.split(':', 1)\n", "\n", " if len(parts) == 2:\n", " text_p = parts[0].strip()\n", " weight_s = parts[1].strip()\n", "\n", " if not text_p: # Prompt text should not be empty\n", " print(f\"Error: Empty prompt text in segment '{segment_str_raw}'. Skipping this segment.\")\n", " malformed_segment_exists = True\n", " continue # Skip this malformed segment\n", " try:\n", " weight_f = float(weight_s) # Weights are floats\n", " parsed_prompts.append(types.WeightedPrompt(text=text_p, weight=weight_f))\n", " except ValueError:\n", " print(f\"Error: Invalid weight '{weight_s}' in segment '{segment_str_raw}'. Must be a number. Skipping this segment.\")\n", " malformed_segment_exists = True\n", " continue # Skip this malformed segment\n", " else:\n", " # This segment is not in \"text:weight\" format.\n", " print(f\"Error: Segment '{segment_str_raw}' is not in 'text:weight' format. 
Skipping this segment.\")\n", "                malformed_segment_exists = True\n", "                continue  # Skip this malformed segment\n", "\n", "        if parsed_prompts:  # If at least one prompt was successfully parsed\n", "            prompt_repr = [f\"'{p.text}':{p.weight}\" for p in parsed_prompts]\n", "            if malformed_segment_exists:\n", "                print(f\"Partially sending {len(parsed_prompts)} valid weighted prompt(s) due to errors in other segments: {', '.join(prompt_repr)}\")\n", "            else:\n", "                print(f\"Sending multiple weighted prompts: {', '.join(prompt_repr)}\")\n", "            return parsed_prompts\n", "        else:  # No valid prompts were parsed from the input string that contained \":\"\n", "            print(\"Error: Input contained ':' suggesting multi-prompt format, but no valid 'text:weight' segments were successfully parsed. No action taken.\")\n", "            return None\n", "    else:\n", "        print(f\"Sending single text prompt: \\\"{input_text}\\\"\")\n", "        # Return a list so the result can be passed to `session.set_weighted_prompts` directly.\n", "        return [types.WeightedPrompt(text=input_text, weight=1.0)]\n" ] }, { "cell_type": "markdown", "metadata": { "id": "QutDG7r78Zf-" }, "source": [ "# Main audio loop" ] }, { "cell_type": "markdown", "metadata": { "id": "ERqyY0IFN8G9" }, "source": [ "The code below implements the interaction with the Lyria RealTime API.\n", "\n", "It is a basic implementation that could be improved, but it was kept as simple as possible to make it easy to understand.\n", "\n", "The [Python script](Get_started_LyriaRealTime.py) is a more complete example with better thread and error handling and, most of all, real-time interactions.\n", "\n", "There are two functions worth describing here:" ] }, { "cell_type": "markdown", "metadata": { "id": "tXPhEdHIPBif" }, "source": [ "

### `generate_music` - The main function

\n", "\n", "This method:\n", "\n", "- Opens a `websocket` connecting to the real time API\n", "- Sends the initial prompt to the model using `session.set_weighted_prompts`. If none was provided it asked for a prompt and parse it using the `parse_input` helper.\n", "- If provided, it then send the music generation configuration using `session.set_music_generation_config`\n", "- Finally it starts the music generation with `session.play()`" ] }, { "cell_type": "markdown", "metadata": { "id": "tLukmBhPPib4" }, "source": [ "

### `receive` - Collects audio from the API and plays it

\n", "\n", "The `receive` method listen to the model ouputs and collects the audio chunks in a loop and writes them to a `.wav` file using the `wave_file` helper. It stops after a certain number of chunks (10 by default)." ] }, { "cell_type": "markdown", "metadata": { "id": "oCg1qFf0PV44" }, "source": [ "Ideally if you want to interact in real-time with Lyria RealTime you should also implement a send method to send the new prompts/config to the model. Check the [python code sample](./Get_started_LyriaRealTime.ipynb) for such an example." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7p_5nXFo3ecA" }, "outputs": [], "source": [ "import asyncio\n", "\n", "file_index = 0\n", "\n", "async def generate_music(prompts=None, max_chunks=10, config=None):\n", " async with client.aio.live.music.connect(model=MODEL_ID) as session:\n", " async def receive():\n", " global file_index\n", " # Start a new `.wav` file.\n", " file_name = f\"audio_{file_index}.wav\"\n", " with wave_file(file_name) as wav:\n", " file_index += 1\n", "\n", " logger.debug('receive')\n", "\n", " # Read chunks from the socket.\n", " n = 0\n", " async for message in session.receive():\n", " n+=1\n", " if n > max_chunks:\n", " break\n", "\n", " # Write audio the chunk to the `.wav` file.\n", " audio_chunk = message.server_content.audio_chunks[0].data\n", " if audio_chunk is not None:\n", " logger.debug('Got audio_chunk')\n", " wav.writeframes(audio_chunk)\n", "\n", " await asyncio.sleep(10**-12)\n", "\n", " # This code example doesn't have a way to receive requests because of colab\n", " # limitations, check the python code sample for a more complete example\n", "\n", " while prompts is None:\n", " input_prompt = await asyncio.to_thread(input, \"prompt > \")\n", " prompts = parse_input(input_prompt)\n", "\n", " # Sending the provided prompts\n", " await session.set_weighted_prompts(\n", " prompts=prompts\n", " )\n", "\n", " # Set initial configuration\n", " if config is not 
None:\n", "            await session.set_music_generation_config(config=config)\n", "\n", "        # Start the music generation\n", "        await session.play()\n", "\n", "        receive_task = asyncio.create_task(receive())\n", "\n", "        # Don't quit the loop until the task is done\n", "        await asyncio.gather(receive_task)" ] }, { "cell_type": "markdown", "metadata": { "id": "gGYtiV2N8b2o" }, "source": [ "# Try Lyria RealTime\n", "\n", "Because of Colab limitations you won't be able to experience the \"real time\" part of Lyria RealTime, so all these examples are one-off prompts that produce an audio file.\n", "\n", "One thing to note is that the audio will only play at the end of the session, once everything has been written to the wav file. When using the API for real, you'll be able to start playing as soon as the first chunk arrives. So the longer the duration you set (using the dedicated parameter), the longer you'll have to wait until you hear something." ] }, { "cell_type": "markdown", "metadata": { "id": "kpifUNrOhgNe" }, "source": [ "## Simple Lyria RealTime example\n", "Here's a first simple example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e9byyNVthoZv" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:Bidi:receive\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n" ] }, { "data": { "text/html": [ "\n", "          \n", "        " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Audio\n", "\n", "await generate_music(prompts=[{\"text\":\"piano\", \"weight\":1.0}])\n", "display(Audio(f\"audio_{file_index-1}.wav\"))" ] }, { "cell_type": "markdown", "metadata": { "id": "YPfV0XG6hjWH" }, "source": [ 
"## Try Lyria RealTime by yourself\n", "\n", "Now you can try mixing multiple prompts, and tinkering with the music configuration.\n", "\n", "The prompts needs to follow their specific format which is a list of prompts with weights (which can be any values, including negative, except 0) like this:\n", "```\n", "{\n", " \"text\": \"Text of the prompt\",\n", " \"weight\": 1.0,\n", "}\n", "```\n", "\n", "You should try to stay simple (unlike when you're using [image-out](../Get_Started_Nano_Banana.ipynb)) as the model will better understand things like \"meditation\", \"eerie\", \"harp\" than \"An eerie and relaxing music illustrating the verdoyant forests of Scotland using string instruments\".\n", "\n", "The music configuration options available to you are:\n", "* `bpm`: beats per minute\n", "* `guidance`: how strictly the model follows the prompts\n", "* `density`: density of musical notes/sounds\n", "* `brightness`: tonal quality\n", "* `scale`: musical scale (key and mode)\n", "* `music_generation_mode`: quality (default), diversity, or allow vocalization (you'll need to add related prompts).\n", "\n", "Other options are available (`mute_bass` for ex.). Check the [documentation](https://ai.google.dev/gemini-api/docs/music-generation#controls) for the full list.\n", "\n", "Select one of the sample prompts (genres,\tinstruments and\tmood), or write your owns. Check the [documentation](https://ai.google.dev/gemini-api/docs/music-generation#prompt-guide-lyria) for more details and prompt examples." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "okh4hFuhFj9F" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:Bidi:receive\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n", "DEBUG:Bidi:Got audio_chunk\n" ] }, { "data": { "text/html": [ "\n", "          \n", "        " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# @markdown ### Enter some prompts:\n", "prompt_1 = \"Indie Pop\" # @param [\"Hard Rock\",\"Latin Jazz\",\"Polka\",\"Baroque\",\"Chiptune\",\"Indie Pop\",\"Bluegrass\",\"Heavy Metal\",\"Contemporary R&B\",\"Reggaeton\"] {\"allow-input\":true}\n", "prompt_1_weight = 0.6 # @param {type:\"slider\", min:0, max:2, step:0.1}\n", "prompt_2 = \"Sitar\" # @param [\"Piano\",\"Guitar\",\"Bagpipes\",\"Harpsichord\",\"808 Hip Hop Beat\",\"Sitar\",\"Harmonica\",\"Didgeridoo\",\"Woodwinds\",\"Organ\"] {\"allow-input\":true}\n", "prompt_2_weight = 2 # @param {type:\"slider\", min:0, max:2, step:0.1}\n", "prompt_3 = \"Danceable\" # @param [\"Chill\",\"Emotional\",\"Danceable\",\"Psychedelic\",\"Acoustic Instruments\",\"Glitchy Effects\",\"Ominous Drone\",\"Upbeat\"] {\"allow-input\":true}\n", "prompt_3_weight = 1.4 # @param {type:\"slider\", min:0, max:2, step:0.1}\n", "prompt_4 = \"\" # @param {\"type\":\"string\",\"placeholder\":\"Fourth prompt (optional)\"}\n", "prompt_4_weight = 1.0 # @param {type:\"slider\", min:0, max:2, step:0.1}\n", "prompt_5 = \"\" # @param {\"type\":\"string\",\"placeholder\":\"Fifth prompt (optional)\"}\n", "prompt_5_weight = 1.0 # @param {type:\"slider\", min:0, max:2, step:0.1}\n", 
"\n", "\n", "# @markdown ### Music configuration:\n", "BPM = 140 # @param {type:\"slider\", min:40, max:180, step:1}\n", "scale = \"F_MAJOR_D_MINOR\" # @param [\"SCALE_UNSPECIFIED\",\"C_MAJOR_A_MINOR\",\"D_FLAT_MAJOR_B_FLAT_MINOR\",\"D_MAJOR_B_MINOR\",\"E_FLAT_MAJOR_C_MINOR\",\"E_MAJOR_D_FLAT_MINOR\",\"F_MAJOR_D_MINOR\",\"G_FLAT_MAJOR_E_FLAT_MINOR\",\"G_MAJOR_E_MINOR\",\"A_FLAT_MAJOR_F_MINOR\",\"A_MAJOR_G_FLAT_MINOR\",\"B_FLAT_MAJOR_G_MINOR\",\"B_MAJOR_A_FLAT_MINOR\"]\n", "density = 0.2 # @param {type:\"slider\", min:0, max:1, step:0.1}\n", "brightness = 0.7 # @param {type:\"slider\", min:0, max:1, step:0.1}\n", "guidance = 4.0 # @param {type:\"slider\", min:0, max:6, step:0.1}\n", "music_generation_mode = \"QUALITY\" # @param [\"QUALITY\",\"DIVERSITY\",\"VOCALIZATION\"]\n", "\n", "# @markdown ### Duration (in seconds):\n", "duration = 20 # @param {type:\"slider\", min:2, max:60, step:2}\n", "\n", "# @markdown Now press the play button on the top right corner of this cell to run it and let Lyria RealTime generate your music\n", "\n", "prompts = [{\n", " \"text\": prompt_1,\n", " \"weight\": prompt_1_weight,\n", "}]\n", "\n", "if prompt_2:\n", " prompts.append({\n", " \"text\": prompt_2,\n", " \"weight\": prompt_2_weight,\n", " })\n", "if prompt_3:\n", " prompts.append({\n", " \"text\": prompt_3,\n", " \"weight\": prompt_3_weight,\n", " })\n", "if prompt_4:\n", " prompts.append({\n", " \"text\": prompt_4,\n", " \"weight\": prompt_4_weight,\n", " })\n", "if prompt_5:\n", " prompts.append({\n", " \"text\": prompt_5,\n", " \"weight\": prompt_5_weight,\n", " })\n", "\n", "config = {\n", " 'music_generation_config': {\n", " 'bpm': BPM,\n", " 'scale': scale,\n", " 'density': density,\n", " 'brightness': brightness,\n", " 'guidance': guidance,\n", " 'music_generation_mode': music_generation_mode,\n", " }\n", "}\n", "\n", "await generate_music(max_chunks=duration/2, prompts=prompts, config=config)\n", "display(Audio(f\"audio_{file_index-1}.wav\"))" ] }, { "cell_type": 
"markdown", "metadata": { "id": "PXRw4IpZkkbN" }, "source": [ "# What's next?\n", "\n", "Now that you know how to generate music, here are other cool things to try:\n", "* Instead of music, learn how to generate multi-speakers conversation using the [TTS models](./Get_started_TTS.ipynb),\n", "* Discover how to generate [images](./Get_started_imagen.ipynb) or [videos](./Get_started_Veo.ipynb),\n", "* Instead of generation music or audio, find out how to Gemini can [understand Audio files](./Audio.ipynb),\n", "* Have a real-time conversation with Gemini using the [Live API](./Get_started_LiveAPI.ipynb)." ] } ], "metadata": { "colab": { "collapsed_sections": [ "UG-etYk47aP4" ], "name": "Get_started_LyriaRealTime.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }