{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "ZjkBUWm8ZMlc"
},
"source": [
"##### Copyright 2026 Google LLC."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"cellView": "form",
"id": "bOTfaUaSZKfF"
},
"outputs": [],
"source": [
"# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CZC64QGBZ2v9"
},
"source": [
"# Gemini Quickstart: Gemini Robotics-ER 1.5"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eDDzTW8DRVcc"
},
"source": [
"This notebook introduces the **Gemini Robotics-ER 1.5** model.\n",
"\n",
"Gemini Robotics-ER 1.5 is a vision-language model (VLM) that brings Gemini's agentic capabilities to robotics. It's designed for advanced reasoning in the physical world, allowing robots to interpret complex visual data, perform spatial reasoning, and plan actions from natural language commands.\n",
"\n",
"Key features and benefits:\n",
"\n",
"* **Enhanced autonomy:** Robots can reason, adapt, and respond to changes in open-ended environments.\n",
"* **Natural language interaction:** Makes robots easier to use by enabling complex task assignments using natural language.\n",
"* **Task orchestration:** Deconstructs natural language commands into subtasks and integrates with existing robot controllers and behaviors to complete long-horizon tasks.\n",
"* **Versatile capabilities:** Locates and identifies objects, understands object relationships, plans grasps and trajectories, and interprets dynamic scenes."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "81jqmyhKZTZ1"
},
"source": [
"## Setup\n",
"\n",
"Run this section every time you start up Colab. The example sections that follow are designed to run independently of one another, so you can skip ahead to whichever examples are most relevant or interesting to you (though it is strongly recommended that you read through each of them at least once!)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0cdSPBCltkeQ"
},
"source": [
"### Install SDK"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "ONn2UcU7akFl"
},
"outputs": [],
"source": [
"%pip install -U -q google-genai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PD3FmivAUXPk"
},
"source": [
"### Setup your API key\n",
"\n",
"To run the following cells, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "6L0iD346and2"
},
"outputs": [],
"source": [
"from google.colab import userdata\n",
"\n",
"GOOGLE_API_KEY = userdata.get(\"GOOGLE_API_KEY\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Al7zVIf2UqqY"
},
"source": [
"### Initialize SDK client\n",
"\n",
"Initialize a Gemini SDK client with your API key."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "1ezNXUK8arKa"
},
"outputs": [],
"source": [
"from google import genai\n",
"from google.genai import types\n",
"\n",
"client = genai.Client(api_key=GOOGLE_API_KEY)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "95E3SYATU_u4"
},
"source": [
"### Select the Gemini Robotics-ER 1.5 model and test the connection\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "wIcki0r8VNHY"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Yes, I am here. What can I help you with?\n"
]
}
],
"source": [
"MODEL_ID = \"gemini-robotics-er-1.5-preview\"\n",
"\n",
"print(\n",
" client.models.generate_content(\n",
" model=MODEL_ID, contents=\"Are you there?\"\n",
" ).text\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "U-70SggjWURa"
},
"source": [
"### Imports and utility code"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E12V7NJ6WlMV"
},
"source": [
"#### Imports"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "P37Dv6YQaYw9"
},
"outputs": [],
"source": [
"import json\n",
"import textwrap\n",
"import time"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MgFZ72Bva1FU"
},
"source": [
"#### Parsing JSON output"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "ZH6qyE1Uacoq"
},
"outputs": [],
"source": [
"def parse_json(json_output):\n",
" # Parsing out the markdown fencing\n",
" lines = json_output.splitlines()\n",
" for i, line in enumerate(lines):\n",
"        if line.strip() == \"```json\":\n",
" # Remove everything before \"```json\"\n",
" json_output = \"\\n\".join(lines[i + 1 :])\n",
" # Remove everything after the closing \"```\"\n",
" json_output = json_output.split(\"```\")[0]\n",
" break # Exit the loop once \"```json\" is found\n",
" return json_output"
]
},
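{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sanity check of `parse_json` on a hypothetical fenced response (the string below is illustrative, not real model output):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Strip the markdown fencing from a hypothetical fenced model response.\n",
"fenced = \"```json\\n[{\\\"point\\\": [500, 500], \\\"label\\\": \\\"cup\\\"}]\\n```\"\n",
"print(parse_json(fenced))"
]
},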
{
"cell_type": "markdown",
"metadata": {
"id": "XRlsn7tJYUjI"
},
"source": [
"#### Resize images\n",
"\n",
"Resize images for faster rendering and smaller API calls."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "RMbTihD9YU9D"
},
"outputs": [],
"source": [
"from PIL import Image\n",
"\n",
"def get_image_resized(img_path):\n",
" img = Image.open(img_path)\n",
" img = img.resize(\n",
" (800, int(800 * img.size[1] / img.size[0])), Image.Resampling.LANCZOS\n",
" )\n",
" return img"
]
},
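{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of `get_image_resized` using a synthetic in-memory image (no sample image file is assumed; `Image.open` also accepts file-like objects):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from io import BytesIO\n",
"\n",
"# Create a 1600x1200 placeholder image in memory and resize it.\n",
"buf = BytesIO()\n",
"Image.new(\"RGB\", (1600, 1200), \"gray\").save(buf, format=\"PNG\")\n",
"buf.seek(0)\n",
"print(get_image_resized(buf).size)  # (800, 600): width fixed at 800, aspect ratio preserved"
]
},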
{
"cell_type": "markdown",
"metadata": {
"id": "VuvfFW2sa1wu"
},
"source": [
"#### Visualization helpers"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "yQQ36nsiX0w-"
},
"outputs": [],
"source": [
"import base64\n",
"import dataclasses\n",
"from io import BytesIO\n",
"import numpy as np\n",
"from PIL import ImageColor, ImageDraw, ImageFont\n",
"from typing import Tuple\n",
"\n",
"import IPython\n",
"from IPython import display\n",
"\n",
"def generate_point_html(pil_image, points_json):\n",
" buffered = BytesIO()\n",
" pil_image.save(buffered, format=\"PNG\")\n",
" img_str = base64.b64encode(buffered.getvalue()).decode()\n",
" points_json = parse_json(points_json)\n",
"\n",
" return f\"\"\"\n",
"\n",
"\n",
"