{ "cells": [ { "cell_type": "markdown", "id": "license-header", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cosmos3 Generator Audiovisual with Diffusers\n", "\n", "This notebook runs Cosmos3 audiovisual generation directly with `Cosmos3OmniPipeline`.\n", "\n", "The examples cover Cosmos3-Nano text-to-image, text-to-video, and image-to-video (the video cases both with and without audio), plus Cosmos3-Super text-to-image and Cosmos3-Super text-to-video and image-to-video without audio. Each section loads the matching model explicitly.\n", "\n", "Note: if you have already completed steps 1-3 and installed the `Cosmos3 Diffusers (Python 3.13)` kernel, switch to that kernel and jump directly to step 4. Run the restore cell there, then continue with verification and the examples.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Prerequisites\n", "\n", "Use a Linux machine with NVIDIA GPU access and access to the Cosmos3 models on Hugging Face, and authenticate with either `uvx hf@latest auth login` or by setting `HF_TOKEN`.\n", "\n", "> **Headless servers:** if you see an error like `libxcb.so.1: cannot open shared object file` (a missing system graphics library) when importing or running the pipeline, install the required system libraries:\n", ">\n", "> ```bash\n", "> apt-get install -y libxcb1 libgl1 libglib2.0-0\n", "> ```\n", "\n", "> **uv version:** these notebooks need `uv >= 0.11.3`. Older versions fail to parse the project config and do not recognize newer `--torch-backend` values such as `cu130` (you may see errors like `a value is required for '--torch-backend'` or an invalid-value list that stops at `cu129`). If you hit version-related errors, upgrade with `uv self update` (or reinstall from https://astral.sh/uv).\n" ], "id": "d610933a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 
Configure Paths and Environment\n", "\n", "The defaults are relative to this `cosmos` checkout. Set `COSMOS3_TORCH_BACKEND` to match the CUDA version installed on your system (`cu130` for CUDA 13, `cu128` for CUDA 12.8); if it is unset, the setup cell falls back to `cu130`. For example:\n", "\n", "```bash\n", "export COSMOS3_DIFFUSERS_VENV=/path/to/.venv-cosmos3-diffusers\n", "export COSMOS3_TORCH_BACKEND=cu130\n", "export HF_HOME=/path/to/large/huggingface/cache\n", "export UV_LINK_MODE=copy\n", "export CUDA_VISIBLE_DEVICES=0\n", "```\n" ], "id": "518de78c" },
{ "cell_type": "code", "metadata": {}, "source": [ "from pathlib import Path\n", "import os\n", "\n", "\n", "def find_repo_root(start: Path) -> Path:\n", "    for path in [start, *start.parents]:\n", "        if (path / \"README.md\").exists() and (path / \"cookbooks\").exists():\n", "            return path\n", "    return start\n", "\n", "\n", "def configure_diffusers_environment() -> None:\n", "    global COSMOS_ROOT\n", "    global COSMOS3_AUDIOVISUAL_ROOT\n", "    global COSMOS3_DIFFUSERS_VENV\n", "    global COSMOS3_TORCH_BACKEND\n", "    global COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\n", "\n", "    COSMOS_ROOT = find_repo_root(Path.cwd().resolve())\n", "    COSMOS3_AUDIOVISUAL_ROOT = COSMOS_ROOT / \"cookbooks\" / \"cosmos3\" / \"generator\" / \"audiovisual\"\n", "    COSMOS3_DIFFUSERS_VENV = Path(\n", "        os.environ.get(\"COSMOS3_DIFFUSERS_VENV\", COSMOS_ROOT / \".venv-cosmos3-diffusers\")\n", "    ).resolve()\n", "    COSMOS3_TORCH_BACKEND = os.environ.get(\"COSMOS3_TORCH_BACKEND\", \"cu130\")\n", "    COSMOS3_AUDIOVISUAL_OUTPUT_ROOT = Path(\n", "        os.environ.get(\"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\", COSMOS3_AUDIOVISUAL_ROOT / \"outputs\" / \"notebooks\")\n", "    ).resolve()\n", "\n", "    os.environ[\"COSMOS3_DIFFUSERS_VENV\"] = str(COSMOS3_DIFFUSERS_VENV)\n", "    os.environ[\"COSMOS3_TORCH_BACKEND\"] = COSMOS3_TORCH_BACKEND\n", "    os.environ[\"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\"] = str(COSMOS3_AUDIOVISUAL_OUTPUT_ROOT)\n", "    os.environ.setdefault(\"UV_CACHE_DIR\", str(Path.home() / \".cache\" / \"uv\"))\n", "    os.environ.setdefault(\"UV_LINK_MODE\", \"copy\")\n", "    os.environ.setdefault(\"HF_HOME\", str(Path.home() / \".cache\" / \"huggingface\"))\n", "    os.environ.setdefault(\"HF_HUB_DISABLE_XET\", \"1\")\n", "    os.environ.setdefault(\"CUDA_VISIBLE_DEVICES\", \"0\")\n", "\n", "    print(f\"COSMOS_ROOT: {COSMOS_ROOT}\")\n", "    for key in [\n", "        \"COSMOS3_DIFFUSERS_VENV\",\n", "        \"COSMOS3_TORCH_BACKEND\",\n", "        \"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\",\n", "        \"UV_CACHE_DIR\",\n", "        \"UV_LINK_MODE\",\n", "        \"HF_HOME\",\n", "        \"HF_HUB_DISABLE_XET\",\n", "        \"CUDA_VISIBLE_DEVICES\",\n", "    ]:\n", "        print(f\"{key}: {os.environ[key]}\")\n", "    print(\"HF_TOKEN:\", \"set\" if os.environ.get(\"HF_TOKEN\") else \"not set\")\n", "\n", "\n", "configure_diffusers_environment()\n" ], "execution_count": null, "outputs": [], "id": "d7d2812b" },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Install Diffusers Dependencies\n" ], "id": "6e263442" }, { "cell_type": "code", "metadata": {}, "source": [ "%%bash\n", "set -euo pipefail\n", "\n", "if ! command -v uv >/dev/null 2>&1; then\n", "    echo \"uv is not installed. 
Install it first: https://docs.astral.sh/uv/getting-started/installation/\"\n", " exit 1\n", "fi\n", "\n", "export UV_LINK_MODE=\"${UV_LINK_MODE:-copy}\"\n", "uv venv \"$COSMOS3_DIFFUSERS_VENV\" --python 3.13 --seed --managed-python --allow-existing\n", "source \"$COSMOS3_DIFFUSERS_VENV/bin/activate\"\n", "uv pip install --torch-backend=\"$COSMOS3_TORCH_BACKEND\" \\\n", " \"diffusers @ git+https://github.com/huggingface/diffusers.git\" \\\n", " accelerate \\\n", " av \\\n", " cosmos_guardrail \\\n", " huggingface_hub \\\n", " imageio \\\n", " imageio-ffmpeg \\\n", " ipykernel \\\n", " torch \\\n", " torchvision \\\n", " transformers\n", "\n", "\"$COSMOS3_DIFFUSERS_VENV/bin/python\" -m ipykernel install --user \\\n", " --name cosmos3-diffusers \\\n", " --display-name \"Cosmos3 Diffusers (Python 3.13)\"\n", "\n", "echo\n", "echo \"Installed dependencies into: $COSMOS3_DIFFUSERS_VENV\"\n", "echo \"Next: switch this notebook kernel to: Cosmos3 Diffusers (Python 3.13)\"\n", "echo \"After switching kernels, run the Restore Environment cell below, then continue with Verify.\"\n" ], "execution_count": null, "outputs": [], "id": "35c54e56" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Select the Diffusers Kernel\n", "\n", "The install cell creates and registers the `Cosmos3 Diffusers (Python 3.13)` Jupyter kernel. \n", "\n", "**Note**: Switch this notebook to that kernel before running the remaining Python cells, then run the restore cell immediately below. It can take some time for the new Jupyter kernel to show up in the notebook interface." 
], "id": "1675286a" },
{ "cell_type": "code", "metadata": {}, "source": [ "# Run this cell immediately after switching to the Cosmos3 Diffusers kernel.\n", "# It restores the same paths and cache settings as the setup cell above.\n", "from pathlib import Path\n", "import os\n", "\n", "\n", "def find_repo_root(start: Path) -> Path:\n", "    for path in [start, *start.parents]:\n", "        if (path / \"README.md\").exists() and (path / \"cookbooks\").exists():\n", "            return path\n", "    return start\n", "\n", "\n", "def configure_diffusers_environment() -> None:\n", "    global COSMOS_ROOT\n", "    global COSMOS3_AUDIOVISUAL_ROOT\n", "    global COSMOS3_DIFFUSERS_VENV\n", "    global COSMOS3_TORCH_BACKEND\n", "    global COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\n", "\n", "    COSMOS_ROOT = find_repo_root(Path.cwd().resolve())\n", "    COSMOS3_AUDIOVISUAL_ROOT = COSMOS_ROOT / \"cookbooks\" / \"cosmos3\" / \"generator\" / \"audiovisual\"\n", "    COSMOS3_DIFFUSERS_VENV = Path(\n", "        os.environ.get(\"COSMOS3_DIFFUSERS_VENV\", COSMOS_ROOT / \".venv-cosmos3-diffusers\")\n", "    ).resolve()\n", "    COSMOS3_TORCH_BACKEND = os.environ.get(\"COSMOS3_TORCH_BACKEND\", \"cu130\")\n", "    COSMOS3_AUDIOVISUAL_OUTPUT_ROOT = Path(\n", "        os.environ.get(\"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\", COSMOS3_AUDIOVISUAL_ROOT / \"outputs\" / \"notebooks\")\n", "    ).resolve()\n", "\n", "    os.environ[\"COSMOS3_DIFFUSERS_VENV\"] = str(COSMOS3_DIFFUSERS_VENV)\n", "    os.environ[\"COSMOS3_TORCH_BACKEND\"] = COSMOS3_TORCH_BACKEND\n", "    os.environ[\"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\"] = str(COSMOS3_AUDIOVISUAL_OUTPUT_ROOT)\n", "    os.environ.setdefault(\"UV_CACHE_DIR\", str(Path.home() / \".cache\" / \"uv\"))\n", "    os.environ.setdefault(\"UV_LINK_MODE\", \"copy\")\n", "    os.environ.setdefault(\"HF_HOME\", str(Path.home() / \".cache\" / \"huggingface\"))\n", "    os.environ.setdefault(\"HF_HUB_DISABLE_XET\", \"1\")\n", "    os.environ.setdefault(\"CUDA_VISIBLE_DEVICES\", \"0\")\n", "\n", "    print(f\"COSMOS_ROOT: {COSMOS_ROOT}\")\n", "    for key in [\n", "        \"COSMOS3_DIFFUSERS_VENV\",\n", "        \"COSMOS3_TORCH_BACKEND\",\n", "        \"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\",\n", "        \"UV_CACHE_DIR\",\n", "        \"UV_LINK_MODE\",\n", "        \"HF_HOME\",\n", "        \"HF_HUB_DISABLE_XET\",\n", "        \"CUDA_VISIBLE_DEVICES\",\n", "    ]:\n", "        print(f\"{key}: {os.environ[key]}\")\n", "    print(\"HF_TOKEN:\", \"set\" if os.environ.get(\"HF_TOKEN\") else \"not set\")\n", "\n", "\n", "configure_diffusers_environment()\n" ], "execution_count": null, "outputs": [], "id": "e7dd97ad" },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Verify GPU and Python Environment\n" ], "id": "652f7144" },
{ "cell_type": "code", "metadata": {}, "source": [ "import os\n", "import sys\n", "from pathlib import Path\n", "\n", "if \"COSMOS3_DIFFUSERS_VENV\" not in os.environ:\n", "    raise RuntimeError(\"Run the Restore Environment cell after switching to the Diffusers kernel.\")\n", "\n", "expected_python = (Path(os.environ[\"COSMOS3_DIFFUSERS_VENV\"]) / \"bin\" / \"python\").resolve()\n", "current_python = Path(sys.executable).resolve()\n", "print(\"kernel python:\", current_python)\n", "print(\"expected python:\", expected_python)\n", "if current_python != expected_python:\n", "    raise RuntimeError(\n", "        \"This notebook is not running inside the Diffusers venv. \"\n", "        \"Switch the notebook kernel to 'Cosmos3 Diffusers (Python 3.13)', then run the Restore Environment cell above.\"\n", "    )\n", "\n", "import torch\n", "import diffusers\n", "\n", "print(\"diffusers:\", diffusers.__version__)\n", "print(\"torch:\", torch.__version__)\n", "print(\"torch cuda:\", torch.version.cuda)\n", "print(\"cuda available:\", torch.cuda.is_available())\n", "print(\"device count:\", torch.cuda.device_count())\n", "if torch.cuda.is_available():\n", "    print(\"device 0:\", torch.cuda.get_device_name(0))\n" ], "execution_count": null, "outputs": [], "id": "277b3c47" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Preview Available Inputs\n" ], "id": "c3228cd1" }, { "cell_type": "code", "metadata": {}, "source": [ "from pathlib import Path\n", "import json\n", "from IPython.display import Image, display\n", "\n", "assets_dir = COSMOS3_AUDIOVISUAL_ROOT / \"assets\"\n", "for prompt_dir in sorted((assets_dir / \"prompts\").iterdir()):\n", " if not prompt_dir.is_dir():\n", " continue\n", " print(f\"{prompt_dir.relative_to(assets_dir)}:\")\n", " for prompt_path in sorted(prompt_dir.glob(\"*.json\")):\n", " data = json.loads(prompt_path.read_text())\n", " caption = (\n", " data.get(\"temporal_caption\")\n", " or data.get(\"comprehensive_t2i_caption\")\n", " or data.get(\"extra\", {}).get(\"prompt\", \"\")\n", " )\n", " print(f\" {prompt_path.name}: {caption[:180]}{'...' if len(caption) > 180 else ''}\")\n", " print()\n", "\n", "for image_dir in sorted((assets_dir / \"images\").iterdir()):\n", " if not image_dir.is_dir():\n", " continue\n", " print(f\"{image_dir.relative_to(assets_dir)}:\")\n", " for image_path in sorted(image_dir.iterdir()):\n", " if image_path.suffix.lower() in {\".jpg\", \".jpeg\", \".png\", \".webp\", \".bmp\"}:\n", " print(f\" {image_path.name}\")\n", " display(Image(filename=str(image_path), width=420))\n" ], "execution_count": null, "outputs": [], "id": "f57b45a8" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. 
Define Asset Sets, Payload Helpers, Runner, and Viewer Helpers\n" ], "id": "e9219609" }, { "cell_type": "code", "metadata": {}, "source": [ "import json\n", "import os\n", "from pathlib import Path\n", "from IPython.display import Image, display\n", "\n", "IMAGE_EXTENSIONS = {\".jpg\", \".jpeg\", \".png\", \".webp\", \".bmp\"}\n", "\n", "FIXED_SAMPLING = {\n", " \"num_steps\": 35,\n", " \"guidance\": 6.0,\n", " \"shift\": 10.0,\n", " \"fps\": 24,\n", " \"num_frames\": 189,\n", " \"resolution\": \"720\",\n", " \"aspect_ratio\": \"16,9\",\n", " \"seed\": 1234,\n", "}\n", "\n", "# All asset paths are repo-relative under cookbooks/cosmos3/generator/audiovisual.\n", "# The Nano/Super cases without audio are intentionally separate so each run uses the matching model.\n", "ASSET_SETS = {\n", " \"t2i\": {\n", " \"model\": \"Cosmos3-Nano\",\n", " \"mode\": \"text2image\",\n", " \"prompt\": \"assets/prompts/text2image/robot_draping.json\",\n", " \"enable_sound\": False,\n", " },\n", " \"t2i_super\": {\n", " \"model\": \"Cosmos3-Super\",\n", " \"mode\": \"text2image\",\n", " \"prompt\": \"assets/prompts/text2image/robot_draping.json\",\n", " \"enable_sound\": False,\n", " },\n", " \"t2v_nano_noaudio\": {\n", " \"model\": \"Cosmos3-Nano\",\n", " \"mode\": \"text2video\",\n", " \"prompt\": \"assets/prompts/text2video/robot_kitchen.json\",\n", " \"enable_sound\": False,\n", " },\n", " \"t2vs\": {\n", " \"model\": \"Cosmos3-Nano\",\n", " \"mode\": \"text2video\",\n", " \"prompt\": \"assets/prompts/text2video/robot_pouring_water_audio.json\",\n", " \"enable_sound\": True,\n", " },\n", " \"i2v_nano_noaudio\": {\n", " \"model\": \"Cosmos3-Nano\",\n", " \"mode\": \"image2video\",\n", " \"prompt\": \"assets/prompts/image2video/car_driving.json\",\n", " \"image\": \"assets/images/image2video/car_driving.jpg\",\n", " \"enable_sound\": False,\n", " },\n", " \"i2vs\": {\n", " \"model\": \"Cosmos3-Nano\",\n", " \"mode\": \"image2video\",\n", " \"prompt\": 
\"assets/prompts/image2video/coastal_road_audio.json\",\n", " \"image\": \"assets/images/image2video/coastal_road_audio.jpg\",\n", " \"enable_sound\": True,\n", " },\n", " \"t2v_super_noaudio\": {\n", " \"model\": \"Cosmos3-Super\",\n", " \"mode\": \"text2video\",\n", " \"prompt\": \"assets/prompts/text2video/robot_kitchen.json\",\n", " \"enable_sound\": False,\n", " },\n", " \"i2v_super_noaudio\": {\n", " \"model\": \"Cosmos3-Super\",\n", " \"mode\": \"image2video\",\n", " \"prompt\": \"assets/prompts/image2video/car_driving.json\",\n", " \"image\": \"assets/images/image2video/car_driving.jpg\",\n", " \"enable_sound\": False,\n", " },\n", "}\n", "\n", "\n", "def asset_path(relative_path: str) -> Path:\n", " path = COSMOS3_AUDIOVISUAL_ROOT / relative_path\n", " if not path.exists():\n", " raise FileNotFoundError(path)\n", " return path.resolve()\n", "\n", "\n", "def compact_json_file(path: Path) -> str:\n", " return json.dumps(json.loads(path.read_text()), ensure_ascii=True, separators=(\",\", \":\"))\n", "\n", "\n", "def payload_dimensions(payload: dict) -> tuple[int, int]:\n", " if payload.get(\"resolution\") == \"720\" and payload.get(\"aspect_ratio\") == \"16,9\":\n", " return 720, 1280\n", " if payload.get(\"resolution\") == \"256\" and payload.get(\"aspect_ratio\") == \"16,9\":\n", " return 192, 320\n", " raise ValueError(f\"Unsupported payload resolution/aspect ratio: {payload.get('resolution')} {payload.get('aspect_ratio')}\")\n", "\n", "\n", "def resolve_payload_path(payload_path: Path, value: str) -> Path:\n", " path = Path(value)\n", " if path.is_absolute():\n", " return path\n", " return (payload_path.parent / path).resolve()\n", "\n", "\n", "def create_payload(use_case: str, *, backend: str) -> tuple[Path, Path, str]:\n", " spec = ASSET_SETS[use_case]\n", " payload_dir = Path(os.environ[\"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\"]) / backend / \"payloads\" / use_case\n", " output_dir = Path(os.environ[\"COSMOS3_AUDIOVISUAL_OUTPUT_ROOT\"]) / backend / 
use_case\n", " payload_dir.mkdir(parents=True, exist_ok=True)\n", " output_dir.mkdir(parents=True, exist_ok=True)\n", "\n", " prompt_path = asset_path(spec[\"prompt\"])\n", " negative_prompt = \"\"\n", " if spec[\"mode\"] != \"text2image\":\n", " negative_prompt_path = asset_path(f\"assets/negative_prompts/{spec['mode']}/neg_prompt.json\")\n", " negative_prompt = compact_json_file(negative_prompt_path)\n", " payload_path = payload_dir / f\"{use_case}.json\"\n", " payload = {\n", " \"model_mode\": spec[\"mode\"],\n", " \"name\": use_case,\n", " \"prompt\": compact_json_file(prompt_path),\n", " \"negative_prompt\": negative_prompt,\n", " \"enable_sound\": spec[\"enable_sound\"],\n", " **FIXED_SAMPLING,\n", " }\n", " if spec[\"mode\"] == \"image2video\":\n", " image_path = asset_path(spec[\"image\"])\n", " payload[\"vision_path\"] = os.path.relpath(image_path, payload_path.parent)\n", "\n", " payload_path.write_text(json.dumps(payload, indent=2) + \"\\n\")\n", "\n", " os.environ[f\"COSMOS3_{backend.upper()}_{use_case.upper()}_INPUT\"] = str(payload_path)\n", " os.environ[f\"COSMOS3_{backend.upper()}_{use_case.upper()}_OUTPUT\"] = str(output_dir)\n", "\n", " print(f\"model: {spec['model']}\")\n", " print(f\"payload: {payload_path}\")\n", " print(f\"output: {output_dir}\")\n", " print(f\"prompt: {prompt_path.relative_to(COSMOS_ROOT)}\")\n", " if \"vision_path\" in payload:\n", " image_display_path = resolve_payload_path(payload_path, payload[\"vision_path\"])\n", " print(f\"image: {image_display_path.relative_to(COSMOS_ROOT)}\")\n", " display(Image(filename=str(image_display_path), width=420))\n", " print(json.dumps({k: payload[k] for k in [\"model_mode\", \"name\", \"enable_sound\", \"num_steps\", \"guidance\", \"shift\", \"fps\", \"num_frames\", \"resolution\", \"aspect_ratio\", \"seed\"]}, indent=2))\n", " return payload_path, output_dir, spec[\"model\"]\n", "\n", "\n", "import json\n", "import gc\n", "import os\n", "import time\n", "from pathlib import Path\n", 
"\n", "import sys\n", "\n", "if \"COSMOS3_DIFFUSERS_VENV\" not in os.environ:\n", " raise RuntimeError(\"Run the Restore Environment cell after switching to the Diffusers kernel.\")\n", "expected_python = (Path(os.environ[\"COSMOS3_DIFFUSERS_VENV\"]) / \"bin\" / \"python\").resolve()\n", "if Path(sys.executable).resolve() != expected_python:\n", " raise RuntimeError(\"Switch the notebook kernel to 'Cosmos3 Diffusers (Python 3.13)' before running Diffusers cells.\")\n", "\n", "import torch\n", "from diffusers import Cosmos3OmniPipeline\n", "from diffusers import logging as diffusers_logging\n", "from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler\n", "from diffusers.utils import encode_video, export_to_video, load_image\n", "\n", "MODEL_IDS = {\n", " \"Cosmos3-Nano\": \"nvidia/Cosmos3-Nano\",\n", " \"Cosmos3-Super\": \"nvidia/Cosmos3-Super\",\n", "}\n", "\n", "_pipe = None\n", "_pipe_model = None\n", "\n", "\n", "def resolve_model_id(model: str) -> str:\n", " return MODEL_IDS.get(model, model)\n", "\n", "\n", "def get_pipe(model: str) -> Cosmos3OmniPipeline:\n", " global _pipe, _pipe_model\n", " model_id = resolve_model_id(model)\n", " if _pipe is not None and _pipe_model == model_id:\n", " return _pipe\n", " if _pipe is not None:\n", " del _pipe\n", " _pipe = None\n", " gc.collect()\n", " if torch.cuda.is_available():\n", " torch.cuda.empty_cache()\n", " diffusers_logging.set_verbosity_info()\n", " print(f\"loading {model_id}...\")\n", " t0 = time.time()\n", " pipe = Cosmos3OmniPipeline.from_pretrained(\n", " model_id,\n", " torch_dtype=torch.bfloat16,\n", " safety_checker=None,\n", " enable_safety_checker=True,\n", " token=os.environ.get(\"HF_TOKEN\") or None,\n", " )\n", " pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=FIXED_SAMPLING[\"shift\"])\n", " pipe.to(\"cuda\")\n", " _pipe = pipe\n", " _pipe_model = model_id\n", " print(f\"loaded pipeline in {time.time() - t0:.1f}s\")\n", " return 
_pipe\n", "\n", "\n", "def run_diffusers_payload(payload_path: Path, output_dir: str | Path, *, model: str) -> Path:\n", " payload_path = Path(payload_path)\n", " output_dir = Path(output_dir)\n", " output_dir.mkdir(parents=True, exist_ok=True)\n", " payload = json.loads(payload_path.read_text())\n", " height, width = payload_dimensions(payload)\n", " pipe = get_pipe(model)\n", " pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=payload[\"shift\"])\n", " generator = torch.Generator(device=\"cuda\").manual_seed(payload[\"seed\"])\n", " image = load_image(str(resolve_payload_path(payload_path, payload[\"vision_path\"]))) if payload[\"model_mode\"] == \"image2video\" else None\n", " output_ext = \".png\" if payload[\"model_mode\"] == \"text2image\" else \".mp4\"\n", " output_path = output_dir / f\"{payload['name']}{output_ext}\"\n", "\n", " print(f\"generating {payload['model_mode']} {payload['name']} with {model} -> {output_path}\")\n", " t0 = time.time()\n", " if payload[\"model_mode\"] == \"text2image\":\n", " result = pipe(\n", " prompt=payload[\"prompt\"],\n", " negative_prompt=\"\",\n", " num_frames=1,\n", " height=height,\n", " width=width,\n", " num_inference_steps=payload[\"num_steps\"],\n", " guidance_scale=payload[\"guidance\"],\n", " add_resolution_template=False,\n", " add_duration_template=False,\n", " generator=generator,\n", " )\n", " print(f\"generated in {time.time() - t0:.1f}s\")\n", " result.video[0].save(output_path)\n", " print(f\"wrote {output_path}\")\n", " return output_path\n", "\n", " result = pipe(\n", " prompt=payload[\"prompt\"],\n", " negative_prompt=payload[\"negative_prompt\"],\n", " image=image,\n", " num_frames=payload[\"num_frames\"],\n", " height=height,\n", " width=width,\n", " fps=payload[\"fps\"],\n", " num_inference_steps=payload[\"num_steps\"],\n", " guidance_scale=payload[\"guidance\"],\n", " enable_sound=payload[\"enable_sound\"],\n", " add_resolution_template=False,\n", " 
add_duration_template=False,\n", " generator=generator,\n", " )\n", " print(f\"generated in {time.time() - t0:.1f}s\")\n", " if payload[\"enable_sound\"] and result.sound is not None:\n", " encode_video(\n", " result.video,\n", " fps=payload[\"fps\"],\n", " output_path=str(output_path),\n", " audio=result.sound,\n", " audio_sample_rate=pipe.sound_tokenizer.config.sampling_rate,\n", " )\n", " else:\n", " export_to_video(result.video, str(output_path), fps=payload[\"fps\"], macro_block_size=1)\n", " print(f\"wrote {output_path}\")\n", " return output_path\n", "\n", "\n", "import base64\n", "import html\n", "from pathlib import Path\n", "from IPython.display import HTML, display\n", "\n", "\n", "def display_video(path: Path, *, width: int = 720) -> None:\n", " data = base64.b64encode(path.read_bytes()).decode(\"ascii\")\n", " label = html.escape(str(path))\n", " markup = f\"\"\"\n", "\n", "
<video controls width=\"{width}\"><source src=\"data:video/mp4;base64,{data}\" type=\"video/mp4\"></video>\n", "<div><code>{label}</code></div>\n", "\"\"\"\n", "    display(HTML(markup))\n", "\n", "\n", "def view_run(output_dir: str | Path) -> None:\n", "    output_dir = Path(output_dir)\n", "    videos = [\n", "        path\n", "        for path in sorted(output_dir.rglob(\"*.mp4\"))\n", "        if not path.name.endswith((\"_preview.mp4\", \"_browser.mp4\"))\n", "    ]\n", "    images = sorted(output_dir.rglob(\"*.png\"))\n", "    if not videos and not images:\n", "        print(f\"No generated media found under {output_dir}\")\n", "        return\n", "    for src in videos:\n", "        print(f\"source: {src} ({src.stat().st_size // 1024} KB)\")\n", "        display_video(src)\n", "    for src in images:\n", "        print(f\"source: {src} ({src.stat().st_size // 1024} KB)\")\n", "        display(Image(filename=str(src), width=720))\n", "" ], "execution_count": null, "outputs": [], "id": "77e18231" },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Use Cases\n", "\n", "Run each use case top-to-bottom: create the JSON payload, run inference, then view the generated media. The Nano and Super text-to-image examples come first, followed by the Nano video examples; the Super video examples are last.\n" ], "id": "d988d45b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nano: Text to Image\n", "\n", "Nano text-to-image generation using a structured JSON prompt.\n", "\n", "### Create Payload" ] }, { "cell_type": "code", "metadata": {}, "source": [ "t2i_payload, t2i_output, t2i_model = create_payload(\"t2i\", backend=\"diffusers\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run" ] }, { "cell_type": "code", "metadata": {}, "source": [ "run_diffusers_payload(t2i_payload, t2i_output, model=\"Cosmos3-Nano\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(t2i_output)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Super: Text to Image\n", "\n", "Super text-to-image generation using the same 
structured JSON prompt.\n", "\n", "### Create Payload" ], "id": "2253bd65" }, { "cell_type": "code", "metadata": {}, "source": [ "t2i_super_payload, t2i_super_output, t2i_super_model = create_payload(\"t2i_super\", backend=\"diffusers\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run" ] }, { "cell_type": "code", "metadata": {}, "source": [ "run_diffusers_payload(t2i_super_payload, t2i_super_output, model=\"Cosmos3-Super\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(t2i_super_output)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nano: Text to Video Without Audio\n", "\n", "Nano text-to-video generation with audio disabled.\n", "\n", "### Create Payload\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "t2v_nano_noaudio_payload, t2v_nano_noaudio_output, t2v_nano_noaudio_model = create_payload(\"t2v_nano_noaudio\", backend=\"diffusers\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "run_diffusers_payload(t2v_nano_noaudio_payload, t2v_nano_noaudio_output, model=\"Cosmos3-Nano\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(t2v_nano_noaudio_output)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nano: Text to Video with Audio\n", "\n", "Nano text-to-video generation with generated audio.\n", "\n", "### Create Payload\n" ], "id": "f98c6022" }, { "cell_type": "code", "metadata": {}, "source": [ "t2vs_payload, t2vs_output, t2vs_model = create_payload(\"t2vs\", 
backend=\"diffusers\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "run_diffusers_payload(t2vs_payload, t2vs_output, model=\"Cosmos3-Nano\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(t2vs_output)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nano: Image to Video Without Audio\n", "\n", "Nano image-to-video generation using its paired image asset, with audio disabled.\n", "\n", "### Create Payload\n" ], "id": "1fc31aa0" }, { "cell_type": "code", "metadata": {}, "source": [ "i2v_nano_noaudio_payload, i2v_nano_noaudio_output, i2v_nano_noaudio_model = create_payload(\"i2v_nano_noaudio\", backend=\"diffusers\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "run_diffusers_payload(i2v_nano_noaudio_payload, i2v_nano_noaudio_output, model=\"Cosmos3-Nano\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(i2v_nano_noaudio_output)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nano: Image to Video with Audio\n", "\n", "Nano image-to-video generation using its paired image asset and generated audio.\n", "\n", "### Create Payload\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "i2vs_payload, i2vs_output, i2vs_model = create_payload(\"i2vs\", backend=\"diffusers\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run\n" ] }, { "cell_type": "code", "metadata": 
{}, "source": [ "run_diffusers_payload(i2vs_payload, i2vs_output, model=\"Cosmos3-Nano\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(i2vs_output)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Super: Text to Video Without Audio\n", "\n", "Super text-to-video generation with audio disabled.\n", "\n", "### Create Payload\n" ], "id": "9f58eb8b" }, { "cell_type": "code", "metadata": {}, "source": [ "t2v_super_noaudio_payload, t2v_super_noaudio_output, t2v_super_noaudio_model = create_payload(\"t2v_super_noaudio\", backend=\"diffusers\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "run_diffusers_payload(t2v_super_noaudio_payload, t2v_super_noaudio_output, model=\"Cosmos3-Super\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(t2v_super_noaudio_output)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Super: Image to Video Without Audio\n", "\n", "Super image-to-video generation using its paired image asset, with audio disabled.\n", "\n", "### Create Payload\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "i2v_super_noaudio_payload, i2v_super_noaudio_output, i2v_super_noaudio_model = create_payload(\"i2v_super_noaudio\", backend=\"diffusers\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "run_diffusers_payload(i2v_super_noaudio_payload, i2v_super_noaudio_output, model=\"Cosmos3-Super\")\n" ], "execution_count": 
null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Results\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "view_run(i2v_super_noaudio_output)\n" ], "execution_count": null, "outputs": [] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 5 }