
FLOW KIT

License: MIT Python 3.10+ Chrome MV3 FastAPI ffmpeg Documentation GitHub stars GitHub issues DeepWiki


☕ Sponsor this project

Vietnam QR — MoMo / VietQR / napas247
📱 Vietnam
MoMo · VietQR · napas247
Binance Pay QR — Cris Ng
💰 Binance Pay
Crypto / cross-border

🌍 International (card): Ko-fi

(yes — I moved this up here on purpose. Was afraid nobody scrolls past the badges 😅)


FLOW KIT

Standalone system to generate AI videos via Google Flow API. Uses a Chrome extension as browser bridge for authentication, reCAPTCHA solving, and API proxying.

Showcase

All outputs below were generated end-to-end by this system — from story concept to final YouTube-ready video with thumbnails, narration, and branding.

Generated YouTube Thumbnails

Hormuz Strait naval blockade thumbnail · F-15E pilot rescue thumbnail

Operation Absolute Resolve thumbnail · Tapalpa cartel operation thumbnail

North Korea defection thumbnail · Iran vs Israel conflict thumbnail

Visual Consistency Across Scenes

The reference image system keeps characters consistent across an entire video. Each character is generated once as a reference, then the AI uses that reference in every scene — maintaining the same face, clothing, and features.

Doctor character — same face, glasses, white coat across 4 different scenes:

Doctor in surgery · Doctor in operating theater · Doctor interview (gesturing) · Doctor interview (smiling)

Defector character — same face across ICU, hospital, interview, and Seoul streets:

Defector in ICU · Defector in hospital with nurse · Defector interview · Defector walking Seoul streets

All frames from a single 50-scene project. Both characters maintain consistent appearance across completely different settings and lighting conditions — powered by the reference image system.

F-15E Rescue — Full Story Arc (25 scenes)

Scene 1: Strategic map overview · Scene 3: Pilot walks from F-15E · Scene 6: F-15E formation refueling

Scene 10: F-15E hit at night · Scene 15: CSAR command center alert · Scene 20: Pilot surviving in mountains

Strategic briefing → pilot departure → formation flight → aircraft hit → CSAR alert → pilot survival.

Hormuz Strait — Naval Scenes

Iranian patrol boats in formation · US Navy commander on bridge

CIWS engagement at sea · Warship sailing into sunset

What the Pipeline Produces

Each project goes through: story → entities → reference images → scene images → 8s video clips → narration (TTS) → concat → thumbnails → YouTube upload — all orchestrated via API or AI agent skills.

| Output | Description |
|---|---|
| Reference images | One per character/location/prop; maintains visual consistency |
| Scene images | Composed using all referenced entities |
| 8-second video clips | Generated from scene images with camera motion + sound effects |
| 4K upscale | Optional upscale to 4K resolution |
| Narrator TTS | Voice-cloned narration per scene |
| Final video | All clips concatenated, trimmed to narrator timing |
| Thumbnails | YouTube-optimized with text overlays + branding |
| YouTube metadata | SEO-optimized title, description, tags, hashtags |

Chrome Extension — Live Dashboard

Chrome extension showing request log, video generation progress, and Google Flow interface

The Chrome extension runs alongside Google Flow and shows a real-time request log (614 total, 328 successful in the screenshot), video generation progress, and token status. The Python agent communicates with the extension via WebSocket to automate all API calls.

Architecture

┌──────────────────┐     WebSocket      ┌──────────────────────┐
│  Python Agent    │◄──────────────────►│  Chrome Extension     │
│  (FastAPI+SQLite)│     localhost:9222  │  (MV3 Service Worker) │
│                  │                    │                       │
│  - REST API :8100│  ── commands ──►   │  - Token capture      │
│  - Queue worker  │  ◄── results ──    │  - reCAPTCHA solve    │
│  - Post-process  │                    │  - API proxy          │
│  - SQLite DB     │                    │  (on labs.google)     │
└──────────────────┘                    └──────────────────────┘

Quick Start

One-command setup

./setup.sh

This checks for (and installs where missing) Python 3.10+, pip, ffmpeg, ffprobe, and Chrome, then creates a venv, installs dependencies, and verifies imports.

Windows: Use WSL (wsl --install) or Git Bash. All bash scripts and commands assume a Unix shell.

Manual setup

# Prerequisites: Python 3.10+, ffmpeg, Chrome
pip install -r requirements.txt

Run

# 1. Load Chrome extension: chrome://extensions → Developer mode → Load unpacked → extension/
# 2. Open https://labs.google/fx/tools/flow and sign in

# 3. Start agent
source venv/bin/activate  # if using setup.sh
python -m agent.main

# 4. Verify
curl http://127.0.0.1:8100/health
# {"status":"ok","extension_connected":true}

End-to-End Example: "Pippip the Fish Merchant"

A chubby cat sells fish at a market. 3 scenes, vertical, Pixar 3D style.

How it works (read this first)

The system uses reference images to keep visuals consistent across scenes. Here's the mental model:

1. Identify every visual element that should look the same across scenes:

  • Characters → entity_type: "character" (portrait reference)
  • Places → entity_type: "location" (landscape reference)
  • Important objects → entity_type: "visual_asset" (detail reference)

2. Describe ONLY appearance in the entity description — this generates the reference image:

  • "Chubby orange tabby cat with blue apron, straw hat" (what it looks like)

3. Write scene prompts as ACTION — reference entities by name, describe what they DO:

  • "Pippip stands behind Fish Stall, arranging fish..." (what happens)
  • NOT: "A chubby orange tabby cat wearing a blue apron stands behind a wooden stall..." (don't repeat appearance)

4. List all entities that appear in each scene's character_names array — their reference images get passed to the AI as visual input, ensuring consistency.

Story idea
    ↓
Break into visual elements → characters[] array with entity_type + description
    ↓
Write scene prompts using entity NAMES → character_names lists which refs to use
    ↓
System generates ref image per entity → then composes scenes using those refs
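
As a rough illustration of the mental model above, the sketch below checks that every name in a scene's character_names matches a declared entity before anything is sent to the API. The helper name `missing_references` is hypothetical and not part of the project; only the field names (`name`, `entity_type`, `description`, `character_names`, `display_order`) come from this README.

```python
from typing import Dict, List


def missing_references(entities: List[dict], scenes: List[dict]) -> Dict[int, List[str]]:
    """Map each scene's display_order to the names in character_names that match no entity."""
    known = {entity["name"] for entity in entities}
    problems: Dict[int, List[str]] = {}
    for scene in scenes:
        unknown = [name for name in scene.get("character_names", []) if name not in known]
        if unknown:
            problems[scene["display_order"]] = unknown
    return problems


entities = [
    {"name": "Pippip", "entity_type": "character",
     "description": "Chubby orange tabby cat with blue apron, straw hat"},
    {"name": "Fish Stall", "entity_type": "location",
     "description": "Rustic wooden stall, thatched roof, ice display"},
]
scenes = [
    {"display_order": 0, "prompt": "Pippip stands behind Fish Stall, arranging fish.",
     "character_names": ["Pippip", "Fish Stall"]},
    # "Golden Fish" is used here but was never declared as an entity.
    {"display_order": 1, "prompt": "Pippip stares at Golden Fish.",
     "character_names": ["Pippip", "Golden Fish"]},
]
print(missing_references(entities, scenes))  # {1: ['Golden Fish']}
```

An empty result means every scene only refers to entities that will have a reference image generated for them.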

Using Skills (recommended)

Skills handle all the API calls, polling, and verification automatically. Use with Claude Code (/fk-command) or follow the recipe in skills/*.md for any AI agent.

/fk-create-project             ← interactive: asks story, creates entities + scenes
/fk-gen-refs <project_id>      ← generates all reference images, verifies UUIDs
/fk-gen-images <pid> <vid>     ← generates scene images with all refs applied
/fk-gen-videos <pid> <vid>     ← generates videos (2-5 min each, polls automatically)
/fk-concat <vid>               ← downloads + merges into final video
/fk-status <pid>               ← dashboard: what's done, what's next

Full pipeline in 5 commands (/fk-status is an optional progress check). Each skill pre-checks dependencies (e.g. /fk-gen-images verifies all refs exist first).

Manual API (step by step)


Step 1: Create project with reference entities

From the story, identify every visual element that repeats across scenes:

| Element | entity_type | description (appearance only) |
|---|---|---|
| Pippip | character | Chubby orange tabby cat, big green eyes, blue apron, straw hat |
| Fish Stall | location | Rustic wooden stall, thatched roof, ice display |
| Open Market | location | Southeast Asian market, colorful awnings, lanterns |
| Golden Fish | visual_asset | Golden koi, shimmering scales, magical glow |

curl -X POST http://127.0.0.1:8100/api/projects \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Pippip the Fish Merchant",
    "story": "Pippip is a chubby orange tabby cat who sells fish at a Southeast Asian open market. Scene 1: Morning setup. Scene 2: Staring at the golden fish. Scene 3: Eating the last fish at sunset.",
    "characters": [
      {"name": "Pippip", "entity_type": "character", "description": "Chubby orange tabby cat with big green eyes, blue apron, straw hat. Walks upright. Pixar-style 3D."},
      {"name": "Fish Stall", "entity_type": "location", "description": "Small rustic wooden market stall with thatched bamboo roof, crushed ice display, hanging brass scale."},
      {"name": "Open Market", "entity_type": "location", "description": "Bustling Southeast Asian open-air market with colorful awnings, hanging lanterns, stone walkway."},
      {"name": "Golden Fish", "entity_type": "visual_asset", "description": "Magnificent golden koi fish with shimmering iridescent scales, elegant fins, slight magical glow."}
    ]
  }'
# Save project_id from response

Step 2: Create video + scenes

Scene prompts reference entities by name (not description). character_names lists which reference images to apply.

# Create video
curl -X POST http://127.0.0.1:8100/api/videos \
  -H "Content-Type: application/json" \
  -d '{"project_id": "<PID>", "title": "Pippip Episode 1"}'

# Scene 1 (ROOT): Pippip + Fish Stall + Open Market appear
curl -X POST http://127.0.0.1:8100/api/scenes \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "<VID>",
    "display_order": 0,
    "prompt": "Pippip stands behind Fish Stall, arranging fresh fish on ice. Sunrise, golden light in Open Market. Pixar 3D.",
    "character_names": ["Pippip", "Fish Stall", "Open Market"],
    "chain_type": "ROOT"
  }'

# Scene 2 (CONTINUATION): Golden Fish now appears
curl -X POST http://127.0.0.1:8100/api/scenes \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "<VID>",
    "display_order": 1,
    "prompt": "Pippip leans over Fish Stall, staring at Golden Fish on empty ice. Drooling. Open Market dark behind. Pixar 3D.",
    "character_names": ["Pippip", "Fish Stall", "Golden Fish", "Open Market"],
    "chain_type": "CONTINUATION",
    "parent_scene_id": "<scene-1-id>"
  }'

# Scene 3 (CONTINUATION)
curl -X POST http://127.0.0.1:8100/api/scenes \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "<VID>",
    "display_order": 2,
    "prompt": "Pippip sits on stool at Fish Stall eating Golden Fish with chopsticks. SOLD OUT sign. Open Market sunset. Pixar 3D.",
    "character_names": ["Pippip", "Fish Stall", "Golden Fish", "Open Market"],
    "chain_type": "CONTINUATION",
    "parent_scene_id": "<scene-2-id>"
  }'

Step 3-6: Generate refs → images → videos → concat

# Step 3: Generate reference images (one per entity, wait for each)
curl -X POST http://127.0.0.1:8100/api/requests \
  -H "Content-Type: application/json" \
  -d '{"type": "GENERATE_CHARACTER_IMAGE", "character_id": "<CID>", "project_id": "<PID>"}'
# Poll: GET /api/requests/<RID> until status=COMPLETED
# Repeat for each entity. Verify all have UUID media_id.

# Step 4: Generate scene images
curl -X POST http://127.0.0.1:8100/api/requests \
  -H "Content-Type: application/json" \
  -d '{"type": "GENERATE_IMAGE", "scene_id": "<SID>", "project_id": "<PID>", "video_id": "<VID>", "orientation": "VERTICAL"}'
# Worker blocks if any ref is missing media_id

# Step 5: Generate videos (2-5 min each)
curl -X POST http://127.0.0.1:8100/api/requests \
  -H "Content-Type: application/json" \
  -d '{"type": "GENERATE_VIDEO", "scene_id": "<SID>", "project_id": "<PID>", "video_id": "<VID>", "orientation": "VERTICAL"}'

# Step 6: Download + concat
curl -s "http://127.0.0.1:8100/api/scenes?video_id=<VID>"  # get video URLs
# Download each, normalize with ffmpeg, concat
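
The "poll until status=COMPLETED" step can be scripted. The following is a minimal sketch, not the project's own client: `wait_for_request` and `fetch_request` are hypothetical helper names, the defaults simply mirror POLL_INTERVAL (5) and VIDEO_POLL_TIMEOUT (420) from the configuration table, and it assumes the GET /api/requests/{id} response is a JSON object with a `status` field.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_request(base_url: str, request_id: str) -> dict:
    """GET /api/requests/<id> and decode the JSON body (assumed response shape)."""
    with urllib.request.urlopen(f"{base_url}/api/requests/{request_id}") as resp:
        return json.load(resp)


def wait_for_request(fetch: Callable[[], dict], interval: float = 5.0,
                     timeout: float = 420.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until status is COMPLETED or FAILED, or until timeout seconds have passed."""
    waited = 0.0
    while True:
        request = fetch()
        if request.get("status") in ("COMPLETED", "FAILED"):
            return request
        if waited >= timeout:
            raise TimeoutError(f"request still {request.get('status')} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Offline demonstration with canned responses instead of a live agent.
responses = iter([{"status": "PENDING"}, {"status": "PROCESSING"}, {"status": "COMPLETED"}])
result = wait_for_request(lambda: next(responses), sleep=lambda seconds: None)
print(result["status"])  # COMPLETED
```

Against a running agent, the fetch argument would be something like `lambda: fetch_request("http://127.0.0.1:8100", rid)`.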

Core Concepts

Reference Image System

Every visual element that should stay consistent gets a reference image — characters, locations, props. Each reference has a UUID media_id used in all scene generations via imageInputs.

| Entity Type | Aspect Ratio | Composition |
|---|---|---|
| character | Portrait | Full body head-to-toe, front-facing, centered |
| location | Landscape | Establishing shot, level horizon, atmospheric |
| creature | Portrait | Full body, natural stance, distinctive features |
| visual_asset | Portrait | Detailed view, textures, scale reference |

Scene Prompts = Action Only

Scene prompts describe what happens, not character appearance. The reference images maintain visual consistency.

DO:   "Pippip juggling fish at Fish Stall, crowd watching in Open Market"
DON'T: "Pippip the chubby orange tabby cat wearing a blue apron juggling..."

Media ID = UUID

All media_id values are UUID format (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx). Never the base64 CAMS... mediaGenerationId.
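
A quick way to tell the two formats apart is a pattern check. This is only an illustrative sketch; `is_uuid_media_id` is a hypothetical helper name, and the sample values are made up.

```python
import re

# 8-4-4-4-12 hexadecimal groups, as in xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_uuid_media_id(media_id: str) -> bool:
    """True only if the whole string is a UUID; base64 CAMS... values fail the check."""
    return UUID_PATTERN.fullmatch(media_id) is not None


print(is_uuid_media_id("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_uuid_media_id("CAMSexampleBase64Value"))  # False
```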

Two Prompts per Scene

Each scene has two separate prompts:

  • prompt — describes the still image (frame 0): "Luna steps out of rocket onto candy planet. Wide shot, sunrise."
  • video_prompt — describes the 8s video motion with sub-clip timing and camera directions:
0-3s: Wide crane down, Luna steps out of rocket onto Candy Planet Surface. Luna gasps "It's beautiful!"
3-6s: Low angle tracking shot, Luna walks across candy ground, shallow DOF. Luna says "Everything is made of candy."
6-8s: Close-up Luna's face, eyes wide with wonder, golden hour backlight. Silence, ambient wind.
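
If you write video_prompt values by hand, it can help to check that the sub-clip timings cover the full 8 seconds without gaps. The sketch below is an assumption about how one might validate the `start-ends:` line format shown above; `parse_segments` and `covers_clip` are hypothetical helpers, not functions from the project.

```python
import re
from typing import List, Tuple

SEGMENT_PATTERN = re.compile(r"^(\d+)-(\d+)s:\s*(.+)$")


def parse_segments(video_prompt: str) -> List[Tuple[int, int, str]]:
    """Parse 'start-ends: text' lines into (start, end, text) tuples."""
    segments = []
    for line in video_prompt.strip().splitlines():
        match = SEGMENT_PATTERN.match(line.strip())
        if not match:
            raise ValueError(f"not a timed segment: {line!r}")
        segments.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return segments


def covers_clip(segments: List[Tuple[int, int, str]], clip_seconds: int = 8) -> bool:
    """True if segments start at 0, are contiguous, and end at clip_seconds."""
    cursor = 0
    for start, end, _text in segments:
        if start != cursor or end <= start:
            return False
        cursor = end
    return cursor == clip_seconds


prompt = """
0-3s: Wide crane down, Luna steps out of rocket onto Candy Planet Surface. Luna gasps "It's beautiful!"
3-6s: Low angle tracking shot, Luna walks across candy ground, shallow DOF. Luna says "Everything is made of candy."
6-8s: Close-up Luna's face, eyes wide with wonder, golden hour backlight. Silence, ambient wind.
"""
segments = parse_segments(prompt)
print([(start, end) for start, end, _text in segments])  # [(0, 3), (3, 6), (6, 8)]
print(covers_clip(segments))  # True
```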

Character Voice

Characters can have a voice_description (max ~30 words) for voice consistency:

{"name": "Luna", "entity_type": "character", "description": "Small white cat...", "voice_description": "Soft curious childlike voice with wonder and slight purring"}

Voice descriptions are auto-appended to video prompts before generation.

No Background Music

The worker auto-appends "No background music. Keep only natural sound effects and ambient sounds." to all video prompts. Sound effects from the scene (footsteps, splashing, wind) are preserved.
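
To make the two auto-append rules concrete, here is a sketch of how a final video prompt might be assembled. The suffix text is quoted from this section; the function name `build_video_prompt` and the "<name> voice: ..." line format are assumptions for illustration, and the worker's real formatting may differ.

```python
from typing import List

NO_MUSIC_SUFFIX = "No background music. Keep only natural sound effects and ambient sounds."


def build_video_prompt(video_prompt: str, characters: List[dict]) -> str:
    """Append each character's voice_description (if any), then the no-music suffix."""
    parts = [video_prompt.strip()]
    for character in characters:
        voice = character.get("voice_description")
        if voice:
            # Hypothetical line format; only the auto-append behaviour is documented.
            parts.append(f"{character['name']} voice: {voice}")
    parts.append(NO_MUSIC_SUFFIX)
    return "\n".join(parts)


luna = {"name": "Luna",
        "voice_description": "Soft curious childlike voice with wonder and slight purring"}
rocket = {"name": "Rocket"}  # no voice_description, so nothing is appended for it
final_prompt = build_video_prompt("0-3s: Luna steps out of rocket.", [luna, rocket])
print(final_prompt)
```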

Pipeline Overview

1. Create project      POST /api/projects (with entities + story)
2. Create video        POST /api/videos
3. Create scenes       POST /api/scenes (chain_type: ROOT → CONTINUATION)
4. Gen ref images      POST /api/requests {type: GENERATE_CHARACTER_IMAGE} per entity
   → Wait ALL complete, verify all have UUID media_id
5. Gen scene images    POST /api/requests {type: GENERATE_IMAGE} per scene
   → Wait ALL complete
6. Gen videos          POST /api/requests {type: GENERATE_VIDEO} per scene
   → Wait ALL complete (2-5 min each)
7. (Optional) Upscale  POST /api/requests {type: UPSCALE_VIDEO} (TIER_TWO only)
8. Download + concat   ffmpeg normalize + concat

Skills (AI Agent Workflows)

Ready-to-use workflow recipes in skills/ (also available as /slash-commands in Claude Code):

Basic Pipeline

| Skill | Description |
|---|---|
| /fk-create-project | Create project + entities + video + scenes interactively |
| /fk-gen-refs | Generate reference images for all entities |
| /fk-gen-images | Generate scene images with character refs |
| /fk-gen-videos | Generate videos from scene images |
| /fk-concat | Download + merge all scene videos |

Advanced Video

| Skill | Description |
|---|---|
| /fk-gen-chain-videos | Auto start+end frame chaining for smooth transitions (i2v_fl) |
| /fk-insert-scene | Multi-angle shots, cutaways, close-ups within a chain |
| /fk-creative-mix | Analyze story + suggest all techniques (chain, insert, r2v, parallel) |

Reference

| Skill | Description |
|---|---|
| /fk-camera-guide | Camera angles, movements, lighting, DOF for cinematic video prompts |

TTS & Narration

| Skill | Description |
|---|---|
| /fk-gen-tts-template | Create a voice template for consistent narration |
| /fk-gen-narrator | Generate narrator text + TTS for all scenes |
| /fk-gen-text-overlays | Generate text overlays from narrator text (dates, locations, stats) |
| /fk-concat-fit-narrator | Trim scene videos to fit narrator duration, then concat |

YouTube

| Skill | Description |
|---|---|
| /fk-youtube-seo | Generate SEO-optimized title, description, tags |
| /fk-brand-logo | Apply channel icon watermark to video/thumbnails |
| /fk-youtube-upload | Upload to YouTube with rule validation + scheduling |
| /fk-thumbnail | Generate YouTube-optimized thumbnails |

Utilities

| Skill | Description |
|---|---|
| /fk-status | Full project dashboard + recommended next action |
| /fk-fix-uuids | Repair any CAMS... media_ids to UUID format |
| /fk-add-material | Image material system |

AI CLI Compatibility

Skills work with any AI CLI that can read files:

| CLI | Instructions | How skills work |
|---|---|---|
| Claude Code | CLAUDE.md (auto-loaded) | Native /fk: slash commands |
| Codex CLI | AGENTS.md → reads CLAUDE.md | User says /fk:<name>, agent reads skills/fk:<name>.md |
| Gemini CLI | GEMINI.md → reads CLAUDE.md | Same pattern |

Video Generation Techniques

| Technique | API Type | Use Case |
|---|---|---|
| i2v | GENERATE_VIDEO | Image → video (standard) |
| i2v_fl | GENERATE_VIDEO + endImage | Start+end frame → smooth scene transitions |
| r2v | GENERATE_VIDEO_REFS | Reference images → video (intros, dream sequences) |
| Upscale | UPSCALE_VIDEO | Video → 4K (TIER_TWO only) |

API Reference

CRUD Endpoints

| Resource | Create | List | Get | Update | Delete |
|---|---|---|---|---|---|
| Project | POST /api/projects | GET /api/projects | GET /api/projects/{id} | PATCH /api/projects/{id} | DELETE /api/projects/{id} |
| Character | POST /api/characters | GET /api/characters | GET /api/characters/{id} | PATCH /api/characters/{id} | DELETE /api/characters/{id} |
| Video | POST /api/videos | GET /api/videos?project_id= | GET /api/videos/{id} | PATCH /api/videos/{id} | DELETE /api/videos/{id} |
| Scene | POST /api/scenes | GET /api/scenes?video_id= | GET /api/scenes/{id} | PATCH /api/scenes/{id} | DELETE /api/scenes/{id} |
| Request | POST /api/requests | GET /api/requests | GET /api/requests/{id} | PATCH /api/requests/{id} | |

Special Endpoints

| Endpoint | Description |
|---|---|
| GET /health | Server + extension status |
| GET /api/flow/status | Extension connection details |
| GET /api/flow/credits | User credits + tier |
| GET /api/requests/pending | Pending request queue |
| GET /api/projects/{id}/characters | Entities linked to project |

Request Types

| Type | Required Fields | Async? | reCAPTCHA? |
|---|---|---|---|
| GENERATE_CHARACTER_IMAGE | character_id, project_id | No | Yes |
| GENERATE_IMAGE | scene_id, project_id, video_id, orientation | No | Yes |
| GENERATE_VIDEO | scene_id, project_id, video_id, orientation | Yes | Yes |
| GENERATE_VIDEO_REFS | scene_id, project_id, video_id, orientation | Yes | Yes |
| UPSCALE_VIDEO | scene_id, project_id, video_id, orientation | Yes | Yes |
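
Before submitting a request, a client can check the payload against the required fields listed above. This is a client-side sketch only; `missing_fields` is a hypothetical helper, and the server performs its own validation regardless.

```python
from typing import List

_SCENE_FIELDS = ("scene_id", "project_id", "video_id", "orientation")

# Required fields per request type, copied from the table above.
REQUIRED_FIELDS = {
    "GENERATE_CHARACTER_IMAGE": ("character_id", "project_id"),
    "GENERATE_IMAGE": _SCENE_FIELDS,
    "GENERATE_VIDEO": _SCENE_FIELDS,
    "GENERATE_VIDEO_REFS": _SCENE_FIELDS,
    "UPSCALE_VIDEO": _SCENE_FIELDS,
}


def missing_fields(payload: dict) -> List[str]:
    """Return the required fields that are absent or empty for the payload's type."""
    request_type = payload.get("type")
    if request_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown request type: {request_type!r}")
    return [name for name in REQUIRED_FIELDS[request_type] if not payload.get(name)]


payload = {"type": "GENERATE_IMAGE", "scene_id": "<SID>", "project_id": "<PID>"}
print(missing_fields(payload))  # ['video_id', 'orientation']
```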

Worker Behavior

  • Server handles throttling — worker enforces max 5 concurrent + 10s cooldown automatically. Use POST /api/requests/batch to submit all at once; do NOT manually batch.
  • 10s cooldown between API calls (anti-spam, configurable via API_COOLDOWN)
  • Reference blocking — scene image gen refuses if any referenced entity is missing media_id
  • Skip completed — won't re-generate already-completed assets
  • Cascade clear — regenerating image auto-resets downstream video + upscale
  • Retry — failed requests retry up to 5 times
  • UUID enforcement — extracts UUID from fifeUrl if response doesn't provide it directly
  • Voice context — auto-appends character voice_description to video prompts
  • No background music — auto-appends "no background music, keep sound effects" to all video prompts
  • Dual video response schema — Lite/Fast/Ultra models return operations[] and stream URLs; Low Priority models (veo_3_1_*_low_priority, *_ultra_relaxed) return workflows + media with the MP4 inline as base64. The SDK auto-detects, validates the ftyp magic, and saves the binary to output/_workflow_videos/{media_id}.mp4. The scene's _video_url is then a file:// path which curl and ffmpeg handle natively. _video_media_id always stores the real Flow media UUID, so upscale works for both schemas.
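
For readers unfamiliar with the "ftyp magic" check mentioned in the last bullet: MP4 files normally start with a box whose type field, at bytes 4 to 7, is `ftyp`. The sketch below shows that check on a base64 payload. `decode_mp4` is a hypothetical name and the sample bytes are synthetic; this is not the SDK's actual code.

```python
import base64


def decode_mp4(b64_data: str) -> bytes:
    """Decode a base64 payload and verify it starts with an MP4 ftyp box."""
    raw = base64.b64decode(b64_data)
    # Bytes 0-3 hold the box size, bytes 4-7 the box type.
    if len(raw) < 8 or raw[4:8] != b"ftyp":
        raise ValueError("payload does not look like an MP4 (no ftyp box at offset 4)")
    return raw


# Synthetic header: 4-byte box size, 'ftyp', major brand 'isom', then padding.
sample = base64.b64encode(b"\x00\x00\x00\x18ftypisom" + b"\x00" * 16).decode()
video_bytes = decode_mp4(sample)
print(video_bytes[4:8])  # b'ftyp'
```

Rejecting payloads that fail this check avoids writing an error page or truncated response to disk as if it were a video.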

Default Model & Tier Compatibility

The default for PAYGATE_TIER_TWO frame_2_video and start_end_frame_2_video is veo_3_1_i2v_lite_low_priority — the TRUE 0-credit Low Priority that works on every service tier including SERVICE_TIER_ADVANCED.

The *_ultra_relaxed family (Low Priority ultra-quality) silently returns empty operations on SERVICE_TIER_ADVANCED accounts because Google requires SERVICE_TIER_ULTRA for that path. ULTRA-tier users can switch back via /fk-change-model — see skills/fk-change-model.md for the full preset list and tier compatibility matrix.

Material System

Every project must have a material field that controls the visual style of generated images. Set it at project creation.

# List available materials
curl -s http://127.0.0.1:8100/api/materials

# Set on project
curl -X POST http://127.0.0.1:8100/api/projects \
  -d '{"name": "...", "material": "3d_pixar", ...}'

Materials control both entity image_prompt style and scene scene_prefix. Examples: realistic, 3d_pixar, anime, stop_motion, minecraft, oil_painting.

Configuration

| Variable | Default | Description |
|---|---|---|
| API_HOST | 127.0.0.1 | REST API bind address |
| API_PORT | 8100 | REST API port |
| WS_HOST | 127.0.0.1 | WebSocket server bind |
| WS_PORT | 9222 | WebSocket server port |
| POLL_INTERVAL | 5 | Worker poll interval (seconds) |
| MAX_RETRIES | 5 | Max retries per request |
| VIDEO_POLL_TIMEOUT | 420 | Video gen poll timeout (seconds) |
| API_COOLDOWN | 10 | Seconds between API calls (anti-spam) |
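
The sketch below shows one way these variables could be read with their defaults, assuming they are environment variables. It does not reproduce agent/config.py; the `Settings` class and `load_settings` function are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table above.
    api_host: str = "127.0.0.1"
    api_port: int = 8100
    ws_host: str = "127.0.0.1"
    ws_port: int = 9222
    poll_interval: int = 5
    max_retries: int = 5
    video_poll_timeout: int = 420
    api_cooldown: int = 10


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), keeping defaults for unset keys."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(Settings):
        raw = env.get(field.name.upper())
        if raw is not None:
            # Cast the string to the type of the field's default (int or str).
            overrides[field.name] = type(field.default)(raw)
    return Settings(**overrides)


settings = load_settings({"API_PORT": "8200", "API_COOLDOWN": "15"})
print(settings.api_port, settings.api_cooldown, settings.ws_port)  # 8200 15 9222
```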

Architecture

agent/
├── main.py              # FastAPI app + WebSocket server
├── config.py            # Configuration (loads models.json)
├── models.json          # Video/upscale/image model mappings
├── db/
│   ├── schema.py        # SQLite schema (aiosqlite)
│   └── crud.py          # Async CRUD with column whitelisting
├── models/              # Pydantic models + Literal enums
├── api/                 # REST routes (projects, videos, scenes, characters, requests, flow)
├── services/
│   ├── flow_client.py   # WS bridge to extension
│   ├── headers.py       # Randomized browser headers
│   ├── tts.py           # OmniVoice TTS (subprocess-based)
│   ├── scene_chain.py   # Continuation scene logic
│   └── post_process.py  # ffmpeg trim/merge/music
└── worker/
    └── processor.py     # Queue processor + poller

extension/               # Chrome MV3 extension
skills/                  # AI agent workflow recipes (CLI-agnostic)
youtube/
├── auth.py              # OAuth2 multi-channel auth
├── upload.py            # Upload with scheduling + rule validation
└── channels/            # Per-channel config (gitignored)
    └── <channel_name>/
        ├── client_secrets.json  # OAuth2 credentials
        ├── token.json           # Auth token (auto-created)
        ├── channel_rules.json   # Upload rules + SEO defaults
        └── upload_history.json  # Upload log
CLAUDE.md                # AI agent instructions (Claude Code)
AGENTS.md                # AI agent instructions (Codex CLI)
GEMINI.md                # AI agent instructions (Gemini CLI)

TTS Narration (OmniVoice)

Optional narrator voice for scenes. Uses OmniVoice — multilingual zero-shot TTS with voice cloning (600+ languages).

Setup

See skills/fk-gen-tts-template.md for full install guide. Quick version:

pip install torch==2.8.0 torchaudio==2.8.0  # or +cu128 for NVIDIA
pip install omnivoice
python3 -c "from omnivoice import OmniVoice; print('OK')"

If OmniVoice is in a separate venv, point to it:

export TTS_PYTHON_BIN=/path/to/omnivoice-venv/bin/python3

Workflow

  1. Create voice template: /fk-gen-tts-template generates an anchor voice WAV
  2. Add narrator text to scenes: PATCH /api/scenes/{id} with narrator_text
  3. Generate narration: /fk-gen-narrator voice-clones the template for each scene
  4. Concat with narration: /fk-concat-fit-narrator trims scene videos to match TTS duration

CPU-only recommended (MPS produces artifacts). ~15-30s per scene.

YouTube Upload Pipeline

Automated upload with per-channel rules, SEO optimization, and brand watermarking.

Setup

# 1. Place OAuth credentials
cp client_secrets.json youtube/channels/<channel_name>/

# 2. Authenticate (opens browser)
python3 youtube/auth.py <channel_name>              # Linux / Windows (WSL)
arch -arm64 python3 youtube/auth.py <channel_name>  # macOS Apple Silicon

# 3. Token saved to youtube/channels/<channel_name>/token.json (auto-refreshes)

Channel Rules (channel_rules.json)

Each channel has a rules file controlling upload scheduling and SEO:

{
  "shorts": {"max_per_day": 3, "optimal_times": ["07:00", "12:00", "17:00"]},
  "long_form": {"max_per_day": 1, "optimal_times": ["19:00"]},
  "scheduling": {"min_gap_hours": 4, "avoid_hours": [0,1,2,3,4,5]},
  "seo": {"niche": "...", "default_tags": [...], "title_max_chars": 65}
}

Skill Chain

/fk-youtube-seo    → generates title, description, hashtags, tags
/fk-brand-logo     → applies channel icon watermark
/fk-youtube-upload  → validates rules + uploads (auto-detects Short vs Long-form)

Upload validation checks: max per day, min gap between uploads, avoid dead hours. Auto-detects Short (<61s + vertical 9:16) vs Long-form.
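
The checks above can be sketched as two small functions. This is an illustration under stated assumptions, not the logic in youtube/upload.py: `is_short` and `upload_violation` are hypothetical names, `history` is assumed to hold earlier upload times for the same video kind, and "vertical 9:16" is interpreted as an exact 9:16 width-to-height ratio.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def is_short(duration_s: float, width: int, height: int) -> bool:
    """Short = under 61 seconds and vertical 9:16."""
    return duration_s < 61 and width * 16 == height * 9


def upload_violation(rules: dict, kind: str, when: datetime,
                     history: List[datetime]) -> Optional[str]:
    """Return a reason if uploading `kind` at `when` breaks a rule, else None."""
    scheduling = rules["scheduling"]
    if when.hour in scheduling["avoid_hours"]:
        return "dead hour"
    same_day = [t for t in history if t.date() == when.date()]
    if len(same_day) >= rules[kind]["max_per_day"]:
        return "max per day reached"
    min_gap = timedelta(hours=scheduling["min_gap_hours"])
    if any(abs(when - t) < min_gap for t in history):
        return "too close to another upload"
    return None


rules = {
    "shorts": {"max_per_day": 3},
    "long_form": {"max_per_day": 1},
    "scheduling": {"min_gap_hours": 4, "avoid_hours": [0, 1, 2, 3, 4, 5]},
}
kind = "shorts" if is_short(24, 1080, 1920) else "long_form"
earlier = [datetime(2025, 1, 1, 7, 0)]
print(kind)  # shorts
print(upload_violation(rules, kind, datetime(2025, 1, 1, 12, 0), earlier))  # None
print(upload_violation(rules, kind, datetime(2025, 1, 1, 9, 0), earlier))  # too close to another upload
```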

Error Handling

Errors can originate from four layers — Google Flow backend, Chrome extension, FastAPI layer, and the worker itself. The worker's _handle_failure (agent/worker/processor.py:414-481) routes recovery by error_message string content, not HTTP status, because Flow lumps many distinct failures under HTTP 400 with varying details.reason values.

Flow-Native Structured Errors

These arrive in the response body as data.error.details[].reason. The worker appends the reason to error_message as "<msg> [<reason>]".

| Reason string | Meaning | Auto-handling |
|---|---|---|
| PUBLIC_ERROR_UNSAFE_GENERATION | Prompt tripped safety filter (people, violence, nudity) | Mark FAILED; rewrite prompt (use alias names, remove triggers) |
| PUBLIC_ERROR_USER_QUOTA_REACHED | Daily credits exhausted | Mark FAILED; wait for reset or upgrade tier |
| PUBLIC_ERROR_MODEL_ACCESS_DENIED | Tier mismatch (e.g. TIER_ONE trying Veo 3 / upscale) | Mark FAILED; auto-detect should downgrade to allowed model |
| Requested entity was not found | Uploaded media_id expired (~1h TTL) | Auto-recover via _recover_entity_not_found: re-uploads from image_url, re-queues PENDING |
| Internal error encountered | Flow backend transient 500 | Exponential backoff retry: 2^retry * 10s, capped 300s |
| reCAPTCHA failed / captcha | Extension couldn't solve CAPTCHA | Retry up to 10× without incrementing retry_count (processor.py:454-464) |
| PUBLIC_ERROR_UNUSUAL_ACTIVITY (403, message reCAPTCHA evaluation failed) | Google flagged the session as bot-like, usually from rapid bursts of submits, VPN/shared IP, or stale auth cookies | NOT auto-recoverable. Pause submits, clear cookies for google.com + labs.google in Chrome, sign back in at labs.google/fx/tools/flow, then resubmit with ≥1s gap and ≤5 concurrent. See /fk-doctor for full playbook. |
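
The "<msg> [<reason>]" format described at the top of this section can be sketched as follows. The path data.error.details[].reason comes from this README; the `message` key and the function name `format_flow_error` are assumptions made for the example.

```python
def format_flow_error(data: dict) -> str:
    """Build '<msg> [<reason>]' from a Flow response body, or just '<msg>' without a reason."""
    error = data.get("error") or {}
    message = error.get("message", "Unknown error")  # 'message' key is an assumption
    reasons = [
        detail["reason"]
        for detail in error.get("details", [])
        if isinstance(detail, dict) and detail.get("reason")
    ]
    return f"{message} [{reasons[0]}]" if reasons else message


body = {
    "error": {
        "message": "Request blocked",
        "details": [{"reason": "PUBLIC_ERROR_UNSAFE_GENERATION"}],
    }
}
print(format_flow_error(body))  # Request blocked [PUBLIC_ERROR_UNSAFE_GENERATION]
```

Keeping the reason inside error_message matters because recovery is routed by string content rather than by HTTP status.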

HTTP Status Codes

| Status | Source | Meaning | Handling |
|---|---|---|---|
| 400 | Flow API | Invalid payload, UNSAFE_GENERATION, entity not found (sometimes) | Route by details.reason; some are auto-recoverable, others terminal |
| 401 | Flow API | Bearer token expired | Extension re-captures token from labs.google tab; request retries |
| 403 | Extension (background.js:432) | CAPTCHA_FAILED, NO_FLOW_TAB, or MODEL_ACCESS_DENIED | CAPTCHA → retry loop; NO_FLOW_TAB → fail (user must open Flow); tier → fail |
| 404 | Flow API | media_id not found (expired upload) | Same as "Requested entity was not found": auto re-upload |
| 429 | Flow API | Rate limited / quota | Back off + retry; if USER_QUOTA_REACHED appears, fail |
| 500 | Flow backend or extension fetch exception (background.js:504) | Transient server error OR network drop during fetch | Retry with exponential backoff |
| 502 | FastAPI default (agent/api/flow.py:80,92) | Extension returned error without explicit status | Retry; check extension health |
| 503 | FastAPI (api/flow.py) | "Extension not connected" or NO_FLOW_KEY | Worker waits for reconnect; status is set to PENDING, not FAILED |
| 504 | Agent | 60s timeout waiting for extension WS response | Treated as transient; re-queue PENDING |

Status-code detection logic lives in agent/worker/_parsing.py:_is_error — a result is an error if result.error is set, status >= 400, or data.error is present.
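
Read literally, that rule could look like the sketch below. It assumes the result is a plain dict with optional `error`, `status`, and `data` keys; the actual `_is_error` in agent/worker/_parsing.py may differ in detail.

```python
def is_error(result: dict) -> bool:
    """A result is an error if error is set, status >= 400, or data.error is present."""
    if result.get("error"):
        return True
    status = result.get("status")
    if isinstance(status, int) and status >= 400:
        return True
    data = result.get("data")
    return isinstance(data, dict) and bool(data.get("error"))


print(is_error({"status": 200, "data": {"media": []}}))  # False
print(is_error({"status": 403}))  # True
print(is_error({"status": 200, "data": {"error": {"message": "Internal error encountered"}}}))  # True
```

The third case is the reason the body is inspected at all: a transport-level 200 can still wrap a Flow error.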

Extension / Transport Errors

String patterns in error_message that the worker recognizes:

| Error message contains | Cause | Handling |
|---|---|---|
| Extension not connected | Chrome extension offline or WS dropped | 503 returned; worker re-queues PENDING and waits |
| extension reconnected / extension disconnected | WS bounce mid-request | Re-queue PENDING without incrementing retry_count |
| extension_switched | User switched Flow tabs mid-generation | Re-queue PENDING |
| NO_FLOW_KEY | Extension has no captured bearer token | User must open labs.google/fx/tools/flow and sign in |
| NO_FLOW_TAB | No Google Flow tab available for reCAPTCHA | User must open a Flow tab |
| Failed to fetch | Network drop inside extension service worker | Retry with backoff |
| timeout / WS 60s no response | Extension hung mid-request | Re-queue PENDING |

Worker Retry Policy

processor.py:_handle_failure decides terminal vs retryable:

  1. Auto-recover if message contains "not found" → re-upload media, mark PENDING.
  2. Transient WS (reconnected/disconnected/switched) → re-queue PENDING, keep retry_count.
  3. CAPTCHA → retry up to 10× without counting toward MAX_RETRIES.
  4. Default → increment retry_count; if < MAX_RETRIES (5), schedule retry with 2^retry * 10s backoff (capped 300s). Otherwise mark FAILED.
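
The four steps above can be sketched as a single decision function. This is an approximation for illustration, not the code in processor.py: `decide` and `backoff_seconds` are hypothetical names, the exponent is assumed to be the already-incremented retry count, and a CAPTCHA failure past its 10 attempts is assumed to fall through to the default branch. Terminal reasons such as PUBLIC_ERROR_UNSAFE_GENERATION are not modelled.

```python
from typing import Tuple

MAX_RETRIES = 5           # same default as the MAX_RETRIES configuration variable
MAX_CAPTCHA_RETRIES = 10


def backoff_seconds(retry: int) -> int:
    """2^retry * 10 seconds, capped at 300."""
    return min((2 ** retry) * 10, 300)


def decide(error_message: str, retry_count: int,
           captcha_retries: int = 0) -> Tuple[str, int, int]:
    """Return (action, delay_seconds, new_retry_count) for a failed request."""
    msg = error_message.lower()
    if "not found" in msg:                       # step 1: re-upload media, back to PENDING
        return ("reupload_and_requeue", 0, retry_count)
    if any(word in msg for word in ("reconnected", "disconnected", "switched")):
        return ("requeue", 0, retry_count)       # step 2: transient WS, retry_count kept
    if "captcha" in msg and captcha_retries < MAX_CAPTCHA_RETRIES:
        return ("retry", 0, retry_count)         # step 3: not counted toward MAX_RETRIES
    retry_count += 1                             # step 4: default path
    if retry_count < MAX_RETRIES:
        return ("retry", backoff_seconds(retry_count), retry_count)
    return ("fail", 0, retry_count)


print(decide("Internal error encountered", 0))  # ('retry', 20, 1)
print(decide("extension disconnected", 1))      # ('requeue', 0, 1)
print(decide("Internal error encountered", 4))  # ('fail', 0, 5)
```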

YouTube Upload Errors

From youtube/upload.py (HTTP errors from YouTube Data API v3):

| Error | Cause | Fix |
|---|---|---|
| invalidTags (400) | Tags exceed 500-char limit (incl. quote overhead: spaces → +2 per tag) | Trim tags; validate with sum(len(t) + (2 if ' ' in t else 0) for t in tags) + (len(tags)-1) <= 500 |
| invalidCategoryId (400) | Unknown category | Use "22" (People & Blogs) or "24" (Entertainment) |
| quotaExceeded (403) | Daily 10K quota exhausted (uploads cost 1600) | Wait 24h (Pacific midnight reset) |
| uploadLimitExceeded (400) | Channel daily upload cap hit | Wait 24h or use different channel |
| invalid_grant (auth) | Token revoked or expired | Re-run python3 youtube/auth.py <channel> |
| scheduledPublishTimeInPast | publishAt <= now | Use auto_schedule() or bump to next day |
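
The invalidTags formula in the table can be wrapped in a small pre-upload check. The counting rule is the one quoted above; `tag_chars` and `trim_tags` are hypothetical helper names, and dropping tags from the end is just one possible trimming strategy.

```python
from typing import List

TAG_LIMIT = 500


def tag_chars(tags: List[str]) -> int:
    """Tag characters per the formula above: +2 for tags with spaces, +1 per separator."""
    if not tags:
        return 0
    return sum(len(t) + (2 if " " in t else 0) for t in tags) + (len(tags) - 1)


def trim_tags(tags: List[str], limit: int = TAG_LIMIT) -> List[str]:
    """Drop tags from the end until the counted length fits within the limit."""
    kept = list(tags)
    while kept and tag_chars(kept) > limit:
        kept.pop()
    return kept


print(tag_chars(["ai video", "flow"]))  # 15: (8 + 2) + 4 + 1 separator
print(len(trim_tags(["a" * 300, "b" * 150, "c" * 100])))  # 2
```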

Common Symptoms → Fix

| Problem | Solution |
|---|---|
| Extension shows "Agent disconnected" | Start python -m agent.main |
| Extension shows "No token" | Open labs.google/fx/tools/flow and sign in |
| CAPTCHA_FAILED: NO_FLOW_TAB | Open a Google Flow tab |
| 403 MODEL_ACCESS_DENIED | Tier mismatch; check /api/flow/credits, downgrade model in models.json |
| 403 PUBLIC_ERROR_UNUSUAL_ACTIVITY / reCAPTCHA evaluation failed | Pause submits, clear cookies for google.com + labs.google in Chrome, sign back in, then resubmit with ≥1s gap and ≤5 concurrent. Switch network or wait 1–6 h if still blocked |
| Scene images inconsistent | Check all refs have UUID media_id; run /fk-fix-uuids |
| media_id starts with CAMS... | Run /fk-fix-uuids to extract UUID from URL |
| Upscale "permission denied" | Requires PAYGATE_TIER_TWO account |
| Request stuck in PROCESSING | Check error_message history; if extension dropped, restart extension |
| "Requested entity was not found" spam | Image URLs expired; re-upload via POST /api/upload-image or wait for auto-recovery |
| YouTube upload invalidTags | Tag-char overflow; reduce tags (quote overhead bytes count) |
| Python cryptography arch mismatch | Use python3.10, not python3.13 (x86/arm64 binary mismatch) |

License

MIT


Community & Support

Join the FlowKit & Flowboard Facebook Group

The shared community for both FlowKit and Flowboard. Drop in to:

  • Post the story-video runs and thumbnails you've generated
  • Share scene templates, prompt recipes, and reference-image setups
  • Ask for help when an output isn't matching what you imagined
  • Request features and report bugs you've hit in the wild
  • Trade tips on Google Flow plan limits, Veo i2v behaviour, and Chrome extension setup

facebook.com/groups/flowkit.flowboard.community

