
🎬 Pilipili-AutoVideo · 噼哩噼哩

Fully Automated AI Video Agent · Local Deployment · One Sentence to Final Cut



📹 Demo: replace this line with a GIF or video recording of the full workflow (topic input → scene review → final video output). docs/demo.gif (to be recorded; see Contributing)


📖 Overview

Pilipili-AutoVideo (噼哩噼哩) is a locally deployed, end-to-end AI video agent. Describe your video in one sentence, and the system automatically handles script planning → keyframe image generation → TTS voiceover → video clip generation → FFmpeg assembly → subtitle burning. It delivers a complete MP4 with subtitles plus a CapCut/JianYing draft project for final human touch-ups.

Key differentiators from similar tools (LibTV, Huobao Drama):

  • Absolute Audio-Video Sync: TTS voiceover is generated first and its exact millisecond duration is measured, then used to control video duration, so audio and video stay aligned
  • Keyframe Lock Strategy: Nano Banana generates a 4K keyframe image first, then Image-to-Video (I2V) produces the clip from it, ensuring consistently high visual quality and avoiding subject drift
  • Digital Twin Memory: Mem0-powered memory system learns your style preferences over time, injecting your creative habits into every new generation
  • Skill Integration: The entire workflow is packaged as a standard Skill, callable by any AI Agent
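
The audio-first sync idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: it assumes the TTS output is available as WAV bytes, that the video engine accepts whole-second durations, and that a small padding is wanted. The names `wav_duration_ms`, `clip_duration_s`, `make_silent_wav`, and the 200 ms padding are all hypothetical.

```python
import io
import math
import wave


def wav_duration_ms(wav_bytes: bytes) -> int:
    """Return the duration of a WAV payload in whole milliseconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return round(frames * 1000 / rate)


def clip_duration_s(voice_ms: int, padding_ms: int = 200) -> int:
    """Round the voiceover length (plus padding) up to whole seconds.

    Assumption: the video engine only accepts integer-second durations.
    """
    return math.ceil((voice_ms + padding_ms) / 1000)


def make_silent_wav(duration_ms: int, rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV that stands in for real TTS output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * duration_ms // 1000))
    return buf.getvalue()


if __name__ == "__main__":
    voice = make_silent_wav(3450)
    measured = wav_duration_ms(voice)
    print(measured, clip_duration_s(measured))
```

Rounding up rather than down means the clip is never shorter than the narration; any excess can be trimmed during assembly.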

🎯 Core Features

  • 🤖 Natural Language Driven: One sentence → full video, no manual node operations required
  • 🎨 Premium Visual Quality: Nano Banana keyframe lock + Kling 3.0 / Seedance 1.5 dual-engine, exceptional subject consistency
  • 🔊 Perfect Audio-Video Sync: Measure voiceover duration first, then control video duration accordingly, so the two never drift apart
  • ✂️ CapCut/JianYing Draft Export: AI handles 90%, you fine-tune the last 10% in CapCut
  • 🧠 Gets Smarter Over Time: Mem0 memory system learns your aesthetic preferences with every project
  • 🔌 Agent-Callable: Packaged as a standard Skill, seamlessly integrates into larger automation workflows

🛠️ Architecture

┌──────────────────────────────────────────────────────────────┐
│               Pilipili-AutoVideo Architecture                │
├──────────────────────────────────────────────────────────────┤
│  Frontend    React 19 + TailwindCSS · 3-panel Studio · WS    │
├──────────────────────────────────────────────────────────────┤
│  API Layer   FastAPI · WebSocket · REST · LangGraph Workflow │
├──────────────┬──────────────┬──────────────┬─────────────────┤
│  Brain Layer │  Vision Layer│  Motion Layer│  Voice Layer    │
│  DeepSeek    │  Nano Banana │  Kling 3.0   │  MiniMax TTS    │
│  Kimi        │  (Gemini 3   │  Seedance    │  Speech 2.8 HD  │
│  MiniMax LLM │   Pro Image) │  1.5 Pro     │                 │
│  Gemini      │              │              │                 │
├──────────────┴──────────────┴──────────────┴─────────────────┤
│  Assembly    Python + FFmpeg · xfade transitions · WhisperX  │
├──────────────────────────────────────────────────────────────┤
│  Draft Layer pyJianYingDraft · Auto CapCut/JianYing Draft    │
├──────────────────────────────────────────────────────────────┤
│  Memory      Mem0 · Local SQLite · Style Preference Twin     │
└──────────────────────────────────────────────────────────────┘

| Layer | Technology | Description |
|---|---|---|
| Brain (LLM) | DeepSeek / Kimi / MiniMax / Gemini | Script generation, scene breakdown, metadata |
| Vision (Image) | Nano Banana (Gemini 3 Pro Image) | 4K keyframe lock, subject consistency foundation |
| Motion (Video) | Kling 3.0 / Seedance 1.5 Pro | Dual-engine smart routing, I2V generation |
| Voice (TTS) | MiniMax Speech 2.8 HD | Best-in-class Chinese TTS, voice cloning support |
| Assembly | Python + FFmpeg + WhisperX | xfade transitions + subtitle burning + audio mix |
| Draft | pyJianYingDraft | Auto-generate CapCut/JianYing draft projects |
| Memory | Mem0 (local SQLite / cloud sync) | Style preference digital twin |
| Backend | Python 3.10+ + FastAPI + LangGraph | Async workflow orchestration, WebSocket push |
| Frontend | React 19 + TailwindCSS + Wouter | 3-panel studio, no mock data |
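
The "dual-engine smart routing" in the Motion layer might look roughly like the sketch below. The README only states that Kling suits action/product scenes, that Seedance suits narrative/multi-character scenes, and that `default_engine` accepts `kling`, `seedance`, or `auto`. Everything else here is an assumption for illustration: the `Scene` type, its tags, the `pick_engine` function, and the fallback to Kling.

```python
from dataclasses import dataclass

# Hypothetical tag sets; the project's real routing criteria are not documented here.
KLING_TAGS = {"action", "product"}
SEEDANCE_TAGS = {"narrative", "multi-character"}


@dataclass
class Scene:
    description: str
    tags: tuple[str, ...] = ()


def pick_engine(scene: Scene, default_engine: str = "auto") -> str:
    """Choose a video engine for one scene."""
    # An explicit engine in the config overrides any routing.
    if default_engine in ("kling", "seedance"):
        return default_engine
    tags = {tag.lower() for tag in scene.tags}
    if tags & SEEDANCE_TAGS:
        return "seedance"
    if tags & KLING_TAGS:
        return "kling"
    return "kling"  # assumed fallback when no tag matches


if __name__ == "__main__":
    print(pick_engine(Scene("Two friends argue in a cafe", ("narrative",))))
```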

🚀 Quick Start

📋 Requirements

| Software | Version | Notes |
|---|---|---|
| Python | 3.10+ | Backend runtime |
| Node.js | 18+ | Frontend build |
| FFmpeg | 4.0+ | Video assembly (required) |
| Docker | 20.0+ | Container deployment (optional) |

Install FFmpeg

macOS:

brew install ffmpeg

Ubuntu / Debian:

sudo apt update && sudo apt install ffmpeg

Windows: Download from ffmpeg.org and add it to your PATH.

Verify the installation:

ffmpeg -version

Clone & Install

# 1. Clone the repository
git clone https://github.com/OpenDemon/Pilipili-AutoVideo.git
cd Pilipili-AutoVideo

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Copy config template
cp configs/config.example.yaml configs/config.yaml

Configure API Keys

Edit configs/config.yaml:

llm:
  provider: deepseek          # deepseek | kimi | minimax | gemini
  api_key: "sk-xxxx"

image_gen:
  provider: nano_banana
  api_key: "AIzaSy-xxxx"      # Google AI Studio Key

video_gen:
  default_engine: kling       # kling | seedance | auto
  kling:
    api_key: "xxxx"
    api_secret: "xxxx"
  seedance:
    api_key: "xxxx"

tts:
  provider: minimax
  api_key: "xxxx"
  group_id: "xxxx"

memory:
  provider: local             # local | mem0_cloud
  # mem0_api_key: "m0-xxxx"   # Fill in for cloud sync
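
Before the first run, it can help to check that no template placeholder is left in the config. The sketch below is a hypothetical helper, not part of the project. It assumes the YAML has already been parsed into a plain dict (for example with a YAML library), and it treats any string that contains `xxxx` as unset.

```python
PLACEHOLDER = "xxxx"


def find_unset_keys(config: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of string values that still contain the placeholder."""
    unset = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as video_gen.kling.
            unset.extend(find_unset_keys(value, path))
        elif isinstance(value, str) and PLACEHOLDER in value:
            unset.append(path)
    return unset


config = {
    "llm": {"provider": "deepseek", "api_key": "sk-real-key"},
    "video_gen": {
        "default_engine": "kling",
        "kling": {"api_key": "xxxx", "api_secret": "xxxx"},
    },
}
print(find_unset_keys(config))
# ['video_gen.kling.api_key', 'video_gen.kling.api_secret']
```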

💡 You can also configure API keys visually at http://localhost:3000/settings, with no YAML editing required.

Option 1: CLI (Recommended for debugging)

# Basic usage
python cli/main.py run --topic "Cyberpunk Mars colony, 60 seconds, cold color palette"

# Specify engine
python cli/main.py run \
  --topic "Ancient palace romance story" \
  --engine seedance \
  --duration 90 \
  --add-subtitles

# List past projects
python cli/main.py list

# Help
python cli/main.py --help
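
To drive the CLI from another Python script, one option is to build the argument list and pass it to `subprocess.run`. The helper below is a hypothetical sketch. It uses only the subcommand and flags shown above (`run`, `--topic`, `--engine`, `--duration`, `--add-subtitles`) and assumes it is executed from the repository root.

```python
import sys


def build_run_command(topic, engine=None, duration=None, add_subtitles=False):
    """Build the argv list for `python cli/main.py run ...`."""
    cmd = [sys.executable, "cli/main.py", "run", "--topic", topic]
    if engine is not None:
        cmd += ["--engine", engine]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if add_subtitles:
        cmd.append("--add-subtitles")
    return cmd


# Usage, from the repository root:
#   import subprocess
#   subprocess.run(build_run_command("Ancient palace romance story",
#                                    engine="seedance", duration=90,
#                                    add_subtitles=True), check=True)
```

Passing a list instead of a shell string avoids quoting problems when the topic contains spaces or quotes.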

Option 2: Web UI (Recommended for daily use)

# Start backend
python cli/main.py server

# In another terminal, start frontend
cd frontend
pnpm install && pnpm dev

# Visit http://localhost:3000

Option 3: Docker Compose (Recommended for production)

# Copy environment variables
cp .env.example .env
# Edit .env with your API keys

# Start all services
docker-compose up -d

# Visit http://localhost:3000

📦 Project Structure

Pilipili-AutoVideo/
├── api/
│   └── server.py           # FastAPI backend + WebSocket
├── cli/
│   └── main.py             # Click CLI entrypoint
├── core/
│   └── config.py           # Global config (Pydantic Settings)
├── modules/
│   ├── llm.py              # LLM script generation (multi-provider)
│   ├── image_gen.py        # Nano Banana keyframe generation
│   ├── tts.py              # MiniMax TTS + duration measurement
│   ├── video_gen.py        # Kling 3.0 / Seedance 1.5 I2V
│   ├── assembler.py        # FFmpeg assembly + subtitle burning
│   ├── jianying_draft.py   # CapCut/JianYing draft generation
│   └── memory.py           # Mem0 memory system
├── frontend/               # React 19 frontend (3-panel studio)
├── skills/
│   └── SKILL.md            # Skill packaging spec
├── configs/
│   ├── config.example.yaml # Config template
│   └── config.yaml         # Local config (gitignored)
├── tests/
│   └── test_pipeline.py    # Unit tests (18 test cases)
├── data/
│   ├── outputs/            # Generated videos and drafts
│   └── memory/             # Memory database
├── docker-compose.yml
├── Dockerfile.backend
├── requirements.txt
└── pyproject.toml

🎬 Workflow Deep Dive

The core workflow is orchestrated by LangGraph in the following stages:

User Input
  │
  ▼
① Script Generation (LLM)
  │  DeepSeek/Kimi expands one sentence into a structured storyboard
  │  Each scene: voiceover text, visual description, motion description,
  │              duration, transition, camera motion
  │
  ▼
② Scene Review (optional human step)
  │  Web UI shows scene list; user can edit each scene before confirming
  │  CLI mode: auto-approved
  │
  ▼
③ Parallel Generation (Keyframe Images + TTS Voiceover)
  │  Nano Banana generates 4K keyframe images for each scene in parallel
  │  MiniMax TTS generates voiceover for each scene, measuring exact ms duration
  │
  ▼
④ Video Generation (Image-to-Video)
  │  Uses keyframe as first frame, voiceover duration as video duration
  │  Kling 3.0 (action/product) or Seedance 1.5 (narrative/multi-character)
  │
  ▼
⑤ Assembly (FFmpeg)
  │  xfade transitions + background music mixing + WhisperX subtitle burning
  │
  ▼
⑥ Draft Export (CapCut/JianYing)
  │  Auto-generates draft project preserving all scene assets and timeline
  │
  ▼
⑦ Memory Update (Mem0)
     After user rating, system learns style preferences for future generations
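
Stage ③ is the one place where work fans out, so a small sketch may help. It uses plain `asyncio` as a stand-in for the LangGraph orchestration, and every function in it is a hypothetical stub: the real image and TTS calls, their return types, and the file names are assumptions made for illustration.

```python
import asyncio


async def generate_keyframe(scene_id: int) -> str:
    """Stub for the keyframe image call; returns a fake file name."""
    await asyncio.sleep(0.01)  # placeholder for network latency
    return f"scene_{scene_id}.png"


async def generate_voiceover(scene_id: int) -> tuple[str, int]:
    """Stub for the TTS call; returns a fake file name and duration in ms."""
    await asyncio.sleep(0.01)
    return f"scene_{scene_id}.wav", 4000 + 500 * scene_id


async def prepare_scene(scene_id: int) -> dict:
    # Keyframe and voiceover do not depend on each other, so run them together.
    keyframe, (audio, duration_ms) = await asyncio.gather(
        generate_keyframe(scene_id), generate_voiceover(scene_id)
    )
    return {
        "scene": scene_id,
        "keyframe": keyframe,
        "audio": audio,
        "duration_ms": duration_ms,
    }


async def prepare_all(scene_ids: list[int]) -> list[dict]:
    # gather() preserves input order, so results line up with scene_ids.
    return await asyncio.gather(*(prepare_scene(s) for s in scene_ids))


if __name__ == "__main__":
    for item in asyncio.run(prepare_all([1, 2, 3])):
        print(item)
```

Each result carries both the keyframe and the measured duration, which are the two inputs stage ④ needs.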

🆚 Comparison

| Dimension | LibTV | Huobao Drama | Pilipili |
|---|---|---|---|
| Interaction | Node canvas, manual trigger | Form-based, step-by-step | Natural language, one sentence |
| Audio-Video Sync | Manual editing | Not explicitly supported | Measure TTS duration → control video duration |
| Subject Consistency | Prompt guidance | Reference image upload | Nano Banana keyframe lock + Kling Reference API |
| Final Delivery | Manual import to CapCut | MP4 export | Auto CapCut draft + MP4 dual output |
| Memory System | None | None | Mem0 digital twin, learns your style |
| Agent Integration | None | None | Standard Skill, callable by any Agent |
| Deployment | Cloud SaaS | Cloud SaaS | Local deployment, full data ownership |

🧪 Testing

# Run all unit tests (no API keys required)
python -m pytest tests/test_pipeline.py -v -m "not api and not e2e"

# Run API integration tests (real API keys required)
python -m pytest tests/test_pipeline.py -v -m "api"

# Run full E2E tests
python -m pytest tests/test_pipeline.py -v -m "e2e"

Current test suite: 18 unit tests, all passing.


🔌 Skill Integration

Pilipili-AutoVideo is packaged as a standard Skill, callable by any AI Agent:

# In an Agent session
Please generate a 60-second science explainer video about "The History of AI Chips", blue-purple tech aesthetic.

The Agent will automatically read skills/SKILL.md and invoke Pilipili to complete the entire workflow.


📝 FAQ

Q: FFmpeg not found?
A: Ensure FFmpeg is installed and in your PATH. Run ffmpeg -version to verify.

Q: Video generation is slow. Is that normal?
A: Yes. Video generation relies on cloud APIs (Kling/Seedance), which typically take 2-5 minutes per scene. This is an API-side constraint, not a local performance issue.
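
Because each scene can take minutes, client code usually polls the job and gives up after a deadline. The sketch below shows that pattern in general terms. It is not the project's implementation: the `wait_for_job` helper, the status strings, and the default timeout and interval are all hypothetical.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a terminal state or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        # "succeeded" and "failed" are assumed terminal states.
        if status in ("succeeded", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} s")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without waiting in real time.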

Q: How do I switch LLM providers?
A: Edit llm.provider in configs/config.yaml, or use the Settings page in the Web UI.

Q: Where is the CapCut/JianYing draft?
A: After generation, the draft project is at data/outputs/{project_id}/draft/. Copy the entire folder to CapCut's draft directory to open it.
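
The copy step can be scripted. In the sketch below, only the source layout `data/outputs/{project_id}/draft/` comes from this README. The CapCut draft directory varies by platform and install, so it is passed in by the caller, and the `pilipili_` prefix on the target folder is a hypothetical naming choice.

```python
import shutil
from pathlib import Path


def install_draft(
    project_id: str,
    capcut_drafts_dir: Path,
    outputs_root: Path = Path("data/outputs"),
) -> Path:
    """Copy a generated draft folder into the editor's draft directory."""
    source = outputs_root / project_id / "draft"
    if not source.is_dir():
        raise FileNotFoundError(f"No draft folder at {source}")
    target = capcut_drafts_dir / f"pilipili_{project_id}"
    # copytree raises FileExistsError if the target already exists,
    # which protects an earlier copy from being overwritten silently.
    shutil.copytree(source, target)
    return target
```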

Q: What aspect ratios are supported?
A: 9:16 (portrait, TikTok/Reels), 16:9 (landscape, YouTube), 1:1 (square, Instagram).
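
For reference, the three ratios translate into pixel sizes as sketched below. The 1080-pixel short side is an assumption chosen for illustration, and so is the `frame_size` helper. The README does not state the actual output resolution.

```python
def frame_size(aspect: str, short_side: int = 1080) -> tuple[int, int]:
    """Convert an aspect ratio such as "9:16" into (width, height) in pixels."""
    w, h = (int(part) for part in aspect.split(":"))
    scale = short_side / min(w, h)
    # Round to even numbers, since common H.264 pixel formats need even dimensions.
    width = round(w * scale / 2) * 2
    height = round(h * scale / 2) * 2
    return width, height


for ratio in ("9:16", "16:9", "1:1"):
    print(ratio, frame_size(ratio))
# 9:16 (1080, 1920)
# 16:9 (1920, 1080)
# 1:1 (1080, 1080)
```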


🤝 Contributing

Issues and Pull Requests are welcome!

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'feat: add amazing feature'
  4. Push the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License.


Pilipili-AutoVideo · 噼哩噼哩 · Local Deployment · Fully Automated AI Video Agent
If this project helps you, please give it a ⭐ Star!
