SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Dingcheng Zhen*^✉ · Xu Zheng* · Ruixin Zhang* · Zhiqi Jiang*

SoulX-LiveAct presents a novel framework that enables lifelike, multimodal-controlled, high-fidelity human animation video generation for real-time streaming interactions.

(I) We identify diffusion-step-aligned neighbor latents as a key inductive bias for autoregressive (AR) diffusion and build on it to propose Neighbor Forcing, a principled and theoretically grounded approach to step-consistent AR video generation.

(II) We introduce ConvKV Memory, a lightweight plug-in compression mechanism that enables constant-memory hour-scale video generation with negligible overhead.

(III) We develop an optimized real-time system that achieves 20 FPS at 720×416 or 512×512 resolution on only two H100/H200 GPUs, using end-to-end adaptive FP8 precision, sequence parallelism, and operator fusion.

🔥🔥🔥 News

📢 Mar 18, 2026: We now support consumer GPUs (e.g., RTX 4090, RTX 5090) with FP8 KV cache and CPU model offloading. In our tests, the 18B model (14B Wan2.1 + 4B audio module) achieves a throughput of 6 FPS on a single RTX 5090.
👋 Mar 16, 2026: We release the inference code and model weights of SoulX-LiveAct.

🎥 Demo

👫 Podcast

🎤 Music & Talk Show

📱 FaceTime

📑 Open-source Plan

Release inference code and checkpoints
GUI demo support
End-to-end adaptive FP8 precision
Support model offloading for consumer GPUs (e.g., RTX 4090, RTX 5090) to reduce memory usage
Support FP4 precision for B-series GPUs (e.g., RTX 5090, B100, B200)
Release training code

▶️ Quick Start

🛠️ Dependencies and Installation

Step 1: Install Basic Dependencies

conda create -n liveact python=3.10
conda activate liveact
pip install -r requirements.txt
conda install conda-forge::sox -y

Step 2: Install SageAttention

To enable the FP8 attention kernel, install SageAttention:

git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
git checkout v2.2.0
python setup.py install

(Optional) Install the modified version of SageAttention. To enable operator fusion for QKV in SageAttention, install it with the following commands:
```
git clone https://github.com/ZhiqiJiang/SageAttentionFusion.git
cd SageAttentionFusion
python setup.py install
```

Step 3: Install vLLM

To enable the FP8 GEMM kernel, install vLLM:

pip install vllm==0.11.0

Step 4: Install LightVAE

git clone https://github.com/ModelTC/LightX2V
cd LightX2V
python setup_vae.py install

🤗 Download Checkpoints

Model Cards

ModelName	Download
SoulX-LiveAct	🤗 Huggingface, 魔搭 ModelScope
chinese-wav2vec2-base	🤗 Huggingface

🔑 Inference

Usage of LiveAct

1. Run real-time streaming inference on two H100/H200 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 416*720 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 20 \
    --dura_print \
    --input_json examples/example.json \
    --steam_audio
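
The same two-GPU launch can also be driven from Python. The following is a minimal sketch, not part of the released code: `find_free_port`, `build_torchrun_command`, and `launch` are hypothetical helper names, and asking the OS for a free port (instead of drawing a random one with `shuf`) is a substitution made here for illustration. The flags are copied from the command above.

```python
import os
import socket
import subprocess


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def build_torchrun_command(ckpt_dir, wav2vec_dir, input_json,
                           size="416*720", fps=20, num_gpus=2):
    """Assemble the argument list of the two-GPU streaming command."""
    return [
        "torchrun",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={find_free_port()}",
        "generate.py",
        "--size", size,
        "--ckpt_dir", ckpt_dir,
        "--wav2vec_dir", wav2vec_dir,
        "--fps", str(fps),
        "--dura_print",
        "--input_json", input_json,
        "--steam_audio",
    ]


def launch(cmd, gpu_ids="0,1"):
    """Run the command with the environment variables used in this README."""
    env = dict(os.environ, USE_CHANNELS_LAST_3D="1", CUDA_VISIBLE_DEVICES=gpu_ids)
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    command = build_torchrun_command(
        "MODEL_PATH", "chinese-wav2vec2-base", "examples/example.json"
    )
    print(" ".join(command))  # call launch(command) to actually start inference
```

Passing the arguments as a list means no shell is involved, so the `*` in the size value is never subject to glob expansion. The port is released before `torchrun` starts, so another process could in principle claim it in between.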

2. Run with action or emotion editing at real-time streaming performance

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 512*512 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example_edit.json

3. Run with the best performance settings

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 480*832 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json

4. Run on RTX 4090/RTX 5090 GPUs

Note: FP8 KV cache may slightly affect generation quality.

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
    --size 416*720 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json \
    --fp8_kv_cache \
    --block_offload \
    --t5_cpu
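
To see why an FP8 KV cache helps on memory-limited GPUs, the following back-of-the-envelope sketch compares the cache footprint at 2 bytes per element (BF16) and 1 byte per element (FP8). The function name and the layer count, token count, and hidden size are illustrative placeholders, not values taken from the SoulX-LiveAct configuration, and any per-tensor scale factors a real FP8 cache stores are ignored.

```python
BYTES_PER_ELEMENT = {"bf16": 2, "fp8": 1}


def kv_cache_bytes(num_layers, num_tokens, hidden_dim, dtype):
    """Size of a KV cache: one key and one value tensor per layer,
    each of shape (num_tokens, hidden_dim)."""
    return 2 * num_layers * num_tokens * hidden_dim * BYTES_PER_ELEMENT[dtype]


if __name__ == "__main__":
    # Placeholder dimensions, chosen only to make the arithmetic concrete.
    layers, tokens, dim = 40, 30_000, 5120
    for dtype in ("bf16", "fp8"):
        gib = kv_cache_bytes(layers, tokens, dim, dtype) / 2**30
        print(f"{dtype}: {gib:.2f} GiB")
```

Whatever the dimensions, storing the cache in FP8 halves its footprint relative to BF16. The trade-off is the slight quality impact mentioned in the note above.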

5. Run on a single GPU for evaluation

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
    --size 480*832 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json \
    --audio_cfg 1.7 \
    --t5_cpu

Command Line Arguments

Argument	Type	Required	Default	Description
`--size`	str	Yes	-	The width and height of the generated video.
`--t5_cpu`	bool	No	false	Whether to place the T5 model on the CPU.
`--offload_cache`	bool	No	-	Whether to place the KV cache on the CPU.
`--fps`	int	Yes	-	The target FPS of the generated video.
`--audio_cfg`	float	No	1.0	Classifier-free guidance scale for audio control.
`--dura_print`	bool	No	false	Whether to print the duration of every block.
`--input_json`	str	Yes	-	Path to the condition JSON file used to generate the video.
`--seed`	int	No	42	The random seed used for generation.
`--steam_audio`	bool	No	false	Whether to run inference with streaming audio.
`--mean_memory`	bool	No	false	Whether to use the mean memory strategy during inference for further performance improvement.
`--fp8_kv_cache`	bool	No	false	Whether to store the KV cache in FP8 and dequantize it to BF16 on use. FP8 KV cache may slightly affect generation quality.
`--block_offload`	bool	No	false	Whether to offload model blocks to the CPU between block forwards.
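
As a rough illustration of how the options in this table fit together, here is a minimal `argparse` sketch. It is not the actual parser in `generate.py`. The function name `build_parser` is hypothetical, and three choices are assumptions made here: boolean options are modelled as presence switches, `--offload_cache` defaults to off, and `--ckpt_dir` and `--wav2vec_dir` (used in the commands but absent from the table) are optional strings.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real generate.py)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented generate.py options")
    # Required options according to the table.
    parser.add_argument("--size", type=str, required=True)
    parser.add_argument("--fps", type=int, required=True)
    parser.add_argument("--input_json", type=str, required=True)
    # Paths used in the example commands; optional strings are an assumption.
    parser.add_argument("--ckpt_dir", type=str, default=None)
    parser.add_argument("--wav2vec_dir", type=str, default=None)
    # Optional values with the documented defaults.
    parser.add_argument("--audio_cfg", type=float, default=1.0)
    parser.add_argument("--seed", type=int, default=42)
    # Boolean switches: absent means False, present means True.
    for flag in ("--t5_cpu", "--offload_cache", "--dura_print", "--steam_audio",
                 "--mean_memory", "--fp8_kv_cache", "--block_offload"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    # Same flags as the single-GPU RTX 4090/RTX 5090 command.
    args = build_parser().parse_args(
        ["--size", "416*720", "--fps", "24", "--input_json", "examples/example.json",
         "--fp8_kv_cache", "--block_offload", "--t5_cpu"]
    )
    print(args.size, args.fps, args.seed, args.fp8_kv_cache, args.offload_cache)
```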

💻 GUI demo

Run SoulX-LiveAct inference on the GUI demo and evaluate real-time performance.

Note: The first few blocks during the initial run require warm-up. Normal performance will be observed from the second run onward.
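
When judging real-time performance from per-block timings, it helps to leave the warm-up blocks out of the average. The sketch below is only an illustration of that bookkeeping: the function names, the number of frames per block, and the number of warm-up blocks are all assumptions, and the durations are made-up sample values rather than measurements.

```python
def steady_state_fps(block_durations_s, frames_per_block, warmup_blocks=2):
    """Average frames per second over the blocks that follow warm-up."""
    steady = block_durations_s[warmup_blocks:]
    if not steady:
        raise ValueError("need at least one block after the warm-up blocks")
    return frames_per_block * len(steady) / sum(steady)


def is_real_time(measured_fps, target_fps):
    """Generation keeps up with playback when it is at least as fast as the target."""
    return measured_fps >= target_fps


if __name__ == "__main__":
    # Made-up durations in seconds: two slow warm-up blocks, then steady state.
    durations = [2.0, 0.5, 0.25, 0.25, 0.25]
    fps = steady_state_fps(durations, frames_per_block=5, warmup_blocks=2)
    print(fps, is_real_time(fps, target_fps=20))
```

Including the warm-up blocks in the average would understate the throughput observed in later runs.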

1. Run real-time streaming inference on two H100/H200 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
  demo.py \
  --ckpt_dir MODEL_PATH \
  --wav2vec_dir chinese-wav2vec2-base \
  --size 416*720 \
  --video_save_path ./generated_videos

2. Run on RTX 4090/RTX 5090 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
torchrun --nproc_per_node=1 --master_port=$(shuf -n 1 -i 10000-65535) \
  demo.py \
  --ckpt_dir MODEL_PATH \
  --wav2vec_dir chinese-wav2vec2-base \
  --size 416*720 \
  --fp8_kv_cache \
  --block_offload \
  --t5_cpu \
  --video_save_path ./generated_videos

📚 Citation

@misc{zhen2026soulxliveacthourscalerealtimehuman,
      title={SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory}, 
      author={Dingcheng Zhen and Xu Zheng and Ruixin Zhang and Zhiqi Jiang and Yichao Yan and Ming Tao and Shunshun Yin},
      year={2026},
      eprint={2603.11746},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.11746}, 
}

📮 Contact Us

If you would like to leave a message about our work, feel free to email dingchengzhen@soulapp.cn.

You’re welcome to join our WeChat group or Soul group for technical discussions.

