Repositories

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.

Python

audio-editing cross-lingual emotion-control paralinguistics reinforcement-learning speaking-style style-control text-to-speech tts voice-cloning zero-shot-tts

stepfun-ai/NextStep-1

687

[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun's Multimodal Intelligence team.

Python