Repositories
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.
Python
audio-editing cross-lingual emotion-control paralinguistics reinforcement-learning speaking-style style-control text-to-speech tts voice-cloning zero-shot-tts
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intelligence team.
Python