

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas


🆕 Updates

| Date | Updates |
| --- | --- |
| 2026/01/30 | 📄 Paper Release |
| 2026/01/22 | 🎉 Release Code, Models, and Datasets |

📖 Overview

This repository provides an end-to-end pipeline for the fully automated, verifiable synthesis of high-quality data and environments, with native support for process-level rewards. It is designed for training models with multi-step reasoning and tool-use capabilities, and it is easy to scale to new tasks and tools. The two main modules are:

  • Trajectory Synthesis: Automatically generates high-quality, multi-step interactive trajectories, which are then verified by a reward system.

  • Environment Synthesis: Fully automatically synthesizes interactive environments, with no human labels required, that provide step-wise process rewards to enable RLVR (Reinforcement Learning with Verifiable Rewards).

| Module | Function | Directory |
| --- | --- | --- |
| Trajectory Synthesis | Tool graph construction → Task generation → Trajectory collection → Reward assessment | trajectory_synthesis/ |
| Environment Synthesis | Question decomposition → Automatic tool environment generation → RLVR training data | env_synthesis/ |
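At a high level, each module is a linear composition of stages, where each stage's output feeds the next. A minimal sketch of that idea for the trajectory-synthesis side (all function and field names here are illustrative, not the repository's real API; see trajectory_synthesis/ for the actual entry points):

```python
# Hypothetical sketch: a pipeline as a left-to-right composition of stages.
from functools import reduce

def compose(*stages):
    """compose(f, g)(x) == g(f(x)) -- run stages left to right."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

# Trajectory synthesis: tool graph -> tasks -> trajectories -> reward check.
trajectory_synthesis = compose(
    lambda docs: {"graph": f"tool graph from {len(docs)} servers"},
    lambda state: {**state, "tasks": ["task-1", "task-2"]},
    lambda state: {**state, "trajectories": [f"traj:{t}" for t in state["tasks"]]},
    # Reward assessment: keep only trajectories that pass verification
    # (here, trivially all of them).
    lambda state: {**state, "verified": list(state["trajectories"])},
)

result = trajectory_synthesis(["server_a.json", "server_b.json"])
print(sorted(result))  # ['graph', 'tasks', 'trajectories', 'verified']
```

The same composition pattern applies to environment synthesis, with decomposition and tool-generation stages instead.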

🏆 Model Performance

We release two models, ASTRA-32B-Thinking-v1 and ASTRA-14B-Thinking-v1, trained with SFT and RL on our synthesized data. Below are the evaluation results on BFCL-V3-MT:

| Model | Base | Long Context | Miss Func | Miss Param | Average ↓ |
| --- | --- | --- | --- | --- | --- |
| Claude-Opus-4-5-20251101 | 81.5 | 70.5 | 64.0 | 58.0 | 68.5 |
| GLM-4.6 | 74.5 | 66.5 | 68.0 | 63.0 | 68.0 |
| ASTRA-32B-Thinking-v1 | 76.5 | 66.5 | 65.5 | 48.5 | 64.3 |
| Gemini-3-Pro-Preview | 69.0 | 64.0 | 63.0 | 56.5 | 63.1 |
| o3-2025-04-16 | 68.0 | 63.0 | 63.5 | 54.5 | 62.3 |
| Claude-Sonnet-4-5-20250929 | 69.0 | 59.0 | 65.0 | 52.5 | 61.4 |
| Grok-4-1-fast-reasoning | 70.5 | 62.5 | 59.5 | 43.0 | 58.9 |
| ASTRA-14B-Thinking-v1 | 67.0 | 61.0 | 56.0 | 48.5 | 58.1 |
| LoopTool-32B (reported in paper) | – | – | – | – | 57.8 |
| Claude-Haiku-4-5-20251001 | 63.5 | 56.0 | 42.5 | 52.5 | 53.6 |
| Kimi-K2-Instruct | 62.0 | 55.0 | 41.0 | 44.5 | 50.6 |
| Qwen3-32B | 59.0 | 51.5 | 47.5 | 40.5 | 49.6 |
| Qwen3-30B-A3B-Thinking-2507 | 66.0 | 58.0 | 31.5 | 35.5 | 47.8 |
| TouCan-32B (reported in paper) | – | – | – | – | 46.5 |
| Qwen3-14B | 50.5 | 48.0 | 39.5 | 40.0 | 44.5 |
| Qwen3-30B-A3B-Instruct-2507 | 43.5 | 41.0 | 10.5 | 25.0 | 30.0 |

🔄 Pipelines

Part 1: Trajectory Synthesis

SFT Pipeline

Starting from MCP Server tool documentation, the pipeline builds tool dependency graphs and generates high-quality SFT training data.

mcp_servers.jsonl → Graph construction → Task generation → LLM interaction → Reward assessment → SFT data
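The pipeline's input is mcp_servers.jsonl, a JSON Lines file with one server description per line. A sketch of what one such entry might look like and how it would be parsed back (the field names here are illustrative assumptions; the real schema is documented in trajectory_synthesis/README.md):

```python
import io
import json

# Hypothetical mcp_servers.jsonl entry: one server with its tool documentation.
line = json.dumps({
    "server": "weather",
    "tools": [{
        "name": "get_forecast",
        "description": "Return the forecast for a city.",
        "parameters": {"city": {"type": "string", "required": True}},
    }],
})

# JSONL is read back one JSON object per line (StringIO stands in for the file).
servers = [json.loads(l) for l in io.StringIO(line + "\n")]
print(servers[0]["tools"][0]["name"])  # get_forecast
```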

👉 For detailed usage instructions, please refer to trajectory_synthesis/README.md


Part 2: Environment Synthesis

Environment Synthesis Pipeline

Automatically generates executable tool environments from Q&A pairs, supporting RLVR training.

QA data → Question decomposition → Tool necessity check → Verification → Environment synthesis → Tool merging
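The key idea behind the step-wise process reward is that once a question is decomposed into verifiable sub-steps, each intermediate answer can be scored independently. A toy sketch under that assumption (the decomposition and reward functions below are hypothetical stand-ins, not the repository's implementation):

```python
def decompose(question):
    # Toy decomposition of a two-hop arithmetic question into sub-steps,
    # each paired with an executable checker that yields the verified value.
    # The real pipeline derives sub-questions and tools from QA data.
    return [
        ("12 * 3", lambda: 12 * 3),
        ("36 + 4", lambda: 36 + 4),
    ]

def process_rewards(predicted, gold_steps):
    """Step-wise reward: 1.0 per step whose prediction matches the check."""
    return [1.0 if p == check() else 0.0
            for p, (_, check) in zip(predicted, gold_steps)]

steps = decompose("What is 12 * 3 + 4?")
print(process_rewards([36, 40], steps))  # [1.0, 1.0]  (both steps correct)
print(process_rewards([35, 40], steps))  # [0.0, 1.0]  (first step wrong)
```

Because each step is checked against an executable ground truth rather than a human label, the rewards are verifiable by construction, which is what makes the environments usable for RLVR.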

👉 For detailed usage instructions, please refer to env_synthesis/README.md


📜 License

This project is licensed under the Apache 2.0 License.


📎 Citation

    @misc{tian2026astraautomatedsynthesisagentic,
      title={ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas},
      author={Xiaoyu Tian and Haotian Wang and Shuaiting Chen and Hao Zhou and Kaichi Yu and Yudian Zhang and Jade Ouyang and Junxi Yin and Jiong Chen and Baoyan Guo and Lei Zhang and Junjie Tao and Yuansheng Song and Ming Cui and Chengwei Liu},
      year={2026},
      eprint={2601.21558},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.21558},
    }

About

ASTRA is an end-to-end system for synthesizing agentic trajectories and rule-verifiable environments for SFT and RL training, developed by Beike Language and Intelligence (BLI).

