Repositories
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Python
LongBench v2 and LongBench (ACL'25 & '24)
Python
benchmark llm long-context longtext
slime is an LLM post-training framework for RL scaling.
Python
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Python
chatgpt gpt-4 llm llm-agent