🦀 CRAB: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents

Documentation | Website & Demos | Blog | Chinese Blog | CAMEL-AI

Overview

CRAB is a framework for building LLM agent benchmark environments in a Python-centric way.

Key Features

🌐 Cross-platform and Multi-environment

  • Build agent environments that support various deployment options, including in-memory, Docker-hosted, virtual machines, or distributed physical machines, provided they are accessible via Python functions.
  • Let the agent access all the environments at the same time through a unified interface.

⚙️ Easy-to-use Configuration

  • Add a new action by simply adding a @action decorator on a Python function.
  • Define the environment by integrating several actions together.
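
The decorator-based pattern described above can be illustrated with a small self-contained sketch. It does not use CRAB itself: the `action` decorator, the `ToyEnvironment` class, and its `register` and `step` methods below are hypothetical stand-ins written for this example, so the real CRAB API may look different. The sketch only shows the general idea of tagging plain Python functions as actions and grouping them into an environment.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable


def action(func: Callable) -> Callable:
    """Toy stand-in for an @action decorator (NOT CRAB's implementation).

    It attaches metadata that an agent framework could show to a model:
    a description taken from the docstring and the parameter names.
    """
    func.is_action = True
    func.description = inspect.getdoc(func) or ""
    func.parameters = list(inspect.signature(func).parameters)
    return func


@dataclass
class ToyEnvironment:
    """Hypothetical environment: a named collection of actions."""

    name: str
    actions: dict[str, Callable] = field(default_factory=dict)

    def register(self, *funcs: Callable) -> None:
        # Only functions marked by the decorator are accepted.
        for func in funcs:
            if not getattr(func, "is_action", False):
                raise ValueError(f"{func.__name__} is not decorated with @action")
            self.actions[func.__name__] = func

    def step(self, action_name: str, **kwargs: Any) -> Any:
        # Look up the action by name and run it with the given arguments.
        return self.actions[action_name](**kwargs)


@action
def add(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b


@action
def shout(text: str) -> str:
    """Upper-case a string."""
    return text.upper()


env = ToyEnvironment("calculator")
env.register(add, shout)
result = env.step("add", a=2, b=3)
```

Here an agent would only need the action names, descriptions, and parameter lists to choose a call such as `env.step("add", a=2, b=3)`.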

📐 Novel Benchmarking Suite

  • Define tasks and the corresponding evaluators in an intuitive Python-native way.
  • Introduce a novel graph evaluator method providing fine-grained metrics.
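
To make the idea of a graph evaluator more concrete, here is a minimal sketch under stated assumptions. It models a task as a directed acyclic graph of checkpoints, where each node is a Python check function and a node counts as completed only if all of its prerequisites are completed. The function `evaluate_graph`, the node names, and the completion ratio it returns are illustrative choices for this example and are not taken from CRAB's code.

```python
from graphlib import TopologicalSorter
from typing import Callable

State = dict  # hypothetical snapshot of the environment


def evaluate_graph(
    checks: dict[str, Callable[[State], bool]],
    depends_on: dict[str, set[str]],
    state: State,
) -> tuple[set[str], float]:
    """Return the completed checkpoints and the fraction completed.

    Assumes every node named in depends_on also appears in checks.
    """
    graph = {node: depends_on.get(node, set()) for node in checks}
    completed: set[str] = set()
    # static_order() yields each node after all of its prerequisites.
    for node in TopologicalSorter(graph).static_order():
        if graph[node] <= completed and checks[node](state):
            completed.add(node)
    return completed, len(completed) / len(checks)


# Toy task: download a file, rename it, then send a message about it.
checks = {
    "file_downloaded": lambda s: s["downloaded"],
    "file_renamed": lambda s: s["renamed"],
    "message_sent": lambda s: s["message_sent"],
}
depends_on = {
    "file_renamed": {"file_downloaded"},
    "message_sent": {"file_renamed"},
}
state = {"downloaded": True, "renamed": True, "message_sent": False}

completed, ratio = evaluate_graph(checks, depends_on, state)
```

In this toy run two of the three checkpoints pass, so the agent gets partial credit instead of a plain success or failure, which is the kind of finer-grained signal a graph-based evaluator can provide.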

Installation

Prerequisites

  • Python 3.10 or newer

pip install crab-framework[client]

Experiment on CRAB-Benchmark-v0

All datasets and experiment code are in the crab-benchmark-v0 directory. Please read the benchmark tutorial carefully before using the benchmark.

Examples

Run the template environment with an OpenAI agent

export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py

Cite

Please cite our paper if you use anything related in your work:

@misc{xu2024crab,
      title={CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents}, 
      author={Tianqi Xu and Linyao Chen and Dai-Jie Wu and Yanjun Chen and Zecheng Zhang and Xiang Yao and Zhiqiang Xie and Yongchao Chen and Shilong Liu and Bochen Qian and Philip Torr and Bernard Ghanem and Guohao Li},
      year={2024},
      eprint={2407.01511},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2407.01511}, 
}

Community

Join us (Discord or WeChat) in pushing the boundaries of finding the scaling laws of agents.

  • WeChat Community: Scan the QR code below to join our WeChat community.

About

🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
Topics: gui-automation, language-model-agent, large-language-models, multi-agent-systems, visual-language-models
