Agent Safe Probe (ASP-X)
ASP-X (Agent Safe Probe X) — An open-source framework for automated safety evaluation of intelligent agents
🌐 Choose Your Language / 选择语言
English: Read the full English documentation
中文: 阅读 完整中文文档
Quick Overview / 快速概览
Agent Safe Probe is a comprehensive testing framework designed to evaluate the security and safety of AI agent systems. It provides a systematic approach to test various attack scenarios including prompt injection, backdoor attacks, memory-based attacks, and more.
Agent Safe Probe 是一个全面的测试框架,旨在评估AI代理系统的安全性和可靠性。它提供了系统化的方法来测试各种攻击场景,包括提示注入、后门攻击、基于记忆的攻击等。
✨ Key Features / 主要特性
| Feature / 特性 | English / 英文 | 中文 |
|---|---|---|
| 🎯 Attack Methods | Direct/Observation Prompt Injection, Backdoor, Memory-based | 直接/观察提示注入、后门、基于记忆 |
| 🛡️ Defense Strategies | Delimiters, Instruction, Paraphrase, Dynamic Rewriting | 分隔符、指令、释义、动态重写 |
| 🤖 Supported Models | Llama3, Qwen2, Gemma2, GPT-4, Claude, and more | Llama3、Qwen2、Gemma2、GPT-4、Claude等 |
| 🔧 Easy Setup | Ollama integration (default) | Ollama集成(默认) |
🚀 Quick Start / 快速开始
# Clone the repository / 克隆仓库 git clone https://github.com/yourusername/agent-safe-probe-x.git cd agent-safe-probe-x # Install dependencies / 安装依赖 pip install -r requirements.txt # Install Ollama models / 安装 Ollama 模型 ollama pull llama3:8b # Run attacks / 运行攻击 python main_attacker.py --config config/DPI.yml
📖 Full Documentation / 完整文档
- English Documentation (README_EN.md) - Complete guide with all features, examples, and API references
- 中文文档 (README_CN.md) - 包含所有功能、示例和API参考的完整指南
🎯 Use Cases / 应用场景
- Security Research / 安全研究: Testing agent vulnerabilities
- AI Safety / AI安全: Evaluating safety mechanisms
- Penetration Testing / 渗透测试: Red teaming AI systems
- Defensive Development / 防御开发: Building robust agents
📊 Supported Methods / 支持的方法
Attack Methods / 攻击方法
- Direct Prompt Injection (DPI) / 直接提示注入
- Observation Prompt Injection (OPI) / 观察提示注入
- Prompt-Only Triggered (POT) Backdoor / 仅提示触发的后门
- Memory-Based Attacks / 基于记忆的攻击
Defense Methods / 防御方法
- Delimiters Defense / 分隔符防御
- Instructional Prevention / 指令预防
- Paraphrase Defense / 释义防御
- Dynamic Prompt Rewriting / 动态提示重写
- Sandwich Defense / 三明治防御
🤝 Contributing / 贡献
We welcome contributions! See our contributing guidelines.
欢迎贡献!请参阅贡献指南。
📄 License / 许可证
MIT License - See LICENSE file for details.
MIT 许可证 - 详见 LICENSE 文件。
📞 Contact / 联系方式
For questions or issues, please open an issue on GitHub.
如有问题,请在GitHub上提交issue。
⚠️ Disclaimer / 免责声明
This framework is intended solely for legitimate security research and authorized testing.
本框架仅用于合法的安全研究和授权测试。