🔬 Awesome Autoresearch

A curated, high-signal index of autonomous improvement loops, research agents, and descendants inspired by karpathy/autoresearch.

Awesome · PRs Welcome · License: CC0-1.0

by Boring Dystopia Development

boringdystopia.ai · X: @alvinunreal · Telegram: Join channel

🛠️ General-purpose descendants

  • uditgoenka/autoresearch - Claude Code skill that generalizes autoresearch into a reusable loop for software, docs, security, shipping, debugging, and other measurable goals.
  • leo-lilinxiao/codex-autoresearch - Codex-native autoresearch skill with resume support, lessons across runs, optional parallel experiments, and mode-specific workflows.
  • supratikpm/gemini-autoresearch - Gemini CLI skill that generalizes autoresearch to any measurable goal. Gemini-native: uses Google Search grounding as a live verification source inside the loop, true headless overnight mode via --yolo --prompt, and 1M token context. Also works in Antigravity IDE via .agents/skills/.
  • davebcn87/pi-autoresearch - pi extension plus dashboard for persistent experiment loops, live metrics, confidence tracking, and resumable autoresearch sessions.
  • drivelineresearch/autoresearch-claude-code - Claude Code plugin/skill port of pi-autoresearch, with a clean experiment-loop workflow and a concrete biomechanics case study.
  • greyhaven-ai/autocontext - Closed-loop control plane for repeated agent improvement, with evaluation, persistent knowledge, staged validation, and optional distillation into cheaper local runtimes.
  • jmilinovich/goal-md - Generalizes autoresearch into a GOAL.md pattern for repos where the agent must first construct a measurable fitness function before it can optimize.
  • mutable-state-inc/autoresearch-at-home - Collaborative fork of upstream autoresearch that adds experiment claiming, shared best-config syncing, hypothesis exchange, and swarm-style coordination across many single-GPU agents.
  • zkarimi22/autoresearch-anything - Generalizes autoresearch to any measurable metric: system prompts, API performance, landing pages, test suites, config tuning, SQL queries. "If you can measure it, you can optimize it."
  • Entrpi/autoresearch-everywhere - Cross-platform expansion that auto-detects hardware config and starts the loop. The "glue and generalization" half of autoresearch.
  • ShengranHu/ADAS - Automated Design of Agentic Systems (ICLR 2025). Meta-agents that invent novel agent architectures by programming them in code.
  • MaximeRobeyns/self_improving_coding_agent - SICA: Self-Improving Coding Agent that edits its own codebase. ICLR 2025 Workshop paper demonstrating scaffold-level self-improvement on coding benchmarks.
  • peterskoett/self-improving-agent - Alternative self-improving agent architecture with reflection and meta-learning cycles.
  • metauto-ai/HGM - Huxley-Gödel Machine for coding agents, which applies self-improvement to SWE-bench performance via meta-level optimization.
  • gepa-ai/gepa - GEPA (Genetic-Pareto), ICLR 2026 Oral. Reflective prompt evolution that outperforms RL (GRPO) on benchmarks. Optimizes any textual parameters against any metric using natural language reflection.
  • MrTsepa/autoevolve - GEPA-inspired autoresearch for self-play: mutate code strategies, evaluate head-to-head, rate with Elo/Bradley-Terry, branch from the Pareto front. Agent reads match traces to target mutations. Works as a Claude Code skill.
  • HKUDS/ClawTeam - Agent swarm intelligence for autoresearch: spawns parallel GPU research directions, distributes work across agents, and aggregates results.
  • Orchestra-Research/AI-Research-SKILLs - Comprehensive skill library including autoresearch orchestration with two-loop architecture (inner optimization + outer synthesis).
  • WecoAI/aideml - AIDE: Tree-search ML engineering agent that autonomously improves model performance via iterative code generation and evaluation.
  • weco.ai - Weco: Cloud platform for AIDE with observability, experiment tracking, and managed runs, bringing the autoresearch loop into production.

🔬 Research-agent systems

  • aiming-lab/AutoResearchClaw - End-to-end research pipeline that turns a topic into literature review, experiments, analysis, peer review, and paper drafts; broader than autoresearch, but clearly in the same lineage.
  • OpenRaiser/NanoResearch - End-to-end autonomous research engine that plans experiments, generates code, runs jobs locally or on SLURM, analyzes real results, and writes papers grounded in those outputs.
  • wanshuiyin/Auto-claude-code-research-in-sleep - Markdown-first research workflows for Claude Code and other agents, centered on autonomous literature review, experiments, paper iteration, and cross-model critique.
  • Sibyl-Research-Team/AutoResearch-SibylSystem - Fully autonomous AI scientist built on Claude Code, with explicit AutoResearch lineage, multi-agent research iteration, GPU experiment execution, and a self-evolving outer loop.
  • eimenhmdt/autoresearcher - Early open-source package for automating scientific workflows, currently centered on literature-review generation with an ambition toward broader autonomous research.
  • hyperspaceai/agi - Distributed, peer-to-peer research network where autonomous agents run experiments, gossip findings, maintain CRDT leaderboards, and archive results to GitHub across multiple research domains.
  • SakanaAI/AI-Scientist - The AI Scientist: First comprehensive system for fully automatic scientific discovery. From idea generation to paper writing with minimal human supervision.
  • SakanaAI/AI-Scientist-v2 - Workshop-level automated scientific discovery via agentic tree search. Removes template dependency from v1, generalizes across research domains.
  • HKUDS/AI-Researcher - NeurIPS 2025 paper. Full end-to-end research automation: hypothesis → experiments → manuscript → peer review. Production version at novix.science.
  • openags/Auto-Research - OpenAGS: Orchestrates a team of AI agents across the full research lifecycle: lit review, hypothesis generation, experiments, manuscript writing, and peer review.
  • SamuelSchmidgall/AgentLaboratory - End-to-end autonomous research workflow: idea → literature review → experiments → report. Supports both autonomous and co-pilot modes.
  • AgentRxiv - Collaborative autonomous research framework where agent laboratories share a preprint server to build on each other's work iteratively.
  • JinheonBaek/ResearchAgent - Iterative research idea generation over scientific literature with LLMs. Multi-agent review and feedback loops.
  • du-nlp-lab/MLR-Copilot - Autonomous ML research framework that generates ideas, implements experiments, and analyzes results.
  • MASWorks/ML-Agent - Reinforcing LLM agents for autonomous ML engineering. Learns from trial and error to improve model performance.
  • PouriaRouzrokh/LatteReview - Low-code Python package for automated systematic literature reviews via AI-powered agents.
  • LitLLM/LitLLM - AI-powered literature review assistant using RAG for accurate, well-structured related-work sections in academic writing.
  • Agent Laboratory - Three-phase research pipeline: Literature Review → Experimentation → Report Writing, with specialized agents for each phase.
  • WecoAI/aideml - AIDE: AI-Driven Exploration, a tree-search-based ML engineering agent that automates experiment design, code generation, and evaluation. Treats ML engineering as code optimization against any metric.

💻 Platform ports and hardware forks

  • miolini/autoresearch-macos - Widely adopted macOS fork that adapts upstream autoresearch for Apple Silicon / MPS while preserving the original loop shape.
  • trevin-creator/autoresearch-mlx - MLX-native Apple Silicon port that keeps the upstream fixed-budget val_bpb loop while removing the PyTorch/CUDA dependency entirely.
  • jsegov/autoresearch-win-rtx - Windows-native RTX fork focused on consumer NVIDIA GPUs, with explicit VRAM floors and a practical desktop setup path.
  • iii-hq/n-autoresearch - Multi-GPU autoresearch infrastructure with structured experiment tracking, adaptive search strategy, crash recovery, and queryable orchestration around the classic train.py loop.
  • lucasgelfond/autoresearch-webgpu - Browser/WebGPU port that lets agents generate training code, run experiments in-browser, and feed results back into the loop without a Python setup.
  • tonitangpotato/autoresearch-engram - Fork with persistent cognitive memory: frequency-weighted retrieval of cross-session knowledge for improved experiment continuity.
  • Colab/Kaggle T4 port - Adapts autoresearch for free T4 GPUs (Google Colab / Kaggle) with zero cost and zero local setup. Key changes: Flash Attention 3 → PyTorch SDPA, removes H100-only kernel dependency. (upstream issue #208)
  • ArmanJR-Lab/autoautoresearch - Jetson AGX Orin port with a director: a Go binary that acts as a "creative director", injecting novelty (arXiv papers + DeepSeek Reasoner) into the loop to escape local minima. Includes multi-experiment comparison (baseline vs director-guided) with detailed stall analysis.

🎯 Domain-specific adaptations

  • mattprusak/autoresearch-genealogy - Applies the autoresearch pattern to genealogy, using structured prompts, archive guides, source checks, and vault workflows to iteratively expand and verify family-history research.
  • ArchishmanSengupta/autovoiceevals - Uses adversarial callers plus keep-or-revert prompt edits to harden voice AI agents across Vapi, Smallest AI, and ElevenLabs.
  • chrisworsey55/atlas-gic - Applies the autoresearch keep-or-revert loop to trading agents, optimizing prompts and portfolio orchestration against rolling Sharpe ratio instead of model loss.
  • RightNow-AI/autokernel - Applies the autoresearch loop to GPU kernel optimization: profile bottlenecks, edit one kernel, benchmark, keep or revert, repeat.
  • Rkcr7/autoresearch-sudoku - Enhanced autoresearch workflow where an AI agent iteratively rewrites and benchmarks a Rust sudoku solver, ultimately beating leading human-built solvers on hard benchmark sets.
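
Several entries above describe the same "edit, benchmark, keep or revert, repeat" shape. As a rough illustration only, and not the implementation of any listed project, that loop can be sketched as a simple hill climb. The names `keep_or_revert`, `toy_propose`, and `toy_score` are hypothetical, and the "artifact" being edited is a single number standing in for code or a prompt.

```python
import random
from typing import Callable, Tuple


def keep_or_revert(
    candidate: float,
    propose: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    steps: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hill climb: accept a proposed edit only if the metric improves."""
    rng = random.Random(seed)
    best, best_score = candidate, score(candidate)
    for _ in range(steps):
        trial = propose(best, rng)      # make one small edit
        trial_score = score(trial)      # benchmark the edited candidate
        if trial_score > best_score:    # keep: the edit helped
            best, best_score = trial, trial_score
        # otherwise revert: drop the trial and continue from `best`
    return best, best_score


# Toy stand-ins (assumptions for illustration): the metric peaks at x = 3.
# A real loop would edit code or prompts and run an actual benchmark.
def toy_score(x: float) -> float:
    return -(x - 3.0) ** 2


def toy_propose(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-0.5, 0.5)


if __name__ == "__main__":
    best, best_score = keep_or_revert(0.0, toy_propose, toy_score)
    print(f"best={best:.3f} score={best_score:.4f}")
```

Because a trial is kept only when it scores strictly higher, the tracked score never decreases. The trade-off is that the loop can stall in a local optimum, the failure mode that director-style or swarm-style variants in this list aim to address.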

📊 Evaluation & benchmarks

  • snap-stanford/MLAgentBench - Benchmark suite for evaluating AI agents on ML experimentation tasks. 13 tasks from CIFAR-10 to BabyLM.
  • openai/mle-bench - OpenAI's benchmark for measuring how well AI agents perform at ML engineering.
  • chchenhui/mlrbench - MLR-Bench: Evaluating AI agents on open-ended ML research. 201 tasks from NeurIPS/ICLR/ICML workshops.
  • gersteinlab/ML-Bench - Evaluates LLMs and agents for ML tasks on repository-level code.
  • THUDM/AgentBench - Comprehensive benchmark for LLM-as-Agent evaluation across 8 distinct environments. ICLR 2024.

📈 Notable use cases and writeups

  • Shopify Liquid optimization - Tobi Lütke shared an autoresearch-style optimization run on Shopify's Liquid engine, with public traces showing major parse/render speedups and allocation reductions. (tweet, PR with traces)
  • Driveline baseball biomechanics - Public autoresearch-style experiment loop for pitch-velocity prediction from biomechanics data, with large reported gains in model quality. (tweet)
  • Tennis XGBoost prediction + reward hacking writeup - Nick Oak documents an autoresearch-inspired loop for tennis match prediction, including where the optimization setup went wrong. (blog · repo · gamed branch)
  • Vesuvius Challenge ink detection swarm - Multi-agent experimental loop applied to ancient-scroll ink detection, with a strong writeup on cross-scroll generalization improvements. (blog)
  • Earth system model optimization - Hybrid workflow where an LLM proposes equation structures and a search process tunes parameters, showing how the autoresearch pattern extends into scientific modeling. (tweet, blog)
  • The Agentic Researcher - Paper: "A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning." Cites autoresearch as the canonical example of automated ML experiment pipelines. (arXiv 2603.15914)
  • Scaling Autoresearch to GPU Clusters - SkyPilot blog on running autoresearch on H100/H200 clusters with cloud orchestration. (SkyPilot Blog)
  • Self-Improving Coding Agents - Addy Osmani's practical guide to setting up self-improving agent loops with Claude Code. (article)
  • autoresearch@home: Distributed AI Research - SETI@home model applied to autoresearch: contribute GPU time to collective model optimization. (Ensue Blog)
  • Claude Code + AutoResearch for Self-Improving Skills - MindStudio guide to building self-improving AI skills using Claude Code with autoresearch patterns. (article)
  • 100 ML Experiments Overnight - Particula technical breakdown with domain-agnostic fork applications. (article)
  • PM's Guide to Autoresearch - Product manager's guide covering setup, community forks, and real-world applications. (article)
  • Autoresearch 101 Builder's Playbook - Substack deep-dive on applying autoresearch patterns to prompts, agents, and workflows with concrete examples. (article)
  • Kingy AI Technical Breakdown - Detailed technical walkthrough of the autoresearch loop architecture, mutation operators, and fitness function design. (article)
  • Fortune Feature - Business and industry context on why autoresearch matters for the future of autonomous AI agents. (article)

📚 Related resources

Curated lists and paper collections for AI agents, autonomous systems, and automated research:

📄 License

This list is released under CC0-1.0.
