
# swarma — growth loops, automated

[swarma.dev](https://swarma.dev)


an agent without a team is just a prompt with no context. nobody hires a writer with no research team, no analytics, no strategy, no feedback loop.

swarma builds agent teams that run growth experiments, score results, issue verdicts, and evolve their own strategy. after enough cycles, you have a validated playbook of what actually works. self-hosted. runs on a laptop or a VPS.

one folder = one team. one cycle = one experiment. the playbook writes itself.

## swarma is right for you if

- you run growth experiments (hooks, landing pages, outreach, pricing, activation) and want to run 50x more of them
- you want agent teams that learn and improve through A/B testing, not just execute once
- you want a validated playbook of what works for your specific audience, not generic advice
- you care about self-hosting, no vendor lock-in, and owning your data
- you want swarms that get smarter every cycle, not pipelines that run once and forget

## quickstart

```bash
pip install swarma
swarma init
swarma cycle starter --topic "why do startups fail?"
```

three commands. python 3.11+ and an OpenRouter API key. no GPU, no postgres, no docker.

or from source:

```bash
git clone https://github.com/glitch-rabin/swarma.git
cd swarma && pip install -e .
```

## MCP integration (Claude Code / Claude Desktop)

after installing, add to your MCP config (`.mcp.json`):

```json
{
  "mcpServers": {
    "swarma": {
      "command": "swarma",
      "args": ["serve", "--mcp"],
      "env": { "OPENROUTER_API_KEY": "sk-or-..." }
    }
  }
}
```

## the GROWS loop

every cycle follows GROWS -- five steps, no exceptions:

```
 Generate        Run           Observe      Weigh        Stack
hypothesis --> experiment --> signal   --> verdict --> playbook
    ^                                                     |
    └─────────────────────────────────────────────────────┘
```
| Step | What happens |
| --- | --- |
| **G -- Generate** | Agent reads its strategy.md, proposes a hypothesis. "contrarian opening lines will outperform data-led hooks on saves." |
| **R -- Run** | Agent executes with the hypothesis active. Produces output -- research, copy, analysis, whatever the team does. |
| **O -- Observe** | A separate LLM scores the output against the agent's target metric (1-10 scale, forced decimals). Score + reasoning logged to results.tsv. |
| **W -- Weigh** | After min_sample_size cycles (default 5), swarma compares the experiment average against baseline. >20% up = keep. >20% down = discard. In between = inconclusive. |
| **S -- Stack** | Validated patterns get written back to strategy.md and pushed to the shared playbook. Next cycle generates a new hypothesis from the evolved strategy. |

this is the same loop growth teams at Uber, Spotify, and Airbnb run by hand. swarma removes the human bandwidth bottleneck: a swarm runs 50 experiments while a human team runs 2.

## without swarma vs with swarma

| | Without | With swarma |
| --- | --- | --- |
| Experiments per week | 2-5 (limited by human bandwidth) | 50+ (limited by API budget) |
| Knowledge sharing | Lives in Slack threads, dies there | Every result logged to results.tsv, strategies evolve per team |
| Strategy evolution | Manual review meetings, quarterly | Automatic after every experiment cycle |
| Learning from failure | "We tried that, it didn't work" (no record) | Logged, analyzed, anti-patterns tracked |
| Cross-team learning | Team A doesn't know what Team B learned | Optional shared playbook via QMD |
| Starting point | Blank page or generic templates | Pre-seeded strategies from real growth knowledge |

## generate a team from a goal

don't configure agents by hand. describe what you want and swarma builds the team.

```bash
swarma team create my-growth --from-goal "improve landing page conversion rate"
```

this calls an LLM to design a team with the right agents, flow, models, and a first experiment hypothesis. the generated team comes with:

- `team.yaml` -- goal, flow, budget
- `agents/*.yaml` -- individual agent configs with specific instructions
- `program.md` -- experiment patterns and constraints
- a first experiment ready to run

add business context for better results:

```bash
swarma team create growth-lab \
  --from-goal "optimize our signup funnel" \
  --context "B2B SaaS, 500 free users, 2% conversion to paid" \
  --budget 50
```

## teams as config

a team is a folder. no code required.

```
teams/my-squad/
├── team.yaml          # goal, flow, schedule, budget
├── program.md         # team context and constraints
└── agents/
    ├── researcher.yaml
    ├── writer.yaml
    └── strategy.md    # pre-seeded growth knowledge (evolves automatically)
```
```yaml
# team.yaml
name: my-squad
goal: find what works.
flow: "researcher -> writer"
schedule: "0 8 * * 1-5"
```
```yaml
# agents/writer.yaml
id: writer
name: Writer
instructions: |
  turn research into a post. max 200 words.
  hook in the first line. practitioner voice.
metric:
  name: content_quality
  target: 8.0
experiment_config:
  min_sample_size: 5
  auto_propose: true
```

models are configured per-agent in their YAML files. `swarma init` creates a `config.yaml` at `~/.swarma/instances/default/` with model routing defaults.

flow DSL supports sequential (`a -> b`), parallel (`a -> [b, c, d]`), and mixed pipelines.
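these flow strings are simple enough to parse by hand. here's an illustrative parser for the two shapes above -- a sketch, not swarma's actual implementation:

```python
# illustrative parser for the flow DSL shapes above -- NOT swarma's real
# implementation. returns a list of stages; each stage is a list of agent
# ids that run in parallel.
def parse_flow(expr: str) -> list[list[str]]:
    stages = []
    for part in expr.split("->"):
        part = part.strip()
        if part.startswith("[") and part.endswith("]"):
            # parallel stage: "[b, c, d]" -> ["b", "c", "d"]
            stages.append([p.strip() for p in part[1:-1].split(",")])
        else:
            # sequential stage: a single agent
            stages.append([part])
    return stages

print(parse_flow("researcher -> writer"))  # [['researcher'], ['writer']]
print(parse_flow("research -> [draft, edit] -> publish"))
```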

## how the loop works (detailed)

  1. agent reads its strategy.md before every run
  2. produces output (research, copy, analysis -- whatever the team does)
  3. a cheap LLM scores the output against the agent's metric (1-10 scale, forced decimals)
  4. score + reasoning logged to results.tsv
  5. after min_sample_size cycles (default 5), verdict is issued automatically
  6. strategy.md updated with what was learned
  7. next cycle uses the evolved strategy

**scoring:** each output gets evaluated by a separate LLM call using the cheapest model in your routing table. the evaluator sees the output, the current strategy, the last 5 scores, and the metric definition. returns a precise score (7.3, not 7) plus reasoning and a strategy suggestion.
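the evaluator's reply then has to be reduced to a single forced-decimal score. a minimal sketch of that normalization step (the helper name and parsing rule are assumptions, not swarma's actual code):

```python
import re

# hypothetical helper: pull the first number out of an evaluator reply
# and clamp it to the 1-10 forced-decimal scale the loop logs.
def parse_score(reply: str) -> float:
    m = re.search(r"\d+(?:\.\d+)?", reply)
    if m is None:
        raise ValueError(f"no score found in evaluator reply: {reply!r}")
    return min(10.0, max(1.0, round(float(m.group()), 1)))

print(parse_score("Score: 7.3 -- strong hook, weak close"))  # 7.3
```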

**verdicts:** after enough samples, swarma compares the experiment average against baseline. >20% improvement = keep (pattern validated, strategy updated). >20% decline = discard (reverted). in between = inconclusive (logged, try again with more data).
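the verdict rule reduces to a few lines. a sketch under the defaults described above (function name and signature are illustrative, not swarma's internals):

```python
# sketch of the Weigh step: compare the experiment average to baseline
# and issue a verdict. thresholds follow the defaults described above.
def weigh(scores: list[float], baseline: float,
          min_sample_size: int = 5, threshold: float = 0.20) -> str:
    if len(scores) < min_sample_size:
        return "pending"            # not enough samples for a verdict yet
    change = (sum(scores) / len(scores) - baseline) / baseline
    if change > threshold:
        return "keep"               # pattern validated, strategy updated
    if change < -threshold:
        return "discard"            # reverted
    return "inconclusive"           # logged, try again with more data

print(weigh([8.2, 8.5, 8.9, 8.7, 8.4], baseline=6.8))  # keep
```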

after a few experiments, your strategy.md evolves from seed knowledge into validated patterns:

```markdown
## Validated (Exp 5)
contrarian opening + specific numbers in first line
> 23% improvement over baseline. keep this pattern.

## Inconclusive (Exp 2)
story-led hooks vs data-led hooks -- no significant difference (avg=8.1 vs baseline=7.9)
> next: increase sample size, results may be noise
```

## feed real metrics back

LLM self-eval is a starting proxy. for production, feed back real-world signals:

```bash
# log a single metric
swarma metric log hook-lab copywriter 4.2 --metric ctr_pct

# attach to a specific experiment
swarma metric log hook-lab copywriter 127 --metric impressions --exp 3

# bulk import from CSV
swarma metric import hook-lab metrics.csv

# view logged metrics
swarma metric show hook-lab
```

CSV format: `agent,value,metric_name,note`

metrics auto-attach to the agent's active experiment. the experiment loop uses external metrics alongside LLM scores when both are available.
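for reference, reading that CSV shape looks like this (assuming the first line is a header matching the format above; the sample data is made up, and swarma's actual importer may differ):

```python
import csv
import io

# sample rows in the agent,value,metric_name,note format -- the data
# here is invented for illustration only.
sample = """\
agent,value,metric_name,note
copywriter,4.2,ctr_pct,week 12 campaign
copywriter,127,impressions,attached to exp 3
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    row["value"] = float(row["value"])  # csv yields strings; coerce values

print(rows[0])
```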

## pre-seeded strategies

agents don't start from zero. each squad includes a strategy.md seeded with real growth knowledge -- validated patterns, anti-patterns, and hypotheses to test. the GROWS loop refines these over time.

example from the hook-lab strategy:

```markdown
### Validated Patterns

**Specificity wins**
- Hooks with specific numbers outperform vague claims by 2-3x on saves
- "47% of startups" > "most startups"

**Loss framing > gain framing**
- "You're losing $X/month" outperforms "You could save $X/month" on CTR
- Exception: aspirational audiences respond better to gain framing

### Patterns to Test

- [ ] First-person confession vs third-person case study
- [ ] Time-anchored ("In 2024...") vs timeless hooks
```

## 18 squad templates

ready-to-use squads in examples/, each with pre-seeded strategies:

```bash
cp -r examples/hook-lab ~/.swarma/instances/default/teams/
swarma cycle hook-lab
```

core growth squads:

| squad | AARRR stage | what it tests | agents |
| --- | --- | --- | --- |
| hook-lab | acquisition | opening lines -- what stops the scroll | 3 |
| landing-lab | acquisition | landing page copy and structure | 3 |
| seo-engine | acquisition | search presence and content ranking | 3 |
| cold-outbound | acquisition | outreach messaging and sequences | 3 |
| channel-mix | acquisition | multi-platform content strategy | 3 |
| activation-flow | activation | signup-to-value onboarding flows | 3 |
| pricing-lab | revenue | pricing presentation and packaging | 3 |
| retention-squad | retention | churn signals and win-back patterns | 3 |
| referral-engine | referral | viral loops and invite mechanics | 3 |
| competitive-intel | -- | market monitoring and positioning | 3 |

2026 growth ops squads:

| squad | what it tests | agents |
| --- | --- | --- |
| faceless-factory | automated short-form video pipeline | 3 |
| ad-creative-lab | performance creative testing at scale | 3 |
| ugc-factory | user-generated content simulation | 3 |
| programmatic-seo | automated SEO content at scale | 3 |
| newsletter-engine | newsletter growth + retention | 3 |
| acquisition-squad | paid + organic acquisition loops | 3 |
| community-engine | community-led growth automation | 3 |
| agentic-storefront | AI-powered commerce optimization | 3 |

each includes a program.md with experiment patterns and a strategy.md with real growth knowledge as starting baseline.

## what swarma is not

| swarma is not... | Use this instead | The difference |
| --- | --- | --- |
| memory | honcho | swarma doesn't remember conversations. it runs experiment loops and builds playbooks. |
| workflow automation | n8n, Make, Zapier | those connect apps. swarma runs hypotheses through agent teams and learns from results. |
| a prompt library | agency-agents (135 templates) | templates are a starting point. swarma teaches agents what works through feedback loops. templates go in, playbooks come out. |
| agent orchestration | CrewAI, AutoGen, LangGraph | those run pipelines. swarma adds the GROWS loop that makes pipelines improve. orchestration is step 1. learning is the whole game. |
| a hosted service | -- | swarma is self-hosted. your data stays on your machine. no accounts, no telemetry, no vendor lock-in. |

## works with

swarma is the engine. connect it to an operator for the full experience.

### Hermes

Hermes by NousResearch is the operator layer swarma is designed for. Hermes has terminal access -- it can run `swarma cycle`, `swarma team create`, and read results directly. no MCP required.

swarma is the machine. Hermes is the executive. together they're the system from the original article.

### QMD cross-team knowledge

by default, each team learns individually via its own strategy.md. to learn across teams, wire in QMD -- a local knowledge engine by Tobi Lutke. BM25 + vector + rerank. runs on your machine, no GPU.

```yaml
# config.yaml
knowledge:
  engine: qmd
  qmd_endpoint: http://localhost:8181/mcp
```

with QMD: team A discovers loss framing beats gain framing, team B sees that pattern in its next cycle. without QMD: teams still learn individually. most users don't need this until running 3+ teams.

## REST API

```bash
swarma serve --port 8282
# 30+ endpoints, OpenAPI docs at /docs
```

## FAQ

### do I need Hermes to use swarma?

no. swarma works standalone via CLI. Hermes adds the operator layer -- Telegram/Slack control, approval flows, scheduled cycles. it's how you get the full experience, but swarma runs experiments on its own.

### do I need QMD?

no. QMD adds cross-team knowledge sharing. without it, each team learns individually via its own strategy.md. add QMD when you're running multiple teams and want them to learn from each other.

### what models does it use?

any model on OpenRouter. swarma uses a routing table that maps task types (generation, evaluation, routing) to models. you configure this in the instance config.yaml created by swarma init, and can override per-agent.

### is this like CrewAI / AutoGen / LangGraph?

those are orchestration frameworks -- they run agent pipelines. swarma adds the experiment loop on top. orchestration is "agent A passes output to agent B." swarma is "agent A passes output to agent B, the output gets scored, and the strategy evolves based on what works." the GROWS loop is the difference.

### can I use this for things other than growth marketing?

the engine doesn't care what the team does. it cares about agents, metrics, and experiments. if you can define a north star metric, swarma can run experiments toward it.

### is my data sent anywhere?

your prompts and outputs go to OpenRouter (which routes to the model provider). everything else -- strategies, experiment logs, playbooks, knowledge -- stays on your machine. no telemetry, no analytics, no phone-home.

## standing on

swarma exists because of three ideas from three people:

- autoresearch by Andrej Karpathy -- automated research loops where agents iteratively refine their approach. swarma applies this to growth experiments instead of research papers.
- QMD by Tobi Lutke (@tobi) -- shared knowledge across agents. the insight that agents working on different problems should learn from each other.
- Hermes by NousResearch (@NousResearch) -- the operator layer. an agent that directs teams of specialist agents through tool use. swarma is designed as infrastructure that Hermes operates.

## roadmap

shipped:

- GROWS experiment loop (generate -> run -> observe -> weigh -> stack)
- team generator (`swarma team create --from-goal`)
- 18 squad templates covering the full AARRR funnel + 2026 growth ops
- pre-seeded strategy files with real growth knowledge
- external metric ingestion (`swarma metric log/import/show`)
- QMD cross-team wiring (verdict -> playbook -> shared knowledge)
- MCP server (16 tools) for Hermes / Claude Code / any client
- `pip install swarma` on PyPI (v0.2.0)

next:

- dashboard UI (experiment viewer, playbook, strategy evolution)
- setup wizard (guided onboarding for first-time users)
- expert reasoning lenses (bundled thinking frameworks)
- Hermes skill package
- squad marketplace (share and discover validated teams + playbooks)
- webhook metric ingestion (real-time analytics callbacks)
- observation-mode experiments (real-world signal primary, LLM eval secondary)

## contributing

swarma is early and moving fast. see CONTRIBUTING.md for guidelines.

short version: open an issue first for anything non-trivial. PRs welcome for bug fixes, new squad templates, and documentation. if you're building a squad that works, we want it in examples/.

## star history

[Star History Chart]

## license

MIT
