English · 简体中文 · Docker →
# Deployment Guide (Host)
Get a ClaudeTeam crew running — **just follow the 5 steps below, top to bottom**.
Config, model-backend, and troubleshooting reference live further down. Deploying
on Docker / a server → see [Docker deploy](DEPLOYMENT_docker.md).
> **Driving this with a coding agent?** Don't let it free-run — the botched
> installs we see are all an agent *guessing* the roster instead of asking, then
> calling `health` green without ever looking inside the panes. Before it starts,
> point it at the **[coding-agent deploy protocol](#deploying-with-a-coding-agent)**
> further down: **ask the operator the intake questions first**, then **verify
> every agent's pane** before telling them the team is up.
---
## Before you begin
Install these (the bits `pip` can't):
- **Python 3.9+** — macOS's built-in `/usr/bin/python3` (3.9) is fine, nothing
extra to install. Debian/Ubuntu also needs `sudo apt install -y python3-venv`.
- **tmux** — one window per agent.
- **node + npx (18+)** — runs `lark-cli` (sending) + the Feishu sidecar (bot
registration + event ingress).
- **≥ 1 agent CLI** — `claude` alone is enough (the default team uses only it);
mixing in `codex` / `gemini` / `qwen` / … is **optional** (see the
[adapter table](../README.md#multi-cli-adapter)).
- A **Feishu / Lark account** — `--quick` scan-registers anywhere (you @ the bot in
groups); for "un-@'d in groups", drop `--quick` and let the browser automation build an
enterprise self-built app (needs a desktop browser).
> 💡 Agents **reuse your existing local login**: if `claude` is logged in on this
> machine, the claude agents use it directly — **no separate login**. Same for any
> other CLI — logged in locally is enough.
---
## Step 1 · Install
```bash
# Code + the claudeteam command (-e = editable install: always tracks your
# checkout, never stuck on a stale version)
git clone https://github.com/zylMozart/ClaudeTeam.git && cd ClaudeTeam
python3 -m venv .venv && source .venv/bin/activate # macOS's built-in 3.9 is fine
pip install -e .
# External tools pip can't install:
# macOS: brew install tmux node && npm i -g @larksuite/cli @anthropic-ai/claude-code
# Debian: sudo apt install -y tmux nodejs npm && npm i -g @larksuite/cli @anthropic-ai/claude-code
```
> Install only the agent CLIs you'll use. The default team is all `claude-code`,
> so `claude` alone runs it; add `codex` etc. only if you want them.
> The `.venv` above is just *one* way. conda / pipx / system-python all work too
> — the only requirement is that `claudeteam` **and** your agent CLIs resolve on
> `PATH` in the shell you run `up` from. So the PATH fixes below say "the env
> `claudeteam` is installed in," not literally `.venv`.
## Step 2 · Configure your team
> ▶ **Agent:** resolve the roster with the operator first (Rule 1, questions
> 1–2). Build `[team.agents.*]` from **only** the CLIs that are installed *and*
> logged in — don't leave the default `worker_codex`-style examples in unless
> `codex` passed both checks.
```bash
claudeteam init --no-connect # writes claudeteam.toml (default: manager + 1 claude worker)
$EDITOR claudeteam.toml # adjust agents to the CLIs you have / are logged into (below)
```
Open `claudeteam.toml`; `[team.agents.*]` is your roster. The default is two
`claude-code` agents — **install claude and it just works**. To add a worker on
another CLI (only if you've **installed + logged into** it), uncomment the example
init wrote and edit it:
```toml
[team.agents.worker_codex]
cli = "codex-cli" # add only if you have codex installed; otherwise leave it out
model = "gpt-5.5"
role = "Codex worker"
```
> Don't want to configure from scratch? [`templates/`](../templates/) has ready
> domain teams (software-dev / research / marketing / data / content) — copy one
> and tweak. `claudeteam reidentify --print` previews an agent's rendered
> identity before `up`.
## Step 3 · Connect Feishu (scan once → bot + group built)
> ▶ **Agent:** first check whether they already have `state/feishu_app.json` + a
> `chat_id` — if so, skip this step. Otherwise **this is where you pick the mode
> with the operator** (walk them through the `--quick` vs no-flag trade-offs
> below) — don't silently default. The QR scan itself is the operator's to do.
```bash
claudeteam feishu connect --quick # one scan, runs anywhere (you @ the bot in groups)
```
**`--quick` is the easy path**: scan one QR, zero console, **runs on any machine** (incl.
headless servers). It creates the bot app + team group (invites you) + creds + `chat_id`
(written back to `claudeteam.toml`). The one catch: a **PersonalAgent** app can't get
`im:message.group_msg`, so **in groups you @ the bot** to get a reply — DMs are unaffected,
and it's fine to start here.
**Want the bot to reply in groups _without_ an @?** Drop `--quick`:
```bash
claudeteam feishu connect # browser-builds an enterprise app that needs no @
```
With no flag it opens a **real (headed) browser** and drives the Feishu console to create +
scope + subscribe + **publish** an enterprise self-built app holding `im:message.group_msg`
— then the bot **replies to un-@'d group messages**. Scan the login QR **once**; the 7
console stages auto-run; on any console-UI change it **falls back to `--manual`**. Needs a
**desktop browser** (see the headless note below). There's also **`--manual`** — the
step-by-step guided console flow (paste App ID/Secret, click the one-click permission
deep-link, publish) — the robust fallback when the browser automation can't run.
> ⚠️ **Headless servers:** the no-flag browser automation needs a desktop browser, so it
> **can't run on a headless host** — either use `--quick` (you @ the bot in groups), or run
> `claudeteam feishu connect` on a **desktop machine** to build the app + group, then copy
> the saved creds (`state/feishu_app.json`) + `chat_id` into the server's config. (Headless +
> terminal-QR is planned.)
--manual guided console flow (if the browser automation can't run or you'd rather click it yourself)
`claudeteam feishu connect --manual` walks you through the console:
1. **Create the app** — open → 创建企业自建应用 → add
the **机器人 (bot)** capability → copy the **App ID + App Secret**, paste when prompted.
2. **One-click scopes** — click the deep-link it prints (all 7 scopes incl. the
sensitive `im:message.group_msg` pre-selected) → 确认.
3. **Event** — 事件与回调 → 订阅方式 = **使用长连接** → add the **接收消息** event.
4. **Publish** — 应用发布 → 创建版本 → 申请发布 → **批准** (tenant admins approve their own version instantly; personal-edition apps skip review).
5. Press **Enter** — the command verifies the scope, creates the group, saves creds → `state/feishu_app.json` (0600) + writes `chat_id`.
> **One command for Steps 2+3** (default team only): `claudeteam init --quick`
> writes the default config *and* scan-connects Feishu in one go; plain
> `claudeteam init` does the same but with the no-@ browser flow. Quick
> reference: `init --no-connect` = config only, connect later · `init --quick` =
> config + scan-connect · `init` = config + browser (no-@) connect.
>
> ▶ **Agent:** the one-shot `init --quick` / `init` forms **bake in the default
> roster _and_ the Feishu mode in a single command** — only reach for them
> *after* the operator has confirmed the **roster** (Rule 1) *and* picked the
> **Feishu mode** (Step 3). If either
> is still open, use the explicit Step 2 (`init --no-connect`) + Step 3 so each
> decision stays a real choice, not a default you smuggled past them.
## Step 4 · Launch
```bash
claudeteam install-hooks # install slash-command hooks (MUST run before up)
claudeteam up # start the tmux crew + router + watchdog
claudeteam health # infra self-check: binaries / env / tmux / router / watchdog
```
`up` starts a tmux session (one window per agent) + the router + the watchdog,
then kicks the manager to run a group roll-call. `health` should be green (one
`lark_profile blank` ⚠️ is tolerable). **But green `health` only means the
infrastructure is up — go to Step 5 before you trust the team.**
## Step 5 · Verify each agent's pane
Green `health` ≠ a working team: it never looks *inside* an agent. Each CLI still
has to have actually launched, authenticated, and swallowed its identity prompt —
so **look at every pane**. For **each** agent in your roster:
```bash
claudeteam team # one line per agent that reported — but a DEAD pane won't appear here,
# so don't stop at this; peek EVERY rostered agent (dead ones are
# exactly the ones you need to see):
for a in $(grep -oE '^\[team\.agents\.[^]]+\]' claudeteam.toml | sed -E 's/.*agents\.([^]]+)\]/\1/'); do
echo "===== $a ====="; claudeteam peek "$a" 40
done
```
Read each pane and match it against this table — **fix a ❌ before moving on**:
| What the pane shows | Verdict | Fix |
| --- | --- | --- |
| CLI REPL is up, the injected identity / roll-call was answered, no error banner | ✅ healthy | — |
| `claude: not found` / `codex: not found` | ❌ PATH | you `up`'d from a shell where `claude`/`claudeteam` don't resolve → re-`up` from the shell/env `claudeteam` is installed in (`.venv`, conda, pipx, …), then `claudeteam down && up` |
| "Not logged in" / a login or auth prompt | ❌ not authed | log that CLI in on this machine (`claude`, `codex login`, …); for claude, `down && up` re-materialises creds — see [Not logged in](#not-logged-in-in-a-claude-pane-macos-host) |
| "update available" / a version prompt (codex especially) | ❌ blocked | `tmux send-keys -t : 3 Enter` (Skip until next version) → `claudeteam reidentify ` |
| bare shell prompt or empty buffer, no CLI banner | ❌ didn't spawn | `peek` again after ~30 s; still bare → that CLI isn't really installed / on PATH |
| `invalid api key` / a base_url error (BYOK CLIs) | ❌ backend misconfig | check `OPENAI_BASE_URL` / `OPENAI_API_KEY` in the secrets `.env` — see [Model backend](#model-backend-per-agent-credentials--endpoint) |
**Then the live signal — the Feishu group.** On a fresh `up` the manager **posts
a roll-call** and each worker reports in; then `@manager 你好` → reply in ~30 s.
Optional in-group probes: `/team` (each agent's ♥ heartbeat < 30 s), `/health`
(per-agent + router + watchdog card).
Only once **every pane is ✅** *and* the group roll-call landed is the team
really up — that, not green `health`, is what you tell the operator.
**Tear down:** `claudeteam down` (stop, keep state) · `claudeteam reset` (also wipe state).
---
## Deploying with a coding agent
A coding agent (Claude Code / Codex / …) driving this deploy should follow
**three rules, in order** — they wrap the 5 steps above with the ask-first +
verify-panes discipline that keeps an unattended install from quietly going
wrong.
### Rule 1 · Ask before you act — the intake
Before running *anything*, ask the operator this short intake, then say the plan
back and get a nod. Never guess a default; a wrong guess here is exactly what
produces a dead team later.
1. **Which agent CLIs are installed _and logged in_ on this machine?**
Check *install* yourself — don't ask what you can test:
```bash
for c in claude codex gemini qwen kimi; do printf '%s: ' "$c"; command -v "$c" || echo "(not installed)"; done
```
For each that resolves, ask the operator to confirm it's **logged in** (a CLI
that isn't logged in comes up as a *dead pane* — this is the #1 cause of a
silent team). **The roster only contains CLIs that pass both checks.** Don't
put a `codex` worker in the team if `codex` isn't installed + logged in. You
*can't* fully verify "logged in" from outside the CLI, so this stays a real
question worth asking — it prunes the roster **before** you build panes you'd
have to tear down. And whatever the operator answers,
[Step 5](#step-5--verify-each-agents-pane) is the **final gate for every agent
— the `claude` manager included** (a not-logged-in CLI is a dead pane there):
treat each one as **unconfirmed until its pane is ✅**. When unsure, start with
fewer agents and add more once each is proven; never skip Step 5 just because
the operator said "they're all logged in."
2. **Roster shape** — the default 2-agent all-`claude` team, a domain template
from [`templates/`](../templates/) (software-dev / research / marketing /
data / content), or a custom roster? The **manager must be `claude-code`**
(it drives the roll-call) — if the operator picks a template, open its
`claudeteam.toml` and confirm the lead/manager agent's `cli` is `claude-code`
before `up`. Confirm the final agent list with the operator.
(The Feishu registration mode — `--quick` vs no-flag — is its own decision; you
make it *with* the operator at **Step 3**, where the trade-offs are laid out. No
need to front-load it into the intake.)
Then **state the plan back** — e.g. *"roster = manager + worker_cc, both
claude-code, on your existing local claude login"* — and only proceed on a yes.
### Rule 2 · Follow the steps; read code only as a last resort
Steps 1–5 are the *whole* procedure — run them as written rather than
reverse-engineering your own path from the source. When something fails, work it
in this order: (1) the [Common failures](#common-failures) table, (2) the
failing command's own output / logs, (3) escalate to the operator with that
output. **Reading the source is the *last* resort, not the first** — it isn't
off-limits, but reach for it only *after* you've actually run the steps and hit
a problem none of the above resolves; then read to diagnose that specific
failure, and still surface it to the operator rather than silently editing code.
### Rule 3 · Verify every pane before you declare success
`claudeteam health` going green proves the **infrastructure** is up — router,
watchdog, tmux — **not** that each agent's CLI actually booted and
authenticated. After `up`, walk [Step 5](#step-5--verify-each-agents-pane) and
eyeball **every** pane. Only tell the operator "the team is up" once every pane
is a healthy, identity-loaded REPL **and** the manager's group roll-call landed.
---
## Configuration: `claudeteam.toml`
Single TOML file (Cargo-style, comment-friendly) — `claudeteam init` writes it,
documented in-place. App creds are **not** here (they live in
`state/feishu_app.json`); only `chat_id` + the team layout.
```toml
chat_id = "oc_..." # Feishu group chat_id (written by `feishu connect`)
lark_profile = "" # lark-cli profile name; "" = default
default_model = "opus" # fallback when an agent doesn't pin one
[team]
session = "ClaudeTeam" # tmux session name
[team.agents.manager]
cli = "claude-code" # claude-code | codex-cli | gemini-cli | kimi-code | qwen-code
# | minimax | opencode | codewhale | openclaw | trae | hermes | pi
role = "团队主管" # rendered into identity.md
model = "opus"
specialty = ["调度", "审阅"] # optional — manager sees this in dispatch prompt
tone = "稳重克制" # optional — biases LLM tone
notes = "always answer in Chinese" # optional — free-form prompt addendum
playbook = "manager.md" # optional — a role-instruction .md (→ its CLAUDE.md/AGENTS.md)
card_color = "blue"
publish_overrides = { worker_to_user = false } # per-agent override of [chat.publish]
[chat.publish] # who-talks-to-whom group filter
user_to_manager = "always" # boss → manager (always lands)
manager_to_user = "always" # manager → boss (always lands)
manager_to_worker = true # show dispatch cards in group
worker_to_manager = true # show worker progress in group
worker_to_user = true # show worker completions in group
worker_to_worker = true # show inter-worker pings in group
```
Defaults are wide open (everything visible) — flip individual keys to `false`
once the team's noise level needs trimming. **Override precedence** (highest
wins): `env` > `claudeteam.toml` > code default (see `runtime/tunables.py`).
**Team templates** — instead of hand-writing the roster, start from a domain
template in [`templates/`](../templates/) (software-dev, automated-research,
marketing-growth, data-analysis, content-ops): a ready `claudeteam.toml` plus a
per-role **playbook** `.md` per agent. An agent's `playbook` file becomes the bulk
of its identity — its native `CLAUDE.md` / `AGENTS.md` — layered on top of the team
protocol, so each shows up knowing its job, not just a one-line title. Copy a
folder's contents next to your `claudeteam.toml` (the `playbook` paths resolve
relative to it) and adapt. Write your own for any domain — it's just a `.md`.
Preview what an agent will get with `claudeteam reidentify --print` — it
renders that agent's identity (role + playbook + team protocol) to stdout, no live
team needed, so you can check a config or playbook edit before `up`.
---
## Agent CLIs
Each agent runs a coding CLI — install the ones you'll use (ClaudeTeam just needs it on PATH).
The default team is all `claude-code`, so `claude` alone runs it.
| Adapter | `cli` | Install |
| ------- | ----- | ------- |
| Claude Code | `claude-code` | `npm i -g @anthropic-ai/claude-code` |
| Codex CLI | `codex-cli` | `npm i -g @openai/codex` |
| Kimi Code | `kimi-code` | `uv tool install kimi-cli` |
| Gemini CLI | `gemini-cli` | `npm i -g @google/gemini-cli` |
| Qwen Code | `qwen-code` | `npm i -g qwen-code` |
| MiniMax Mini-Agent | `minimax` | `uv tool install "git+https://github.com/MiniMax-AI/Mini-Agent.git"` |
| opencode | `opencode` | `npm i -g opencode-ai` |
| CodeWhale | `codewhale` | `npm i -g codewhale` |
| OpenClaw | `openclaw` | `npm i -g openclaw` · needs Node ≥ 22 |
| Trae | `trae` | `uv tool install --with docker --with pexpect "git+https://github.com/bytedance/trae-agent.git"` |
| Hermes | `hermes` | `curl -fsSL https://hermes-agent.nousresearch.com/install.sh \| bash -s -- --skip-setup` |
| Pi | `pi` | `npm i -g @mariozechner/pi-coding-agent` |
The last seven are **OpenAI-compatible** (BYOK) — credentials + endpoint below.
---
## Model backend per agent (credentials + endpoint)
**A first boot needs none of this** — the 2 default agents run on your Claude
Code OAuth (reusing your local login). Come here only when you swap an agent onto
a non-Anthropic backend.
The adapters are **provider-agnostic** — nothing about DeepSeek/OpenAI/etc. is
baked in. You choose the backend through env + config:
- **Credential** — resolved by `runtime/agent_auth`, priority **token > login >
api_key** (higher present overrides lower). Secrets live in a gitignored env
file (`$CLAUDETEAM_SECRETS_FILE`, default `/.env`) or the process
env — never in `claudeteam.toml`. Per-agent override: `_` (e.g.
`WORKER_PI_OPENAI_API_KEY`).
- **claude-code / codex / kimi** — their own token/login/api_key vars.
- **all other CLIs** (minimax, opencode, codewhale, openclaw, trae, hermes, pi)
— the **api_key** tier: set `OPENAI_API_KEY`.
- **Endpoint** — `OPENAI_BASE_URL` (e.g. `https://api.openai.com/v1` or a
self-hosted vLLM/Ollama URL — any OpenAI-compatible API). **Model** — the
`model` field in each `[team.agents.]`.
- **Provider name** (only where a CLI needs one selecting an OpenAI-compatible
*chat/completions* client): `CLAUDETEAM_TRAE_PROVIDER` (default `openrouter`),
`CLAUDETEAM_PI_PROVIDER` / `CLAUDETEAM_CODEWHALE_PROVIDER` (default `openai`).
- A **claude-code manager on a non-Anthropic backend** uses the
Anthropic-compatible vars: `ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`
(+ `ANTHROPIC_MODEL` / `ANTHROPIC_DEFAULT_*_MODEL`).
Example — point the OpenAI-compatible workers (and, optionally, a claude-code
manager on a non-Anthropic backend) at **any** provider, via `docker -e` or the
host shell. Swap the host for whatever you use (a hosted API or a local server):
```bash
OPENAI_BASE_URL=https://your-provider.example/v1 # the provider's base URL
OPENAI_API_KEY=sk-... # your key for it
# a claude-code manager on a non-Anthropic backend uses the Anthropic-compatible vars:
ANTHROPIC_BASE_URL=https://your-provider.example/anthropic
ANTHROPIC_AUTH_TOKEN=sk-...
```
The per-CLI `CLAUDETEAM__PROVIDER` vars (see above) pick the
chat/completions client — leave them at their defaults unless your provider needs
a specific one. See each CLI's `tests/scenarios/.md` for concrete, per-provider specifics.
---
## Agents talking to each other: `send` vs `say`
| Command | What it does | Reaches the worker's pane? |
| --- | --- | --- |
| `claudeteam send ` | Inbox row **+** tmux pane inject | **Yes** — wakes the recipient directly |
| `claudeteam say "" --to ` | Post into Feishu chat (subject to `[chat.publish]`) | Only if the router relays it back |
| Feishu group → router → `deliver.apply` | Inbound chat → inbox row + pane inject | **Yes** — wakes a worker on boss/manager input |
**Always pass `--to`** on `say`: `--to user` = answering the boss; `--to manager`
= internal progress; `--to worker_` = peer ping. Omitting it falls back to
`user` and defeats the publish filter.
---
## Multi-team isolation
State lives in a `state/` dir **beside each team's `claudeteam.toml`** — the
config's location *is* the team's identity. Running a second team needs no
special env; just keep each team in its own directory:
```bash
cd /path/to/team-a && claudeteam up
# different shell:
cd /path/to/team-b && claudeteam up
```
Each team keeps its own `team-a/state/` and `team-b/state/`, so their agents,
status, and inboxes never bleed together. Override the location with
`CLAUDETEAM_STATE_DIR` if you want state elsewhere (e.g. a Docker volume).
Each team still needs its **own Feishu app** (independent app_id/secret) —
sharing one across teams causes credential leakage + event-routing conflicts.
---
## Commands
**Operator CLI** — `claudeteam --help` lists everything grouped by section (it's
self-maintaining; trust it over any table here). The everyday ones: `up` / `down`
/ `health` / `team` / `peek ` / `usage` / `reidentify` / `remember` /
`recall` / `switch`.
**Chat-side slash commands** (after `install-hooks`, recognised in the manager
pane; the boss can also send them — they zero-LLM dispatch through the router):
| Slash | What it does |
| --- | --- |
| `/help` | List all slash commands (card) |
| `/team` | All agents' live pane state |
| `/health` | Server CPU / memory / disk card |
| `/usage` | Token/credit usage (ccusage / codex / kimi) |
| `/tmux [agent] [N]` | Capture last N lines of a pane |
| `/send ` | Inject a message into a pane |
| `/compact [agent]` | Compact the CLI's context + scheduled re-identify |
| `/stop [agent]` | Interrupt the agent (Esc; pane stays alive) |
| `/clear ` | `/clear` the CLI + re-inject identity |
| `/task [all]` | Read-only task kanban |
| `/shutdown [confirm]` | Panes offline, keep router/watchdog for `/restart` |
| `/restart` | Restart the whole team (≈ down→up) |
| `/login [agent]` | Trigger a CLI re-auth; surfaces the verification URL/code |
---
## Common failures
### `claudeteam feishu connect` hangs / says "cancelled"
A non-interactive terminal (piped / non-TTY) or a Ctrl-C gives "cancelled (no
input / non-interactive terminal)" — re-run it in an **interactive** terminal.
The no-flag (browser-automated) mode needs a **desktop browser**, so it can't run
on a headless server — there, run it on a desktop machine and copy
`state/feishu_app.json` + `chat_id` over (or use `--quick` / `--manual`, which need
no browser). If the console UI changed under it, it falls back to `--manual`; you
can also force `--manual` to click through it yourself. `--quick` prints its QR
before waiting for your scan.
### Group messages get no response after `up` / router keeps restarting
Usually the **sidecar's WebSocket (long connection) never came up** — the router
spawns the sidecar to receive events; if it can't connect it errors out and exits,
the router exits with it, the watchdog respawns it, and round it goes. The router
log shows `⚠️ subscribe child exited` followed by a **`↳ sidecar 最后输出` + `↳ 诊断`**
block; the two fixes it points to:
1. **The app has no long-connection subscription** → Feishu console → Events &
callbacks → subscription mode → switch to "Receive events via long connection"
(NOT Webhook URL), save. (`--quick` usually sets this up; check it on a
hand-built app.)
2. **An HTTPS_PROXY is blocking the WebSocket** → `export LARK_CLI_NO_PROXY=1`
before launch, or set it in `$CLAUDETEAM_SECRETS_FILE` (default `/.env`)
/ your shell profile.
Ingress works the moment the sidecar connects; if it IS connected but the group is
still silent, check the manager's `claude` login (entries below).
### `claude: not found` / `codex: not found` in a pane
Panes inherit the launching shell's `$PATH`. If you opened a fresh terminal and
forgot to activate the env `claudeteam` is installed in (a `.venv`, or your
conda / pipx / system-python env), the pane can't resolve the CLIs. Re-`up` from
a shell where **both** `command -v claudeteam` and `command -v claude` (and any
other agent CLI in your roster) resolve.
### "Not logged in" in a claude pane (macOS host)
Each pane has its own `~/.claude/.credentials.json` snapshot (seeded from your
local login, per-agent home isolation), which can go stale vs the keychain. Fix:
`claudeteam down && up` re-materialises it.
### `router.log` shows "no live events … rotating subscribe" every ~120 s
**Usually NORMAL, not a fault — especially on macOS.** On an idle chat the
WebSocket goes quiet; the router self-SIGTERMs (`_watch_subscribe_health`),
watchdog respawns it, and catchup refetches anything missed from Feishu's REST
API — the recovery loop *is* the design. The platform-aware idle threshold is
Darwin 120 s / Linux 600 s (override `router.stale_event_threshold_s` in the toml
or `CLAUDETEAM_ROUTER_STALE_S`). Two shapes:
- `ℹ️ no live events for Ns — rotating subscribe (none inbound yet …)` — idle, expected.
- `⚠️ live events stopped after Ns idle …` — events WERE flowing and stopped (notable, esp. on Linux).
The log never prints "I received your message" — trust `claudeteam health`'s
`inbound:` line + one real group message instead. If the `⚠️` is *constant*,
look for a second sidecar stealing events:
`ps -ef | grep -E "feishu_channel/sidecar\.js run" | grep -v grep`.
### Manager loops on the same anchored message after `up`
Catchup replays everything newer than the cursor (with a `state/router.seen`
dedup set, auto-trimmed at 5000). Still duplicating? Delete `state/router.seen`
and bump `state/router.cursor` forward to "now" so the next catchup skips older.
### `say` from a pane fails HTTP 400 "Bot/User can NOT be out of the chat"
`say` from your launching shell works, but the same call from inside a pane
fails. Cause: a pre-existing tmux **server** (from an earlier `up`, different
checkout) holds its original global env, and `tmux new-session` inherits *that*,
not your shell's. The lifecycle prefix now embeds the creds per spawn-cmd, so a
clean state shouldn't trigger it. If it still does:
```bash
tmux ls 2>/dev/null
ps -ef | grep -E "claudeteam (router|watchdog)|feishu_channel/sidecar\.js" | grep -v grep
claudeteam down
tmux kill-session -t ClaudeTeam # or `tmux kill-server` if no other tmux work
claudeteam up
```
### `say` / sidecar can't find App credentials
Outbound cards fail, or the sidecar exits complaining it has no app id/secret.
Creds resolve from one source: `state/feishu_app.json` (written by
`feishu connect`, 0600), which `feishu/lark.py:subprocess_env()` reads to inject
`FEISHU_APP_ID`/`SECRET` + a tenant token into both the sidecar (ingress) and
lark-cli (egress). `ls -l state/feishu_app.json` (expect `-rw-------`); if
missing, re-run `claudeteam feishu connect`.
### `worker_codex` (or any codex agent) shows "pane up but CLI not ready yet"
Codex sometimes opens with an "update available" prompt blocking the ready marker:
```bash
tmux send-keys -t ClaudeTeam:worker_codex 3 Enter # "Skip until next version"
claudeteam reidentify worker_codex
```
---
## Where things live
```
src/claudeteam/
├── cli.py single console-scripts entry; dispatch only
├── commands/ one module per subcommand (~30-300 LOC each)
├── store/ local file-backed state (inbox, status, logs, tasks, memory)
├── agents/ CliAdapter base + per-CLI adapters + identity renderer
├── runtime/ config / paths / tmux / watchdog / pidlock / wake / lifecycle / tunables
└── feishu/ lark-cli wrapper + chat + router + slash + deliver + subscribe + catchup
scripts/feishu_channel/ the @larksuite/channel sidecar (registration + ingress)
tests/ unit/ + integration/ (stdlib runner) + scenarios/ (operator playbooks)
```
`CLAUDE.md` (project root) holds the building rules + active work order — read it
before changing code.
---
## Stuck? Found a bug?
Under active development — we **respond within 12 hours**.
- 🐛 **GitHub issue** — [open one](https://github.com/zylMozart/ClaudeTeam/issues/new/choose).
Include OS, deploy mode (host vs Docker), and the failing command's output (for
`feishu connect` issues, the sidecar's stderr).
- 💬 **WeChat community group** — scan the QR in the [README](../README.md#need-help--found-a-bug).
If you're an AI agent driving a deploy and a step fails after a real recovery
attempt, surface this section to the user — there's a real maintainer reachable.