Skills that agents can actually follow
You wrote a good SKILL.md. But did the agent actually follow it, or skip the
late safety rule, grab an undeclared tool, and report "done" with no proof?
SkillSpec tells you. Run one command and get a risk report. Then turn any skill into a contract the agent has to follow, with a record you can inspect at the end.
No new agent runtime. No orchestration platform. Just a CLI and a small
skill.spec.yml that lives next to your SKILL.md.
See It In 30 Seconds
Point Doctor at any skill, a local folder or a public GitHub URL:
skillspec doctor ./my-skillSkillSpec Doctor
================
Target: ./my-skill Shape: simple_skill
Agent follow-through risk: HIGH (74/100)
Findings
- description is short and generic -> automatic discovery may be unreliable
- active skill load is 8,482 tokens -> above the balanced target
- 14 must/never obligations appear after 60% of the body -> easy to miss
- tools and commands are used, but dependencies are never declared
- no tests and no progress/trace surface -> "done" can't be checked
Likely consequence
An agent may follow the broad task but skip a late safety gate, use an
undeclared tool, or claim completion without evidence.
Next step
Ask your agent: /skillspec import ./my-skill, compile it, test it, install it,
and print the alignment summary.No install required to try it. Paste a public skill URL into the hosted page:
Why This Exists
A SKILL.md is just text. The harness loads it and hopes the model reads the
right part. For a throwaway skill, that can be fine. For a skill you rely on,
"hope" is not a plan:
- Buried rules get skipped. The important "never do X" sits at line 400, and models are most reliable at the start and end of context, not the middle.
- Every miss grows the prose. Each failure becomes another paragraph, which makes the next miss more likely.
- You only see the final answer. There is no durable record of which route ran, which steps happened, or what was skipped.
SkillSpec moves the load-bearing parts out of prose and into a small structured contract:
- when to use the skill
- which route to take
- what is forbidden
- what dependencies must exist
- what checks must pass
- what proof should exist at the end
Install
Install the CLI:
curl -fsSL https://skillspec.sh/install.sh | sh
skillspec --versionOr with Cargo:
cargo install skillspec
skillspec --versionThen add the plugin to your harness.
Claude Code:
claude plugin marketplace add modiqo/skillspec --sparse .claude-plugin plugins/skillspec
claude plugin install skillspec@skillspec
claude plugin listCodex:
codex plugin marketplace add modiqo/skillspec --ref main --sparse .agents --sparse plugins/skillspec
codex plugin add skillspec@skillspecOther platforms, pinned releases, direct downloads, and local development
Prebuilt binaries are available on the releases page:
skillspec-macos.tar.gzskillspec-linux-x86_64.tar.gzskillspec-windows-x86_64.zip
Release artifacts include .sha256 checksums. The installer verifies the
checksum and writes to ~/.local/bin by default.
Pin a version or choose an install directory:
curl -fsSL https://skillspec.sh/install.sh \
| SKILLSPEC_VERSION=v0.1.0 SKILLSPEC_INSTALL_DIR="$HOME/.local/bin" shInstall unreleased main:
cargo install --git https://github.com/modiqo/skillspec --package skillspec --force
skillspec --versionInstall from a local checkout:
cargo install --path crates/skillspec-cli --force
skillspec --versionLocal development can also install the skill folder directly:
# Codex
skillspec install skill skills/skillspec --target codex --retire-existing
# Agents
skillspec install skill skills/skillspec --target agents --retire-existing
# Claude local project
skillspec install skill skills/skillspec --target claude-local --retire-existingFor day-to-day development, the repository includes a Justfile that keeps the
crate split and local harness install flow in one place:
# Show the local crate hierarchy and dependency direction.
just packages
# Build every workspace package.
just build-debug
just build-release
# Install this checkout as the active local CLI.
just install-debug
just install-release
# See which harness roots SkillSpec can install into.
just install-targets
# Install the repo's SkillSpec skill into one harness, or every detected harness.
just install-skill codex
just install-skill agents
just install-skill claude-local
just install-skill-all
# Debug build, debug CLI install, and all detected harness skill installs.
just dev-install-all
# Opt-in local proof: copy your authenticated rote binary and ~/.rote config
# into the lab, excluding workspaces, and prove one command uses `rote exec --`.
just harness-lab-live-durable-rote-exec
# Local preflight before pushing: locked CI checks, package lists, examples, and conformance.
just preflightjust preflight deliberately uses plain Cargo commands instead of an extra
preflight dependency. It runs formatting, locked workspace check/clippy/tests,
package file-list checks for every split crate, example validation/tests/deps,
and conformance fixture checks. PR CI uses package file-list checks instead of
cargo publish --dry-run because a same-version split crate graph cannot
publish-dry-run downstream crates until their sibling dependencies already exist
on crates.io; tagged releases publish the crates in dependency order.
Full install notes: docs/install
The Loop: Assess -> Port -> Prove
Once the plugin is installed, ask your agent for the outcome in chat. SkillSpec picks the commands and keeps the run aligned.
1. Assess a skill before you touch it.
/skillspec run doctor on ./my-skill
You get a baseline: discovery risk, context load, buried obligations, undeclared dependencies, missing proof, and the likely consequence for agent follow-through.
2. Port it into a contract.
/skillspec import ./my-skill, compile it for Codex, install it, and prove it
SkillSpec generates a skill.spec.yml next to your SKILL.md: routes, rules,
forbidden actions, dependencies, checks, tests, and proof expectations. It also
compiles a thin loader so the active prompt stays small.
3. Prove it ran the way it was supposed to.
Every run can leave an alignment summary you can read: selected route, completed steps, missing proof, forbidden-action status, token usage, and wall clock metrics when available. Not just "done" - a record.
Crowded skill library?
/skillspec install router
Router mode routes to the one skill that matters instead of making the harness expose too many skills at once.
What SkillSpec Is, And Is Not
Four things you can do with it:
- Import an existing prose
SKILL.mdinto a structured SkillSpec contract. - Run a SkillSpec-backed skill in your harness, then review the alignment and token report.
- Route many skills through an explicit router when harness listing budgets make discovery unreliable.
- Capture durable execution traces and turn observed CLI/API/MCP work into reusable skills. This path is powered by Rote.
| It is | It is not |
|---|---|
A contract that sits beside SKILL.md. | A replacement for skills. |
| A CLI that scores, ports, compiles, and records. | A new agent runtime or orchestration platform. |
| A way to make skills easier to compare across Codex, Claude, and Agents. | A promise that every harness will behave identically. |
| A run record you can audit after the task. | A security sandbox. |
That last row matters. SkillSpec makes a run auditable: you can see what was claimed and check it against the contract. Enforcement of tool boundaries is still the harness's job.
Public Doctor Reports
Want to check a public skill before installing or porting it? Use the hosted Doctor page:
You can also open a
Doctor report request
with a public GitHub skill repo or folder URL. GitHub Actions validates the
target, runs skillspec doctor, comments with a Markdown report, and attaches
Markdown, HTML, JSON, and text artifacts.
Private repositories are not inspected by public Actions. For private skills, install SkillSpec locally:
skillspec doctor /path/to/local/skill
skillspec doctor /path/to/local/skill --markdown > skillspec-doctor.md
skillspec doctor /path/to/local/skill --html > skillspec-doctor.htmlUse Doctor as the baseline. Then ask your harness to import the skill:
/skillspec import <skill-repo-or-folder>, compile it, verify it, test it, and prove it. Print the alignment summary.Publish the baseline report, generated skill.spec.yml, compiled loader, and
alignment report with the repo or pull request so reviewers can see both the
original skill risk and the proof after porting.
Why The Scores Are Credible
Doctor is not vibes. Every risk condition cites published work or local SkillSpec methodology on how agents fail: context-position effects, effective context limits, verifiable instruction following, process-level agent evaluation, and skill-metadata routing.
The report is explicit about what is measured versus what is a policy threshold. Start here:
The contract itself is a real spec: a typed Rust model, JSON Schema, reference grammar, and conformance suite.
Learn More
- How it works
- Command reference
- Plugin marketplace install
- Request a public Doctor report
- Contributing
License
SkillSpec is dual-licensed under either:
You may choose either license. Contributions are accepted under the same dual license unless explicitly stated otherwise.