R1 the agent that proves its work · open source · self-hostable

Agents that ship. Then prove they shipped.

R1 is an open-source agent framework that plans, runs, checks, and records software work before it merges. It can conduct Claude Code and Codex as sub-agents, but the harness keeps the proof trail and the gate checks.

For solo builders, platform teams, and regulated industries that need proof, not just output.

Ship end to end from plan to commit. Catch regressions before merge. Replay any run with the paper trail intact. Use your models without buying another runtime. Self-host cleanly when data must stay put. Review less because the harness flags risk first.
Demos 11 hands-on demos you can run right now.
Open source Apache 2.0 core you can audit, fork, and self-host.
Model families R1 default, Claude Code, Codex, and custom model wiring.
Receipts Every action signed, replayable, and ready for review.
what R1 is·02

Three things you actually get.

Not a chat window. Not a black box. Three concrete capabilities shipped in one open-source runtime.

Agents that finish what they start
R1 plans, executes, verifies, and commits, and it does not merge a change that failed its own checks.
full lifecycle · Read the spec →
Roll back any step. Replay any run.
Every action is signed and recorded, so you can replay a run, inspect the proof chain, or rewind a step without guessing.
receipt-first · How it works →
Bring your model. Bring your infrastructure.
Run on your laptop, on Heroa, or on your own infrastructure. Use the model accounts your team already trusts.
self-host ready · Where it runs →
who picks R1·03

Three kinds of teams pick R1 for three different reasons.

You are the whole engineering team. You want an agent that can pick up a feature, ship it end to end, and not require a 30-minute review of every line. R1 gives you the harness so you only have to review what the harness flagged.

laptop-friendly · homebrew install · one binary · no extra infrastructure
For solo builders →
how R1 works·05

PLAN. EXECUTE. VERIFY. COMMIT.

Four steps, one harness. Skip any of them and the work does not merge.

Plan
R1 reads the task, surveys the codebase, and writes an explicit plan you can approve or amend before execution starts.
human-readable · reversible · approval first
Execute
R1 runs the plan, chooses the right agent for each step, and records every tool call, file change, and retry path.
parallel work · full receipt trail · your models
Verify
Tests run, reviews run, and gate checks run. If anything fails, R1 surfaces it instead of merging it quietly.
cross-family review · fails loud · policy-gated
Commit
Only after the gates pass. The merge is serialized, the receipt is written, and the full run stays replayable from that change.
atomic merge · signed receipt · rollbackable
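The four steps above can be sketched as a single function that refuses to merge when any gate fails. This is a minimal illustration, not R1's implementation: `Run`, `run_task`, and the callback parameters are hypothetical names, and the receipts here are plain strings rather than signed records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Run:
    task: str
    plan: List[str] = field(default_factory=list)
    receipts: List[str] = field(default_factory=list)
    merged: bool = False


def run_task(
    task: str,
    planner: Callable[[str], List[str]],
    approve: Callable[[List[str]], bool],
    executor: Callable[[str], str],
    gates: Dict[str, Callable[[Run], bool]],
) -> Run:
    """Plan, execute, verify, commit. Hypothetical sketch of the lifecycle."""
    run = Run(task=task)

    # Plan: write an explicit plan and wait for approval before executing.
    run.plan = planner(task)
    run.receipts.append(f"plan:{len(run.plan)} steps")
    if not approve(run.plan):
        run.receipts.append("plan-rejected")
        return run

    # Execute: record the result of every step.
    for step in run.plan:
        run.receipts.append(f"exec:{executor(step)}")

    # Verify: run every gate and surface failures instead of merging.
    failed = [name for name, check in gates.items() if not check(run)]
    if failed:
        run.receipts.append("gate-failed:" + ",".join(failed))
        return run

    # Commit: reached only when every gate passed.
    run.merged = True
    run.receipts.append("commit:ok")
    return run
```

The only path to `merged = True` runs through every gate, so skipping a step means the function returns early with a receipt explaining why.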
a Tuesday with R1·06

Here is what shipping with R1 actually looks like.

Not a glossy promise. The cadence of one engineer’s day with an agent that does not ship work without a paper trail.

06:42
🌙
Overnight run · scheduled
Pulled the failing CI builds. Drafted three fix plans. Marked one as needs human.
08:11
You · reviewing
Glanced at the three plans on the way in. Approved two and asked R1 to dig deeper on the auth one.
08:14
⚙️
Plan A · executing
Refactored the rate-limit middleware. Added four tests. Verification started.
08:21
🔁
Plan B · executing
Migrated the user preference column and wrote the rollback script before touching main.
08:34
Plan A · gate failed
Cross-family review flagged a regression in refresh-token handling. R1 rolled back and logged the failure mode.
09:02
Plan B · committed
All gates passed. PR #2417 opened with the full receipt chain attached.
10:30
👀
You · reviewing
Skimmed the receipt, approved, and merged without redoing the whole task by hand.
11:15
🔍
Plan A v2 · investigating
Re-ran with the regression in mind and found a header rewrite masking the refresh token.
13:48
Plan A v2 · committed
All gates passed. The second PR landed cleanly.
16:30
📋
Daily summary · drafted
Eleven actions today. Nine shipped. Two escalated. Four hours of review time handed back.
19:11
🌙
Tomorrow’s queue · prepared
Three issues triaged. One plan drafted. Two marked needs human.
§ demo 01 / 11
what's different·live

The difference between model-driven and harness-driven.

On the left, the model decides when it is done. On the right, R1 checks. Watch the same task ship two different outcomes.

task Add rate-limit middleware to the /auth endpoint. 100 req/min per IP. Ship it.
Model-driven agent
the model decides when it's done
incident · INC-4821 · severity: high
service: auth-gateway
users affected: 3
detected by: production traffic
action: rollback + human triage
the agent said "done". the users found the bug.
R1 harness-driven
the harness decides when it's done
STOKE attestation · sealed sha256:a1f7·c902·8b4e
task: auth/ratelimit.go
phase: committed
attempts: 2 / 3
gates: tests ✓ · race ✓ · lint ✓ · review ✓
reviewer: claude-opus (cross-family)
evidence: 14 test receipts · 1 race trace
receipt sealed · verifiable · replayable
the asymmetry: model-driven agents are confident and frequently wrong. Harnessed agents are slower in seconds and faster in days because nothing has to be unshipped.
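One way to picture a sealed, verifiable receipt trail is a hash chain in which each receipt commits to the one before it. The sketch below is an assumption about the general technique, not the STOKE format: `seal` and `verify_chain` are hypothetical names, and an HMAC with a shared key stands in for a real public-key signature.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def _digest(prev_hash: str, action: str) -> str:
    # Canonical JSON so the same receipt always hashes to the same value.
    body = json.dumps({"prev": prev_hash, "action": action}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def seal(prev_hash: str, action: str, key: bytes) -> Dict[str, str]:
    """Create a receipt linked to the previous receipt's hash."""
    digest = _digest(prev_hash, action)
    # HMAC is a stand-in for a signature in this sketch.
    sig = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return {"prev": prev_hash, "action": action, "hash": digest, "sig": sig}


def verify_chain(chain: List[Dict[str, str]], key: bytes) -> bool:
    """Recompute every hash and tag; any edit or reordering fails."""
    prev = ""
    for receipt in chain:
        digest = _digest(receipt["prev"], receipt["action"])
        expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
        if receipt["prev"] != prev or receipt["hash"] != digest:
            return False
        if not hmac.compare_digest(receipt["sig"], expected):
            return False
        prev = receipt["hash"]
    return True
```

Because each receipt embeds the previous hash, changing one recorded action invalidates that receipt and breaks the link to everything after it.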
§ demo 02 / 11
The substrate·02 / 11

Every thought. Every memory. Every skill use. Tracked.

R1's substrate is a content-addressed graph. This is one session rendered as a 3D grid. Phases on one axis, memory tiers on another, event types on the third. Scrub through the session. Click a node. See exactly what the agent was thinking, what it read, what skills it invoked, what tools it called.

time · tier · type · click a node for detail · drag to rotate
Click any node to inspect.
Each node is one event in the STOKE ledger.
memory tier
L0 Identity · always loaded
L1 Critical Facts · always loaded
L2 Topical Recall · on demand
L3 Deep Semantic · on demand
One session. "Implement RFC 5322 email parser with tests, merge atomically." Every thought, every skill injection, every memory touch, every tool call, every ledger write is a node. The STOKE protocol tracks the whole graph.
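A content-addressed graph can be sketched in a few lines: each node's ID is the hash of its own content plus the IDs of its parents. The `Ledger` class, its field names, and the tier labels as strings are illustrative assumptions, not the STOKE ledger schema.

```python
import hashlib
import json
from typing import Dict, Iterable, Set


class Ledger:
    """Hypothetical in-memory content-addressed event graph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}

    def put(self, event_type: str, tier: str, payload: str,
            parents: Iterable[str] = ()) -> str:
        node = {
            "type": event_type,
            "tier": tier,
            "payload": payload,
            "parents": sorted(parents),
        }
        # The ID is derived from the content, so identical events
        # always map to the same node.
        encoded = json.dumps(node, sort_keys=True).encode()
        node_id = hashlib.sha256(encoded).hexdigest()
        self.nodes[node_id] = node
        return node_id

    def ancestors(self, node_id: str) -> Set[str]:
        """Walk parent links to find everything that led to a node."""
        seen: Set[str] = set()
        stack = list(self.nodes[node_id]["parents"])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.nodes[current]["parents"])
        return seen
```

Clicking a node in the demo corresponds roughly to looking up one ID and walking its ancestors to see what the agent read and decided before that event.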
§ demo 03 / 11
How R1 decides·03 / 11

The state machine refuses to skip.

Seven gates from intake to commit: plan, execute, verify, review, remember, skill, commit. Click a state to see what R1 requires, what's traced, and what's written to memory. Try to find a way to skip. You can't.

CanCommit is a single function that checks gates, attempts, failures, memory-write, and skill-distill together. All must pass. Every pass leaves R1 smarter for the next task.
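A check of this shape might look like the sketch below. The field names and the `can_commit` signature are assumptions made for illustration; only the five conditions come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RunState:
    gates: Dict[str, bool]   # gate name -> passed
    attempts: int
    max_attempts: int
    open_failures: int
    memory_written: bool
    skill_distilled: bool


def can_commit(state: RunState) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons). Every condition must hold at once."""
    reasons = [f"gate:{name}" for name, ok in sorted(state.gates.items()) if not ok]
    if state.attempts > state.max_attempts:
        reasons.append("attempts-exceeded")
    if state.open_failures > 0:
        reasons.append("open-failures")
    if not state.memory_written:
        reasons.append("memory-not-written")
    if not state.skill_distilled:
        reasons.append("skill-not-distilled")
    return (not reasons, reasons)
```

Returning the full list of reasons, rather than stopping at the first, lets an operator see every blocked condition in one pass.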
§ demo 04 / 11
works with what you already use·live

Your subscriptions, your agents. R1 conducts.

Got Claude Code. Got Codex. Got both. R1 uses each as a sub-agent, picks the right one for the step, and cross-checks the result before anything merges.

task Implement a function that parses RFC 5322 email addresses.
The pattern: R1 is not trying to replace Claude Code or Codex. Models propose. R1 decides.
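The conductor pattern can be sketched as a loop in which one agent proposes and an agent from a different model family reviews. Everything here is hypothetical: `pick_reviewer`, `conduct`, the agent names, and the `propose` and `review` callbacks stand in for whatever R1 actually calls.

```python
from typing import Callable, Dict, Optional, Tuple


def pick_reviewer(implementer: str, agents: Dict[str, str]) -> str:
    """agents maps agent name -> model family. Pick a different family."""
    family = agents[implementer]
    for name, other_family in sorted(agents.items()):
        if other_family != family:
            return name
    raise LookupError("no agent from a different model family is available")


def conduct(
    task: str,
    implementer: str,
    agents: Dict[str, str],
    propose: Callable[[str, str, Optional[str]], str],
    review: Callable[[str, str], Tuple[bool, Optional[str]]],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Models propose, the harness decides. Returns None if never approved."""
    reviewer = pick_reviewer(implementer, agents)
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        patch = propose(implementer, task, feedback)
        approved, feedback = review(reviewer, patch)
        if approved:
            return {"patch": patch, "reviewer": reviewer, "attempts": attempt}
    return None
```

The retry path is the `feedback` value: a rejected review is fed back into the next proposal until the attempt budget runs out.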
§ demo 05 / 11
Long builds·05 / 11

Run eight tasks at once. Merge safely.

R1's scheduler picks which tasks run in parallel and keeps tasks that touch the same files from running together. One mutex serializes all merges to main, so parallel work cannot corrupt the branch.

Illustrative build. Real task profiles vary.
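The two ideas in this demo, file-scope conflict avoidance and a single merge mutex, can be sketched as follows. `pick_parallel` uses a simple greedy selection and `MainBranch` is an in-memory stand-in for a repository; both are assumptions, not R1's scheduler.

```python
import threading
from typing import Iterable, List, Set, Tuple


def pick_parallel(tasks: Iterable[Tuple[str, Set[str]]], limit: int = 8) -> List[str]:
    """Greedily pick up to `limit` tasks whose file scopes do not overlap."""
    claimed: Set[str] = set()
    batch: List[str] = []
    for name, files in tasks:
        if len(batch) >= limit:
            break
        if claimed.isdisjoint(files):
            batch.append(name)
            claimed |= files
    return batch


class MainBranch:
    """Hypothetical branch where one lock serializes every merge."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.commits: List[str] = []

    def merge(self, name: str) -> None:
        # Only one worker at a time can be inside this block.
        with self._lock:
            self.commits.append(name)
```

Tasks left out of a batch because of a file overlap are not rejected; under this sketch they simply wait for a later batch.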
§ demo 06 / 11
Controls·06 / 11

Set your bar. R1 holds it.

Drag the thresholds. See what passes. Your team's bar is configurable; R1 enforces whatever you configure, exactly.

Preset
Composite score
0.98
Presets are starting points. Every weight and every threshold is configurable per repo via .stoke/gates.yaml.
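As a rough illustration of how weights and a threshold could combine into a composite score, here is a weighted average checked against a bar. The gate names, weights, and formula are assumptions for the sketch; the actual schema of .stoke/gates.yaml is not shown on this page.

```python
from typing import Dict


def composite_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-gate scores, each expected in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


def passes(scores: Dict[str, float], weights: Dict[str, float], threshold: float) -> bool:
    """The bar is enforced exactly: at or above passes, below fails."""
    return composite_score(scores, weights) >= threshold
```

Raising a gate's weight makes a weak score on that gate drag the composite down faster, which is the effect the sliders in the demo are showing.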
§ demo 07 / 11
Daemon loop·07 / 11

Make R1 the harness and the loop.

Daemon Mode turns R1 into a long-running process with a persistent task queue, append-only ndjson WAL, runtime-resizable worker pool, and HTTP control plane. The crash boundary moves from shell glue into a resumable loop with state on disk.

loop live · 10 workers · queue restored
daemon.wal INTENT · DONE · BLOCKED
queue.json 05 queued
worker pool 03 running · 02 done
recent completions evidence written
POST /enqueue · GET /status · POST /workers · POST /hooks/install · GET /wal · pause / resume without restart
Deeper page: /daemon-mode/
the structural property: the loop survives crashes because queue state and evidence live on disk. operators change admission, concurrency, and hooks without restarting the process.
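Crash recovery from an append-only ndjson log can be sketched like this. The record fields and the recovery rule (a task whose last record is INTENT is still pending) are assumptions chosen for the example; only the INTENT, DONE, and BLOCKED record kinds come from the demo.

```python
import io
import json
from typing import Iterable, List, TextIO


def append_record(wal: TextIO, kind: str, task_id: str) -> None:
    """Append one JSON object per line; existing lines are never rewritten."""
    if kind not in ("INTENT", "DONE", "BLOCKED"):
        raise ValueError(f"unknown record kind: {kind}")
    wal.write(json.dumps({"kind": kind, "task": task_id}) + "\n")


def recover(lines: Iterable[str]) -> List[str]:
    """Replay the log and return tasks that were started but not finished."""
    last_kind = {}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A crash can leave a torn final line; skip it.
            continue
        last_kind[record["task"]] = record["kind"]
    return [task for task, kind in last_kind.items() if kind == "INTENT"]
```

A restarted daemon would call something like `recover` on the log and put the returned tasks back on the queue, which is what "queue restored" in the demo suggests.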
§ demo 08 / 11
Audit boundary·08 / 11

Refuse weak claims at the boundary.

Truth Engine adds five anti-deception guards for the places humans miss: unsupported completion claims, path-marker corruption, destructive post-merge tree drops, shallow audit posture, and suspiciously low delivery ratios without a written explanation.

Deeper page: /truth-engine/
the structural property: completion is refused when proof is weak. the operator sees a narrow, deterministic reason for rejection instead of a vague “please verify” warning.
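Two of the five guards can be sketched as a function that returns one narrow reason string or nothing. The claim fields, the reason strings, and the 0.5 ratio threshold are all hypothetical; the page does not specify how Truth Engine encodes any of them.

```python
from typing import Optional

# Hypothetical threshold: below this delivered/planned ratio,
# a written explanation is required.
MIN_DELIVERY_RATIO = 0.5


def refuse_reason(claim: dict) -> Optional[str]:
    """Return a deterministic refusal reason, or None if the claim stands."""
    # Guard: a "done" claim with no attached evidence is unsupported.
    if claim.get("status") == "done" and not claim.get("evidence"):
        return "unsupported-completion-claim"

    # Guard: a low delivery ratio needs a written explanation.
    planned = claim.get("planned", 0)
    delivered = claim.get("delivered", 0)
    if planned > 0:
        ratio = delivered / planned
        if ratio < MIN_DELIVERY_RATIO and not claim.get("explanation"):
            return "low-delivery-ratio-unexplained"

    return None
```

The same input always produces the same reason, which is the difference between a deterministic rejection and a vague warning.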
§ demo 09 / 11
Skills·09 / 11

Turn intent into a verified skill. In eight stages.

Skills are code, not prompts. R1's wizard walks an operator through Intent, Inputs, Outputs, Capabilities, Side effects, Failure, Tests, and Determinism, then writes a typed, content-addressed IR that replays deterministically and gets smarter with every authored skill.

Deeper page: /skill-wizard/
the structural property: skills built through the wizard are deterministic by construction. analyzer rejection at any stage is a constitution-bound failure, not a warning.
§ demo 10 / 11
Remote control·10 / 11

Control any agent. From any device. Verifiably yours.

Beacon Protocol is end-to-end encrypted remote control with cryptographic identity. Beacons are claimed through SAS-verified pairing. Every action traces back to a specific operator, device, token, and permission row. The Hub relays bytes. It cannot decrypt the session.

Beacon terminal
bc-3a2b8d10-eric-laptop · /claimme
Hub · routing fabric · cannot decrypt
bytes relayed · 0000
Operator device
Out-of-band channel: The QR and spoken phrase travel visually or verbally, not across the network.
Visual fingerprint check: Beacon and phone show the same 16-byte fingerprint. A malicious relay cannot fake a human-verified match.
SAS from shared secret: Both sides derive the same code only if the X25519 exchange is honest.
Per-device cert: Operator identity signs per-device certificates, so a lost phone can be revoked cleanly.
capability token constitution_hash: sha256:1a2b3c...
Hover a token field to inspect what the constraint is buying you.
Deeper page: /beacon-protocol/
the structural property: every action is signed by a specific operator's specific device's specific token's specific permission. revocation propagates in seconds. self-hosted parity preserves all of this.
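The SAS step can be illustrated with a short derivation. This sketch assumes the X25519 exchange has already produced a shared secret, and it invents the rest: the `derive_sas` name, the `sas-v0` label, and the 6-digit format are not taken from the Beacon Protocol.

```python
import hashlib
import hmac


def derive_sas(shared_secret: bytes, pub_a: bytes, pub_b: bytes, digits: int = 6) -> str:
    """Derive a short code that both sides can compare out of band."""
    # Sort the public keys so both sides build the same transcript
    # regardless of which one they consider "local".
    transcript = b"".join(sorted([pub_a, pub_b]))
    mac = hmac.new(shared_secret, b"sas-v0" + transcript, hashlib.sha256).digest()
    # Truncate the MAC to a short decimal code a human can read aloud.
    number = int.from_bytes(mac[:4], "big") % (10 ** digits)
    return f"{number:0{digits}d}"
```

If a relay substitutes its own keys, the two sides end up with different secrets or different transcripts, so their codes would match only by chance.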
§ demo 11 / 11
when something looks off·live

Your agent checks with you when the risk changes.

Named situations pause the run and ask for consent, from destructive file moves to configuration changes your team owns. You decide. The agent does not continue without a recorded answer.

🔴 R1 Beacon · CRITICAL signed advisory
Critical security patch for r1 v2.3.4
Your beacon is still advertising a vulnerable client version. Upgrade now or keep risk on the books explicitly.
signed by hub.r1.run · key sha256:ab12...
Deeper page: /security-layer/
the enum: closed advisory types, signed nudges, a dismiss path, and consent recorded before the run continues.
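A closed set of advisory types with recorded consent might be modeled as below. The enum members, the `ConsentLog` class, and the "accept" and "dismiss" decision strings are assumptions for illustration; the page names only the general situations.

```python
from enum import Enum
from typing import List, Tuple


class Advisory(Enum):
    """Closed set: a value outside this list cannot be constructed."""
    SECURITY_PATCH = "security_patch"
    DESTRUCTIVE_FILE_MOVE = "destructive_file_move"
    OWNED_CONFIG_CHANGE = "owned_config_change"


class ConsentLog:
    """Hypothetical record of operator answers to advisories."""

    def __init__(self) -> None:
        self.records: List[Tuple[Advisory, str, str]] = []

    def record(self, advisory: Advisory, operator: str, decision: str) -> None:
        if decision not in ("accept", "dismiss"):
            raise ValueError(f"unknown decision: {decision}")
        self.records.append((advisory, operator, decision))

    def may_continue(self, advisory: Advisory) -> bool:
        # The run resumes only once some answer is on record,
        # whether the operator accepted or explicitly dismissed.
        return any(recorded is advisory for recorded, _, _ in self.records)
```

Dismissing still counts as an answer here, which mirrors the "keep risk on the books explicitly" option in the advisory above.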
vs the alternatives·08

How R1 compares to running an agent without a harness.

Capability | R1 | Bare Claude Code | Bare Codex | Hand-built scripts
Plans before executing | Yes · explicit plan | Sometimes · implicit | Sometimes · implicit | No
Verifies its own work | Yes · cross-family review | Some self-review | Some self-review | No
Replayable runs | Yes · receipt trail | No | No | Only if you build it
Cross-model fallback | Yes · conductor pattern | No | No | Only if you build it
Self-host path | Yes · Apache 2.0 core | No | No | Yes
Audit-grade receipts | Yes · signed proof chain | No | No | Only if you build it
Cost model | Free self-host · hosted optional | Subscription | Subscription | Engineer time
Net effect | Ships with proof | Ships with trust | Ships with trust | You rebuild the harness yourself
Integrations

R1 is the runtime. Adjacent systems compose around it.

R1 stands alone as a native agent framework. It also plugs into eight specialized systems that govern traffic, remote control, ground facts, verification, payments, and long runs. Tap any satellite to see how it hooks in.

R1
harness
CloudSwarm
agent GUI + templates
RelayGate
programmable middleware
RelayOne
gateway · firewall
TrueCom
agent commerce
Beacon Hub
remote control · RBAC · audit
Veritize
verify + drift
DeepTap
search + fact memory
Heroa
execution substrate
R1 + STOKE are Apache-2.0 at github.com/RelayOne/r1-agent. Beacon Protocol is a sibling control-plane protocol. Heroa, CloudSwarm, Beacon Hub, DeepTap, Veritize, RelayOne, RelayGate, and TrueCom's owned rails are commercial. Open the protocols. Keep the rails.
Try R1 in the cloud · zero-install

Launch R1 on CloudSwarm now.

Spin up a durable R1 agent in your browser. Your subscriptions, your repo, their sandbox. Multi-day builds that survive laptop closes.

cloudswarm.app · signed sessions, vault-injected creds, audit trail by default.
pricing·10

Free for self-host. Hosted is optional. Enterprise is there when you need it.

You can run R1 yourself at no license cost, pay only when you want a managed runtime, and send procurement-heavy deployments to the enterprise path. See pricing.

Most direct
Self-host
Free

Apache 2.0 core. Bring your own compute and model accounts. Full harness, full receipts, no license meter.

See self-host →
Procurement path
Enterprise
Custom

SSO, export controls, long retention, and sovereign deployment options for larger teams and regulated buyers.

See enterprise →
final step·11

Stop watching your agents. Watch what they ship.

Start free, run the demos, or go straight to the docs if you already know the shape of the work you want R1 to take over.

Apache 2.0: Open source core you can audit and self-host.
Self-host: Run locally or move to a managed runtime later.
Signed receipts: Every action leaves a replayable proof trail.
Active release train: Current preview runtime with ongoing docs and demo updates.