February 21, 2026 · 7 min read
10 Days of Agentic AI: What One Person Built With an AI Co-Pilot
I experienced agentic velocity from the inside and realized the only thing protecting me was discipline — not enforcement. A day-by-day reconstruction.
By Chris Zimmerman, Founder at AmplefAI
One person. One AI co-pilot. A fleet of local inference machines. And a thesis: that autonomous AI systems need governance infrastructure that doesn't exist yet.
What follows is a day-by-day reconstruction of what that looked like. Not polished. Not curated after the fact. Pulled from the daily notes my AI co-pilot logged as we worked.
This is both proof of what agentic AI enables — and evidence of why it needs to be governed.
- Wrote 3 canonical doctrine documents (architecture directive, cognitive balancing thesis, persistent context kernel spec)
- Defined 16 non-negotiable kernel invariants
- Planned 14 implementation tickets
- Locked 5 architectural decisions (staleness model, tenant isolation, ownership, salience, replay)
- Published a blog post to production (amplefai.com/blog/cognitive-balancing)
- Completed full CX audit of the live site (graded B+ content, C- conversion, D trust signals)
- Fixed site: footer, nav, author attribution, hero headline
- Shipped a public trust artifact: /docs/pck (sanitized from 45 passing tests)
One person. One day.
- Seeded realistic demo data: 21 context entries, 5 snapshots, 33 snapshot entries, 5 dependency edges across 3 simulated agents
- Shipped /docs/pck trust artifact page to production (sanitized architecture walkthrough, 18 invariants, failure semantics)
- Fixed a pre-existing crash recovery bug (multi-version mount, PK violation)
- Added 3 stress tests (55-entry replay, crash recovery, 12-tenant isolation fuzz)
- JSDoc'd all public kernel methods
- Closed out PCK v0-01 formally: 42/42 tests, all patches verified, all docs synced
- No logic changes. No new features. Pure stabilization.
- Built Golden Flow CLI: single command, full governed execution loop with proof points
- Ran stress tests: 100/100 sequential (zero drift), 20/20 parallel (no contention), kill-after-commit (orphans detected)
- First governed multi-agent handoff: dispatched a task from my machine to a local inference server via SSH file protocol. Task completed, result retrieved, contract validated.
- Analyzed a competitor (Solita/FunctionAI): positioned as "they govern the prompt, we govern the action"
- Updated fleet security across nodes
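A governed handoff stands or falls on the contract check at the end of it: the dispatcher accepts nothing it didn't ask for, in no shape it didn't expect. Here is a minimal TypeScript sketch of that idea; the envelope fields and validation rules are illustrative assumptions, not the actual SSH file protocol:

```typescript
// Illustrative task/result shapes; the real protocol's fields are not public.
interface TaskEnvelope {
  taskId: string;
  action: string;
  payload: string;
}

interface TaskResult {
  taskId: string;
  status: "completed" | "failed";
  output: string;
}

// Fail-closed contract check: a result is accepted only if every
// required condition holds; anything unexpected rejects the result.
function validateResult(task: TaskEnvelope, result: unknown): TaskResult {
  if (typeof result !== "object" || result === null) {
    throw new Error("reject: result is not an object");
  }
  const r = result as Record<string, unknown>;
  if (r.taskId !== task.taskId) {
    throw new Error("reject: result does not match dispatched task");
  }
  if (r.status !== "completed" && r.status !== "failed") {
    throw new Error("reject: unknown status");
  }
  if (typeof r.output !== "string") {
    throw new Error("reject: missing output");
  }
  return r as unknown as TaskResult;
}

const task: TaskEnvelope = { taskId: "t-001", action: "summarize", payload: "..." };
const accepted = validateResult(task, {
  taskId: "t-001",
  status: "completed",
  output: "done",
});
```

The point is the default: a result that fails any check never enters the system, which is what makes the handoff governed rather than merely automated.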
- Wrote "What the Kernel Guarantees" — an 18-invariant trust artifact translated into enterprise language. Coach G (external strategic co-pilot) reviewed: "legitimately investor-grade."
- Wrote investor one-pager. Coach G: "category memo disguised as investor doc."
- Built Coach G context packet (8 strategic questions answered, including kill-shot analysis)
- Tightened the pitch deck (v6): language corrections, TAM methodology, traction future-proofing
- All artifacts pressure-tested by Coach G before filing.
- Built Ed25519 token signing primitive from scratch. 23 tests passing. Golden vectors frozen.
- Dispatched adversarial tasks to local inference (Sindri fleet). First real bug found: truncated signature caused unhandled throw that bypassed fail-closed in broker.
- Built Broker HTTP service. 10 E2E tests covering full demo sequence.
- 12 adversarial tests added from Sindri red team output. 51 total tests.
- Wrote enforcement protocol spec, 8 invariant laws, demo sequence doc.
Completed Week 1's target (signing + broker) in one day.
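The truncated-signature bug is a good illustration of why verification has to be fail-closed: a malformed input that throws past the verification boundary is a bypass, not an error. A minimal sketch of the pattern using Node's built-in Ed25519 support (this shows the pattern, not the kernel's actual token format):

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function signToken(payload: string): Buffer {
  // Ed25519 is a one-shot signature; Node takes `null` for the digest.
  return sign(null, Buffer.from(payload), privateKey);
}

// Fail-closed verification: malformed input (e.g. a truncated
// signature) must return false, never throw past this boundary.
function verifyToken(payload: string, sig: Buffer): boolean {
  try {
    return verify(null, Buffer.from(payload), publicKey, sig);
  } catch {
    return false; // any parse/format error denies by default
  }
}

const sig = signToken("task:deploy");
const ok = verifyToken("task:deploy", sig);
const truncated = verifyToken("task:deploy", sig.subarray(0, 10));
```

Without the catch, a signature of the wrong length can surface as an exception instead of a denial, which is exactly the class of bug the red-team run found.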
- Ran 6 validation modes: corrupt ledger (detected), broken hash chain (detected), nonce replay after restart (rejected), DB failure (fail-closed), golden run (zero drift). 133 tests, 12 suites.
- Calibrated local inference (Sindri) on 3 task types: threat modeling (8/10), regulatory mapping (6/10), competitive intel (5/10).
- Invented a new metric: Cost-Per-Insight. Not cost-per-token. What does it cost to reach the actionable delta that changes a decision? 97% cost reduction vs cloud-only, equal or better quality.
- Key insight: "Route everything local first. Cloud adds the judgment delta."
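Cost-Per-Insight, as defined here, is simple arithmetic: total spend divided by the number of outputs that actually changed a decision. A sketch with made-up numbers (all figures illustrative, not from the calibration runs):

```typescript
interface RunStats {
  costUsd: number;            // total inference spend for the run
  actionableInsights: number; // outputs that actually changed a decision
}

function costPerInsight(r: RunStats): number {
  if (r.actionableInsights === 0) return Infinity; // spend with no delta
  return r.costUsd / r.actionableInsights;
}

// Illustrative comparison: cloud-only vs local-first routing,
// same number of actionable insights out the other end.
const cloudOnly: RunStats = { costUsd: 40, actionableInsights: 4 };
const localFirst: RunStats = { costUsd: 1.2, actionableInsights: 4 };

const reduction = 1 - costPerInsight(localFirst) / costPerInsight(cloudOnly);
// With these assumed numbers, reduction works out to 0.97 (a 97% cut).
```

The denominator is the whole argument: tokens are an input cost, insights are the output you were actually buying.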
- Email infrastructure verified (DKIM live for send.amplefai.com)
- Golden runs 7-8 passed (133/133, zero drift)
- Analyzed 3 competitors: Vulnu (prescribes hygiene, doesn't enforce), Klaw.sh (orchestration, not governance), 1Password SCAM benchmark (awareness gets 90%, enforcement gets 100%)
- Wrote full sprint retrospective (8 work packages, all validation runs, decision log)
- Tagged v1-reference at commit 6fc58a1
- Trademark search: "AmplefAI" clear across EUIPO, TMview, and USPTO
- Built the canonical 5-minute GDPR demo CLI: verify → blocked → denied → approved+deleted → forensic replay. 6 bypass tests. All production primitives.
- Restructured long-term memory: 36KB → 6KB (83% reduction, zero context lost)
- Wrote Positioning Stack v1 (canonical, supersedes all prior positioning)
- Wrote Cost-Per-Insight blog post (operator voice, no product pitch)
- Made the hardest strategic call: no fundraise yet. Integration before capital. "You don't fund the crane before the steel is welded."
- Wrote v1 proof pack: 10 guarantees, 6 non-guarantees, 3 validation runs, 10/10 failure modes, 5 known limitations
- Wrote integration arc completion record (decision log, before/after, 4 gaps identified)
- Both artifacts emailed as styled PDFs for plane review
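The five-step demo sequence can be caricatured in a few lines: a deletion request is blocked without a token, denied without approval, executed with both, and the ledger preserves the whole sequence for replay. This is a toy sketch, not the production CLI or its policy engine:

```typescript
// Illustrative only: a toy version of the demo's decision sequence.
type Decision = "blocked" | "denied" | "approved";

interface Ledger {
  entries: string[];
}

function requestDeletion(
  subject: string,
  hasToken: boolean,
  approverGranted: boolean,
  ledger: Ledger
): Decision {
  if (!hasToken) {
    ledger.entries.push(`${subject}:blocked`); // no token, no execution
    return "blocked";
  }
  if (!approverGranted) {
    ledger.entries.push(`${subject}:denied`); // token alone is not enough
    return "denied";
  }
  ledger.entries.push(`${subject}:approved+deleted`);
  return "approved";
}

const ledger: Ledger = { entries: [] };
requestDeletion("user-17", false, false, ledger); // -> blocked
requestDeletion("user-17", true, false, ledger);  // -> denied
requestDeletion("user-17", true, true, ledger);   // -> approved+deleted

// "Forensic replay": the ledger retains the full decision history.
const replay = ledger.entries.join(" -> ");
```

Every outcome, including the refusals, lands in the ledger; the replay is the audit trail, not a reconstruction of one.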
- Launched the Valtech Nordic AI Lab project: architecture, investment case, one-pager — all in one evening.
- Evaluated KMS/HSM strategy: Apple Secure Enclave on dedicated M4 mini (~5,000 DKK) vs. traditional HSM (six figures). Apple doesn't know they built an HSM.
- 3-round strategic pressure test with Coach G. Locked identity: "distributed systems primitive, entered through the security door."
- Wrote the full Policy DSL ontology: 6 concepts (subject, action, resource, condition, effect, delegation), 7 kernel invariants, non-Turing-complete constraints, 5 plain-English example policies
- Locked 8 major design decisions (first-match, escalation semantics, arithmetic bounds, overlap analysis, no broad allow, continuation envelope, ledger windows, wildcard syntax)
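First-match evaluation with a fail-closed default is easy to sketch. The concept names below follow the ontology above (subject, action, resource, effect), but the wildcard syntax and evaluation details are assumptions, not the actual DSL:

```typescript
// Illustrative first-match policy evaluation; not the real grammar.
type Effect = "allow" | "deny" | "escalate";

interface Policy {
  subject: string;  // simple trailing-* prefix wildcard assumed
  action: string;
  resource: string;
  effect: Effect;
}

function matches(pattern: string, value: string): boolean {
  return pattern.endsWith("*")
    ? value.startsWith(pattern.slice(0, -1))
    : pattern === value;
}

// First-match semantics: the first policy that matches decides.
// No broad allow: if nothing matches, the default is deny.
function evaluate(
  policies: Policy[],
  subject: string,
  action: string,
  resource: string
): Effect {
  for (const p of policies) {
    if (
      matches(p.subject, subject) &&
      matches(p.action, action) &&
      matches(p.resource, resource)
    ) {
      return p.effect;
    }
  }
  return "deny"; // fail-closed default
}

const policies: Policy[] = [
  { subject: "agent:*", action: "read", resource: "db:customers", effect: "allow" },
  { subject: "agent:*", action: "delete", resource: "db:*", effect: "escalate" },
];

const readDecision = evaluate(policies, "agent:sindri-1", "read", "db:customers");
const deleteDecision = evaluate(policies, "agent:sindri-1", "delete", "db:customers");
const writeDecision = evaluate(policies, "agent:sindri-1", "write", "db:customers");
```

First-match keeps evaluation deterministic and cheap to reason about, and the implicit deny is what makes "no broad allow" enforceable rather than aspirational.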
- New doctrine: "Shared state is a cryptographic event log, not collaborative memory."
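That doctrine is concrete enough to sketch: each log entry commits to the hash of the previous one, so verification recomputes every link and any edit breaks the chain from that point on. A minimal illustration, not the kernel's actual ledger format:

```typescript
import { createHash } from "node:crypto";

// Illustrative hash-chained event log.
interface Entry {
  event: string;
  prevHash: string;
  hash: string;
}

function entryHash(event: string, prevHash: string): string {
  return createHash("sha256").update(prevHash + event).digest("hex");
}

function append(log: Entry[], event: string): void {
  const prevHash = log.length ? log[log.length - 1].hash : "genesis";
  log.push({ event, prevHash, hash: entryHash(event, prevHash) });
}

// Verification recomputes every link; one edited event is detected.
function verifyChain(log: Entry[]): boolean {
  let prev = "genesis";
  for (const e of log) {
    if (e.prevHash !== prev || e.hash !== entryHash(e.event, prev)) {
      return false;
    }
    prev = e.hash;
  }
  return true;
}

const log: Entry[] = [];
append(log, "task dispatched");
append(log, "result validated");
const intact = verifyChain(log);   // chain holds
log[0].event = "task tampered";
const tampered = verifyChain(log); // chain breaks
```

A log like this is append-only by construction: agents can add events, but nobody can quietly rewrite what already happened, which is the difference between an event log and "collaborative memory."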
- Analyzed Anthropic agent deployment data (software eng 49.7%, governance opportunity in long tail)
- Wrote blog post "Agent Sandboxing Is Going Mainstream" — published to production with 5 crosslinks
- Launched social media presence: X thread posted, Anthropic reply posted
- Drafted linter spec (13 rules), pressure-tested with Coach G (4 gaps + 1 contradiction resolved)
- Back-ported grammar, wrote compiler mapping table, created golden test vectors
- Implemented the linter in TypeScript: 13/13 tests passing
- Verified GEI spine: 136/136 tests passing
- Deployed the frozen enforcement kernel to Azure Confidential Computing (AMD SEV-SNP)
- Captured TPM attestation evidence, benchmarked 22,313 decisions/sec inside TEE
- Published technical attestation note to amplefai.com/docs/confidential-computing
- Wrote full GEI documentation from source code, pressure-tested 3 rounds with Coach G — room-ready
The Tally
The Point
This is what agentic AI enables. One person with the right tools operating at a velocity that would have required a 5-person team eighteen months ago.
But look at that list again. Every one of those actions — the code pushes, the site deploys, the multi-agent dispatches, the credential handling, the production database operations — was an autonomous action taken by an AI system with access to my infrastructure.
If my co-pilot had made a policy error. If it had pushed to main instead of a branch. If it had sent that investor one-pager to the wrong email. If it had deployed untested code to production. If it had accessed credentials it shouldn't have.
There's no enforcement layer preventing any of that today. The only thing between "10x productivity" and "catastrophic error" is trust in a system that has no cryptographic accountability.
That's the authorization gap.
That's why we're building AmplefAI.
The enforcement kernel that made all of this possible — and that will eventually govern it — is running inside a hardware-backed confidential enclave. 136 tests passing. Deterministic replay. Hash-chained audit. No token, no execution.
AmplefAI builds the independent governance layer that ensures AI capability remains accountable to your institution — not your provider.
Learn more at amplefai.com
Chris Zimmerman
Founder at AmplefAI. Building constitutional governance for autonomous AI.