
February 12, 2026 · 10 min read

Persistent Context Kernel: Governing What AI Agents Know


By AmplefAI

An AI agent processes a loan application. It weighs the applicant's history, flags a risk, recommends approval with conditions. Six months later, a regulator asks why. The model is reproducible — same weights, same architecture, deterministic inference. But what did the agent actually know when it made that call? Which context entries were in scope? Were any later invalidated? Nobody can answer.

The context was never governed.

The model wasn't the problem. The memory was.

This is not a retrieval problem or a hallucination problem. It is a context governance problem — the failure mode enterprise AI is not prepared for.


The problem with agent memory

AI agents have a memory problem. It's not the one you think.

The obvious version: agents forget everything between invocations. Stateless by design, they start each task from zero. The industry response has been to bolt on memory — RAG pipelines, vector databases, conversation history append logs. These work, in the narrow sense that agents can now reference past information.

The less obvious version is more dangerous.

An agent with unbounded access to accumulated context is an agent with an unauditable attack surface. What did it know when it made that decision? Can you prove it? Can you prove what it didn't know? If agent B consumed context from agent A, and agent A's context was later invalidated, which downstream decisions are affected?

Current approaches don't answer these questions because they don't treat context as governed infrastructure. RAG retrieves by similarity. Vector databases rank by embedding distance. Conversation history appends chronologically. None version context. None enforce scope boundaries. None support deterministic replay.

None can reconstruct, byte-for-byte, what an agent knew at the moment it acted.

Today's Memory Systems vs. Persistent Context Kernel

Today's memory systems optimize for recall, similarity, and convenience. They ask "What's relevant?" and answer it with:

RAG retrieval by similarity
Vector ranking by embedding distance
Chronological history

No versioning. No scope boundaries. No replay.

The PCK optimizes for governance and provability. It asks "What is the agent allowed to know, and can we prove it?"
Context is an attack surface, not a feature.


Governed context infrastructure

The category is governed context infrastructure: treating agent memory with the same discipline you'd apply to a database. Schema, transactions, isolation, audit trail, replay. Not a cache. Not a convenience layer. Infrastructure.

The core shift: from "what does the agent know" to "what is the agent allowed to know — and can we prove it."

Versioning

Every context entry is append-only. Updates create new versions, forming a directed acyclic graph. The original version is never mutated. If an agent consumed version 1 and the entry is now at version 5, that fact is recorded, traceable, and replayable. Without versioning, forensic reconstruction becomes impossible the moment any entry changes.
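The append-only version chain can be sketched in a few lines. This is a hypothetical in-memory illustration under the constraints described above, not the PCK's actual implementation; the `ContextEntry` and `VersionChain` names are invented for the example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ContextEntry:
    """One immutable version of a context entry (illustrative shape)."""
    entry_id: str
    version: int
    parent_version: Optional[int]
    content: str

class VersionChain:
    """Append-only store: updates append new versions, originals never mutate."""
    def __init__(self):
        self._rows: list[ContextEntry] = []

    def append(self, entry_id: str, content: str) -> ContextEntry:
        latest = self.latest(entry_id)
        version = 1 if latest is None else latest.version + 1
        parent = None if latest is None else latest.version
        row = ContextEntry(entry_id, version, parent, content)
        self._rows.append(row)  # never update-in-place
        return row

    def latest(self, entry_id: str) -> Optional[ContextEntry]:
        matches = [r for r in self._rows if r.entry_id == entry_id]
        return max(matches, key=lambda r: r.version, default=None)

    def at_version(self, entry_id: str, version: int) -> ContextEntry:
        """Replay path: fetch the exact version an agent consumed."""
        return next(r for r in self._rows
                    if r.entry_id == entry_id and r.version == version)
```

Even after an entry advances to a later version, `at_version` still returns the original bytes, which is what makes forensic reconstruction possible.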

Scope

Context has a four-level visibility hierarchy: organization > department > workflow > agent. An agent-scoped entry is invisible to other agents — not access-denied, invisible. Returning "access denied" would confirm existence, which is itself an information leak. Without structural scope boundaries, isolation becomes suggestive rather than enforced.

Four-Level Scope Isolation

Organization: global policies, shared knowledge
Department: team-scoped context, budget boundaries
Workflow: task-specific context, dependency edges
Agent: private state, structurally invisible to all others

Each layer constrains further. Not access control — structural invisibility.

Replay

Any governed action can be forensically reconstructed: the exact context entries the agent saw, in the exact order, with the exact relevance scores, under the exact policy version. Not approximately. Exactly.

Audit

Every context mount, every dependency between agents, every version transition is recorded. Not as a log you might grep — as structured, queryable data with foreign key relationships.


Architecture

The design bias is toward invariants, not flexibility. The architecture is intentionally small: the kernel exists to enforce those invariants, not to be a platform in itself.

The PCK is an embedded database with a kernel layer that enforces all business invariants. Agents are stateless. They mount context from the kernel, execute, produce new context, and surrender it back. The kernel outlives every agent instance. It is the single source of truth for what any agent knew at any point in time.

Lifecycle: mount, commit, snapshot, replay

Kernel Lifecycle

The kernel persists across all agent instances. Four operations move context through it:

Commit (write): validate and persist atomically
Mount (read): scope, budget, score, project
Snapshot (freeze): freeze versions, order, scores
Replay (prove): byte-for-byte reconstruction

Commit

An agent produces context entries. The kernel validates each entry (tenant identity present, agent identity present, content within size limits), assigns an identifier, computes an initial relevance score, and persists atomically. The entire batch is wrapped in a single transaction — if any entry fails validation, the whole batch rolls back. Zero partial commits. The invariant: atomicity at the write boundary. A crash mid-commit cannot corrupt the store.
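A minimal sketch of the commit invariant, using SQLite as a stand-in storage engine (the post does not name the actual engine). The table layout and function name are illustrative; the point is the single transaction around the whole batch:

```python
import sqlite3

def commit_batch(conn: sqlite3.Connection, tenant: str, agent: str,
                 contents: list, max_size: int = 4096) -> None:
    """Validate and persist a batch of entries atomically.
    If any entry fails validation, the whole batch rolls back."""
    if not tenant or not agent:
        raise ValueError("tenant and agent identity are required")
    with conn:  # one transaction: commits on success, rolls back on error
        for content in contents:
            if len(content) > max_size:
                raise ValueError("entry exceeds size limit")
            conn.execute(
                "INSERT INTO entries (tenant, agent, content) VALUES (?, ?, ?)",
                (tenant, agent, content),
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries "
             "(tenant TEXT NOT NULL, agent TEXT NOT NULL, content TEXT)")
commit_batch(conn, "tenant-a", "agent-1", ["entry one", "entry two"])
```

If a later entry in a batch fails validation, the earlier inserts in that batch are rolled back too, so a crash or error mid-commit leaves no partial state.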

Mount

An agent needs context to execute. The kernel queries visible entries filtered by tenant and scope, computes relevance scores, filters by threshold (pinned entries always pass), fits to a token budget by relevance order, creates an immutable snapshot, records cross-agent dependency edges, and returns a governed projection of context. The agent never sees the full store. Every read is scoped, budgeted, and recorded.
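The mount pipeline might look like the following sketch. The dict-based entry shape, field names, and parameters are assumptions for illustration, not the kernel's API:

```python
def mount(entries, tenant, visible_scopes, threshold, token_budget):
    """Governed projection: scope-filter, threshold, fit to budget.
    Assumed entry shape: tenant, scope, score, pinned, tokens, id."""
    # structural filtering: out-of-scope and cross-tenant entries never appear
    visible = [e for e in entries
               if e["tenant"] == tenant and e["scope"] in visible_scopes]
    # pinned entries always pass the relevance threshold
    eligible = [e for e in visible if e["pinned"] or e["score"] >= threshold]
    # pinned first, then by relevance descending
    eligible.sort(key=lambda e: (not e["pinned"], -e["score"]))
    projection, used = [], 0
    for e in eligible:
        if used + e["tokens"] <= token_budget:
            projection.append(e)
            used += e["tokens"]
    return projection
```

Note what the agent never sees: entries from other tenants, entries outside its visible scopes, and entries below the threshold simply do not exist in the projection.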

Snapshot

Every mount creates a snapshot: a frozen record of exact entry version pairs plus per-entry forensic metadata. Snapshots are immutable after creation.

Replay

Given a trace identifier, the kernel finds the snapshot, hydrates the exact versions that were snapshotted, applies the pinned ordering and relevance scores from mount time, and returns a byte-for-byte reconstruction of what the agent saw. No network calls. Everything from local storage.
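Once versions, order, and scores are pinned, replay reduces to pure lookups. A hypothetical sketch, with `snapshots` and `store` as stand-ins for the kernel's tables:

```python
def replay(trace_id, snapshots, store):
    """Reconstruct exactly what the agent saw at mount time.
    `snapshots`: trace_id -> ordered list of (entry_id, version, score).
    `store`: (entry_id, version) -> content (append-only, never mutated)."""
    snap = snapshots[trace_id]  # frozen at mount time
    return [{"entry_id": eid, "version": v, "score": score,
             "content": store[(eid, v)]}
            for eid, v, score in snap]  # pinned order and scores, not current
```

Because the store is append-only and the snapshot is immutable, nothing in this path can return a value that differs from what the agent originally consumed.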

Forensic Reconstruction

Trace ID: query by identifier
Find snapshot: locate the frozen record
Hydrate versions: load the exact entry versions
Apply pins: restore order and scores
Reconstruction: byte-for-byte identical, verified

Zero network calls. Everything from local storage. Deterministic.

Storage model

The kernel uses a small number of purpose-built tables: one for context entries (append-only), one for frozen snapshots, one for per-entry forensic metadata within snapshots, and one for cross-agent dependency records. The schema is intentionally minimal — four tables, each with a single responsibility. The forensic metadata table is what makes replay exact. Without it, you can reconstruct which entries an agent saw, but not in what order or with what scores. Both matter. Relevance ordering determines what the agent prioritizes. Score drift from decay or access patterns would make replayed snapshots diverge from the original.
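One plausible rendering of the four-table schema in SQLite. The table and column names below are guesses based on the description above, not the kernel's real schema; what matters is the single-responsibility split and the foreign-key relationships:

```python
import sqlite3

SCHEMA = """
CREATE TABLE context_entries (      -- append-only: no UPDATE, no DELETE
    entry_id TEXT, version INTEGER, parent_version INTEGER,
    tenant TEXT NOT NULL, scope TEXT NOT NULL, content TEXT,
    PRIMARY KEY (entry_id, version)
);
CREATE TABLE snapshots (            -- one frozen record per mount
    snapshot_id TEXT PRIMARY KEY, trace_id TEXT, tenant TEXT NOT NULL,
    policy_version INTEGER, created_at TEXT
);
CREATE TABLE snapshot_entries (     -- per-entry forensic metadata: exact replay
    snapshot_id TEXT REFERENCES snapshots(snapshot_id),
    entry_id TEXT, version INTEGER, position INTEGER, score REAL,
    FOREIGN KEY (entry_id, version)
        REFERENCES context_entries(entry_id, version)
);
CREATE TABLE dependencies (         -- cross-agent consumption edges
    consumer_agent TEXT, producer_agent TEXT,
    entry_id TEXT, version INTEGER,
    FOREIGN KEY (entry_id, version)
        REFERENCES context_entries(entry_id, version)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The `position` and `score` columns on the snapshot-metadata table are what make replay exact rather than approximate.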

Tenant isolation

Tenant identity is embedded in the storage structure and appears in every query. There are no global queries. No access path omits tenant identity. Cross-tenant reads return empty results — not errors.

Structural isolation at the query layer. Not authorization middleware you hope doesn't have bugs.
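A toy illustration of query-layer tenant isolation: the tenant identity is a parameter of the query itself, so a cross-tenant read matches nothing and returns empty. Table and function names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries "
             "(tenant TEXT NOT NULL, scope TEXT, content TEXT)")
conn.execute("INSERT INTO entries VALUES ('tenant-a', 'org', 'policy doc')")

def visible_entries(conn, tenant, scope):
    """No global queries: tenant identity appears in every access path."""
    return conn.execute(
        "SELECT content FROM entries WHERE tenant = ? AND scope = ?",
        (tenant, scope),
    ).fetchall()

# cross-tenant read: empty result, not an error
assert visible_entries(conn, "tenant-b", "org") == []
```

There is no code path that could leak another tenant's data by forgetting an authorization check, because there is no query without a tenant parameter.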

Append-only model

Entries are never mutated. Never deleted. Version updates create a new row with an incremented version and a pointer to the parent. The original row is untouched. This is the foundation of replay — if entries could mutate, every snapshot referencing them would become forensically invalid.

Append-Only Version Chain

v1 (2026-02-12 09:14): original entry, never mutated
v2 (2026-02-14 11:02): updated entry, parent: v1
v3 (2026-02-18 15:30): latest version, current

Updates create new rows. Originals are untouchable. Replay returns the exact version snapshotted.

Scope changes are blocked entirely on version updates. Not just widening — narrowing too. An entry scoped to a single agent cannot be widened to the organization, and vice versa. This was a deliberate decision: scope changes without governance approval are risky in either direction.

The append-only model raises questions we acknowledge but do not solve in v0. Entries accumulate indefinitely. Retention policies, cold storage tiers, and archival strategies will be required at scale. More pointedly, regulatory frameworks like GDPR assert a right to deletion that sits in direct tension with an append-only log. Reconciling cryptographic deletion overlays, tombstone records, or retention-window purging with forensic guarantees is a real design problem. We are aware of it. v0 does not address it. The architecture will need to.

Three-pin replay guarantee

Version pin: exact entry-version pairs are stored, preventing content drift. Replay returns v1 even if the entry is now at v5.

Order pin: exact position is captured in forensic metadata, preventing relevance drift. Position survives decay and access changes.

Score pin: the relevance score is captured at mount time, preventing score drift. The score survives subsequent decay cycles.

Without three-pin replay, any post-hoc analysis is contaminated by current state.

The policy version is also pinned from the snapshot record. Replay returns the rules that applied, not the current rules.

Relevance scoring

v0 uses a deterministic heuristic — no ML, no embeddings, no network calls. Scores are computed from entry type, age-based decay, and access frequency. Pinned entries always receive maximum relevance. Different entry types (decisions, observations, errors, etc.) carry different base weights, reflecting their governance significance.

Same inputs, same time, same output. This is required for replay determinism, and it is tested: two mounts with identical parameters produce identical results.
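A deterministic heuristic of this shape might look as follows. The base weights, half-life, and access boost cap are invented numbers for illustration, not the kernel's actual tuning:

```python
# illustrative base weights by entry type (not the kernel's real values)
BASE_WEIGHTS = {"decision": 1.0, "error": 0.9, "observation": 0.6}

def relevance(entry_type, age_days, access_count, pinned,
              half_life_days=30.0, max_score=1.0):
    """Deterministic heuristic: same inputs, same output.
    No ML, no embeddings, no network calls."""
    if pinned:
        return max_score                          # pinned always wins
    base = BASE_WEIGHTS.get(entry_type, 0.5)      # type-based weight
    decay = 0.5 ** (age_days / half_life_days)    # age-based decay
    boost = min(0.2, 0.02 * access_count)         # access-frequency bonus
    return min(max_score, base * decay + boost)
```

Because the function is pure, two calls with identical inputs are guaranteed identical outputs, which is the property replay depends on.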

Token budgeting

Mount reserves budget for pinned entries first. If pinned entries alone exceed the budget, that's a hard error, not a silent truncation. Remaining budget fills by relevance descending.

The budget is a hard cap. Never exceeded. If violated, hard error.
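The budget-fitting rule can be sketched directly. The entry shape and function name are illustrative:

```python
def fit_to_budget(entries, token_budget):
    """Reserve budget for pinned entries first; hard error if they alone
    exceed it. Remaining budget fills by relevance descending."""
    pinned = [e for e in entries if e["pinned"]]
    pinned_tokens = sum(e["tokens"] for e in pinned)
    if pinned_tokens > token_budget:
        # never truncate silently: pinned overflow is a hard error
        raise ValueError("pinned entries exceed token budget")
    selected, used = list(pinned), pinned_tokens
    rest = sorted((e for e in entries if not e["pinned"]),
                  key=lambda e: -e["score"])
    for e in rest:
        if used + e["tokens"] <= token_budget:
            selected.append(e)
            used += e["tokens"]
    return selected
```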


Governance of invalid context

The PCK guarantees forensic reconstruction — you can always prove what an agent knew. But what happens when that context turns out to be wrong? Governance workflows for validating, approving, or revoking context entries sit above the kernel layer. The kernel provides the substrate: immutable versions, dependency edges tracing which agents consumed what, snapshots pinning exact state at decision time. Policies that act on this information — flagging downstream decisions when upstream entries are revoked, requiring human approval before certain context types enter scope — are governance concerns, not kernel concerns. Out of scope for v0. The separation is deliberate: the kernel guarantees the data. Governance workflows interpret it.


Failure semantics

Crash recovery

The storage engine provides atomic transactions. All entry inserts in a commit are wrapped in a single transaction. On crash, recovery on next open restores committed state. The implication is simple: once committed, context is durable.

Bounded staleness

Mount enforces staleness bounds — configurable maximum age and maximum versions behind. Stale reads can be configured to block or warn. Policy reads always enforce strict consistency with zero staleness tolerance.
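A bounded-staleness check of this kind might look like the sketch below; the parameter names and the block/warn modes are assumptions drawn from the description above:

```python
def check_staleness(age_seconds, versions_behind,
                    max_age_seconds, max_versions_behind, mode="block"):
    """Bounded-staleness check, configurable to block or warn.
    A strict policy read would call this with zero tolerance."""
    stale = (age_seconds > max_age_seconds
             or versions_behind > max_versions_behind)
    if stale and mode == "block":
        raise RuntimeError("context entry exceeds staleness bounds")
    return stale  # True in warn mode means a warning was warranted
```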

What v0 does not solve

No batch cross-entry transactions beyond commit. No distributed replay — single-node storage only. No real-time staleness push. No ML scoring. No garbage collection. No migration tooling. Token counting is approximate. Scope coverage testing is partial at some hierarchy levels.


Enterprise implications

These guarantees are not academic — they change how organizations can delegate decisions to machines.

Auditability

Every context mount creates a snapshot. Every cross-agent consumption creates a dependency edge. Every version creates a traceable lineage. This isn't "we log stuff" — it's structured, queryable, forensically reconstructable records. Compliance becomes provable, not performative.

Forensic reconstruction

Given a trace ID, you can reconstruct the exact knowledge state of any agent at any point: what entries it saw, in what order, with what relevance scores, under what policy version. This answers the question regulators actually ask: "What did the system know when it made this decision?"

Multi-agent governance

When agent B consumes context produced by agent A, the dependency is recorded. If agent A's context is later invalidated, all downstream consumers are identifiable. Scope isolation ensures agents in different tenants share zero context — not by convention, but by structural enforcement.
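Downstream identification is a transitive closure over the recorded dependency edges. A hypothetical sketch using (consumer, producer, entry_id) tuples as the edge shape:

```python
def downstream_consumers(edges, invalidated_entry):
    """Given cross-agent dependency edges (consumer, producer, entry_id),
    find every agent transitively affected when an entry is invalidated."""
    # direct consumers of the invalidated entry
    affected = {c for c, _p, e in edges if e == invalidated_entry}
    # propagate: anyone who consumed context produced by an affected agent
    changed = True
    while changed:
        changed = False
        for consumer, producer, _e in edges:
            if producer in affected and consumer not in affected:
                affected.add(consumer)
                changed = True
    return affected
```

This is only answerable because every consumption was recorded as an edge at mount time; nothing here requires re-running the agents.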


What we deliberately left out

No embeddings. No vector similarity. No semantic search. No RAG integration. No ML-based salience scoring.

This is not a gap — it's a decision. v0 proves that governed context infrastructure works: versioned, scoped, replayable, auditable. Adding ML scoring is additive — the relevance function is pluggable, and scores would be cached and versioned for replay. But ML scoring in the critical path would break deterministic replay.

That is the foundational guarantee everything else rests on.

Correctness over cleverness. The heuristic is boring and testable. That's the point.

v0 is a thin slice proving the model works, not a finished product. The invariants are covered. The tests pass. The architecture holds.


Where this goes

If AI agents run enterprise workflows, their memory cannot be a best-effort cache. It must be governed infrastructure.

The trajectory is clear: agents will make higher-stakes decisions, touch more regulated domains, operate in longer chains where context flows between dozens of autonomous actors. Every one of those transitions makes ungoverned memory more dangerous.

The PCK is a foundation — not for making agents smarter, but for making their knowledge accountable. Governed context infrastructure is the layer that lets enterprises trust agent decisions not because the model is good, but because the memory is provable.

That is the bar. It will only get higher.

AmplefAI builds the independent governance layer that ensures AI capability remains accountable to your institution — not your provider.

Learn more at amplefai.com
