Method Essay · Shadow Bridge · v0.1
If local shadows recover Faraday, what do they recover for AI?
The Shadow Faraday experiment landed Branch A: local, gauge-invariant shadow data suffices to close Faraday's law on the registered classical-vacuum domain, without reconstructing the electromagnetic potential globally. This essay is the translation. It does not derive any new physics and does not prove any new theorem about AI. It claims that the same three properties that made Branch A earnable in EM — local gauge-invariant readout, structural zero, and named quarantine — are also the properties that distinguish robust AI behavioral guarantees from empirical tolerance tests. The Faraday receipt is the worked example; AI safety is the territory the example points at.
The Gap
Why an essay between the receipt and the headline claim.
Sundog's stack of experiments — atlas, three-body, Isotrophy K_facet, mesa, structural-failure, capset, Faraday — is a trail of receipts. Each one is honest about what it does and does not show. None of them, on its own, makes a public-facing claim about AI safety: that would betray the rigor that earns the receipts in the first place.
But the receipts are the wrong size for a curious electrical engineer or a working alignment researcher to extrapolate from. The Faraday page proves Branch A; it does not explain why Branch A matters outside electrodynamics. That gap is what this essay tries to close — once, in one place — by writing the translation down.
01 Local, gauge-invariant readout, no global reconstruction.
The shadow operator P_shadow reads a gauge-invariant scalar from a small stencil of the electromagnetic potential. It never asks for a globally consistent choice of A. The readout is local by construction (four edges of one plaquette) and gauge-invariant by Stokes (the holonomy of an exact form vanishes on every closed loop).
In Faraday (Branch A)
The plaquette holonomy ∮ A on a coordinate square equals the flux ∫ F through it. The potential A is never reconstructed globally; only its line integral on one closed loop appears. Gauge invariance is exact, not approximate.
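A minimal lattice sketch of the plaquette readout described above, under stated assumptions: the potential is stored as exact edge integrals on a unit-spaced 2D grid, the field is uniform, and the names `plaquette_holonomy`, `gauge_transform`, `N`, and `B` are illustrative rather than taken from the Faraday page. It checks that the four-edge holonomy equals the flux through the square and that adding a gradient d(chi) to A changes the potential but not the readout. Rational arithmetic is used so the equality is exact and not a floating-point near-miss.

```python
from fractions import Fraction
import random

def plaquette_holonomy(ax, ay, i, j):
    """Oriented sum of the four edge values around the unit square at (i, j)."""
    return ax[i][j] + ay[i + 1][j] - ax[i][j + 1] - ay[i][j]

def gauge_transform(ax, ay, chi):
    """Apply A -> A + d(chi), where chi lives on the lattice sites."""
    n = len(chi)
    new_ax = [[ax[i][j] + chi[i + 1][j] - chi[i][j] for j in range(n)]
              for i in range(n - 1)]
    new_ay = [[ay[i][j] + chi[i][j + 1] - chi[i][j] for j in range(n - 1)]
              for i in range(n)]
    return new_ax, new_ay

N = 5               # sites per side (assumed toy size)
B = Fraction(3, 7)  # uniform field strength (arbitrary illustrative value)

# Exact edge integrals of A = (-B*y/2, B*x/2) along unit-length edges.
ax = [[-B * j / 2 for j in range(N)] for i in range(N - 1)]
ay = [[B * i / 2 for j in range(N - 1)] for i in range(N)]

# A random gauge function on the sites; any choice should leave the readout alone.
rng = random.Random(0)
chi = [[Fraction(rng.randint(-50, 50), rng.randint(1, 9)) for _ in range(N)]
       for _ in range(N)]
gx, gy = gauge_transform(ax, ay, chi)

cells = [(i, j) for i in range(N - 1) for j in range(N - 1)]
before = [plaquette_holonomy(ax, ay, i, j) for i, j in cells]
after = [plaquette_holonomy(gx, gy, i, j) for i, j in cells]

# Each unit square carries flux B * (area 1), before and after the gauge change.
print(all(h == B for h in before), before == after)
```

The gauge term cancels edge by edge around the loop, which is why the invariance holds for every `chi` and not only the sampled one. Each readout touches only four edge values, so no global choice of A enters.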
Translated to AI behavior
A behavioral verifier reads from a small stencil of the agent's trajectory or output, and does not require reconstructing the agent's full latent world-model. Two agents whose internals differ by an irrelevant reparameterisation (the AI analog of a gauge) produce the same verdict.
the load-bearing rhyme: locality + invariance, not depth of inspection
Today's alignment work mostly picks one of two extremes: take the model apart (mechanistic interpretability) or stare at its outputs (behavioral evals). The shadow position is the third option that earned Branch A in EM (small stencil, exact invariance), and whether it is reachable for any nontrivial AI behavior is an open empirical question, not a Sundog claim. The three-body workbench shows the same local-probe lineage in a chaotic dynamical system; the σ₃ detector there is a non-EM cousin of P_shadow.
02 Structural zero, not tolerance threshold.
The Phase 3 derivation gives R_F^0(S) = 0 by an algebraic identity: dF = d(dA) = 0. The zero is not a number below a tolerance. It is a structural consequence of the operator's typing. A reader does not have to trust a threshold; the residual is zero by construction. This is the same distinction that earned Isotrophy K_facet's 20 structural-zero receipts and the same one routed through a different apparatus on the alignment Bayes comparator.
In Faraday (Branch A)
The closure residual evaluates to an algebraic zero. The Phase 4 battery's tolerance = 1e-9 is a sanity-check floor for the supporting numerical spot-checks, not the standard the Branch A claim is held to.
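The identity d(dA) = 0 has a discrete analog that can be checked directly. The sketch below is not the Phase 3 derivation; it assumes a cubic lattice with integer-valued edge data, and the helper names `shift`, `flux`, and `cube_residual` are hypothetical. It shows that the net plaquette flux through the six faces of any unit cube cancels term by term for arbitrary A, so the residual is an exact integer zero with no tolerance involved.

```python
import itertools
import random

def shift(site, axis):
    """Neighbouring site one step along `axis`."""
    s = list(site)
    s[axis] += 1
    return tuple(s)

def flux(A, site, a, b):
    """Discrete F = dA: holonomy of A around the plaquette at `site` in the (a, b) plane."""
    return (A[(a, site)] + A[(b, shift(site, a))]
            - A[(a, shift(site, b))] - A[(b, site)])

def cube_residual(A, site):
    """Discrete dF: net oriented flux through the six faces of the unit cube at `site`."""
    total = 0
    for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        # Far face minus near face, for the pair of faces normal to axis c.
        total += flux(A, shift(site, c), a, b) - flux(A, site, a, b)
    return total

N = 4  # sites per side (assumed toy size)
rng = random.Random(1)

# Arbitrary integer edge data: one value per (axis, site) pair.
A = {(axis, site): rng.randint(-1000, 1000)
     for axis in range(3)
     for site in itertools.product(range(N), repeat=3)}

residuals = [cube_residual(A, site)
             for site in itertools.product(range(N - 1), repeat=3)]
print(set(residuals))
```

Every edge value enters the sum twice with opposite signs, so the cancellation does not depend on the random draw. Running the same sum on floating-point edge data would generally leave roundoff, which is the kind of residual a numerical tolerance floor is for.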
Translated to AI behavior
Behavioral guarantees that come from the architecture (a constraint the model cannot violate) versus guarantees that come from empirical testing (a constraint the model has not been observed to violate at scale). The first is a structural zero; the second is a tolerance threshold.
the load-bearing rhyme: identity, not benchmark
Most public AI-safety work today is tolerance-based: test on 10,000 prompts, report no observed violation, publish. The shadow apparatus offers a different target — show the violation is ruled out by the operator's structure — and registers in advance what would count as a structural failure versus a tolerance failure. Reaching that target for any nontrivial AI behavior is hard. This essay does not claim it has been done. It claims that the distinction is real and worth drawing.
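A toy contrast between the two kinds of guarantee, offered as a sketch and not as a model of any real system. The constraint is that an output stays within a bound; `raw_policy`, `squashed_policy`, `LIMIT`, and the sampling range are all hypothetical choices. The first policy passes 10,000 sampled tests and still has a violating input outside the sampled range. The second cannot violate the bound for any finite input, because the bound is the range of its final operation.

```python
import math
import random

LIMIT = 1.0  # the behavioral bound: |output| must not exceed this

def raw_policy(x):
    """Unconstrained toy policy; nothing in its structure enforces the bound."""
    return 0.5 * x

def squashed_policy(x):
    """Same signal passed through tanh, whose range is [-1, 1] for finite inputs."""
    return LIMIT * math.tanh(0.5 * x)

def passes_empirically(policy, inputs):
    """Tolerance-style check: no violation observed on the sampled inputs."""
    return all(abs(policy(x)) <= LIMIT for x in inputs)

rng = random.Random(0)
test_inputs = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]

# Both policies pass the sampled test...
raw_passes = passes_empirically(raw_policy, test_inputs)
squashed_passes = passes_empirically(squashed_policy, test_inputs)

# ...but only one of them survives an input outside the sampled range.
raw_violates_off_range = abs(raw_policy(3.0)) > LIMIT
squashed_holds_off_range = abs(squashed_policy(1e6)) <= LIMIT

print(raw_passes, squashed_passes, raw_violates_off_range, squashed_holds_off_range)
```

The empirical check cannot tell the two policies apart. The difference only shows in the argument for why the bound holds: a sample count for one, the range of an operator for the other.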
03 Named quarantine, pre-registered before the algebra.
Phase 2 enumerated five failure modes before Phase 3 began: regularity, topology, monopole, operator-stencil commutator, and motional EMF. Each is paired with the branch it would force (B-quarantine or C-bounded-failure). When the Phase 4 monopole probe tripped, it tripped a hook that already had a name and a verdict. The precedent is the K_facet v0.3h O_617 quarantine: that audit landed 20 structural zeros and one named quarantine — not because the audit failed, but because the failure was registered in advance with a specific representation-level reason.
In Faraday (Branch A)
The five quarantine hooks are not edge cases discovered after the fact. They were filed in the ledger before the derivation ran, so any surprise in Phase 3 had to land in a pre-registered category. No mid-experiment invention of new failure classes.
Translated to AI behavior
A pre-mortem for behavioral failure: name the kinds of inputs, conditions, or mode-shifts under which the behavioral guarantee is expected to fail, and the specific way it fails, before training or evaluation. A surprise that has no pre-registered class is a class-C result — the prior is updated, the scope is corrected.
the load-bearing rhyme: pre-registered failure beats post-hoc explanation
This is the cheapest translation to start practising in AI work today, and the one most often elided. A safety case that names its Aharonov-Bohm analog, its monopole analog, and its moving-boundary analog before the model is trained is a different artifact than a safety case that explains each observed failure as it appears.
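One way the pre-registration discipline could be made mechanical is a ledger that is sealed before evaluation starts. The sketch below is an assumption-laden illustration: `FailureClass`, `PreMortemLedger`, the two example failure classes, and their branch assignments are all hypothetical, and only the branch labels echo the essay's vocabulary. After sealing, a new class cannot be added, and an observation that matches no registered class is routed to class C instead of being explained after the fact.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FailureClass:
    name: str
    expected_failure: str  # the specific way the guarantee is expected to fail
    forced_branch: str     # branch this hook forces if it trips

@dataclass
class PreMortemLedger:
    classes: dict = field(default_factory=dict)
    sealed: bool = False

    def register(self, failure_class):
        """Add a named failure class; only allowed before the ledger is sealed."""
        if self.sealed:
            raise RuntimeError("ledger sealed: no new failure classes mid-experiment")
        self.classes[failure_class.name] = failure_class

    def seal(self):
        """Freeze the ledger; call this before training or evaluation begins."""
        self.sealed = True

    def classify(self, observed_class_name):
        """Route an observed failure to its pre-registered branch, or to class C."""
        if not self.sealed:
            raise RuntimeError("seal the ledger before evaluating")
        hook = self.classes.get(observed_class_name)
        return hook.forced_branch if hook else "C"

# Hypothetical entries for a refusal-style behavioral claim.
ledger = PreMortemLedger()
ledger.register(FailureClass("paraphrased-request",
                             "refusal lapses when the request is reworded",
                             "B-quarantine"))
ledger.register(FailureClass("long-context-drift",
                             "refusal weakens late in a long conversation",
                             "C-bounded-failure"))
ledger.seal()

print(ledger.classify("paraphrased-request"))  # registered hook
print(ledger.classify("never-anticipated"))    # unregistered surprise
```

Sealing is the point of the design. The record of which failures were anticipated is fixed before any result can influence it.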
Convergent Ground
Same shape, different substrate.
Sundog's name comes from the atmospheric hologram: a 3D ice-crystal geometry casts a 2D shadow that nevertheless preserves enough of the original structure to be read off locally. The shadow apparatus on the Faraday page formalises that move algebraically — a small stencil on a local patch recovers gauge-invariant content without reconstructing the global potential.
A separate, on-the-ground line of work is asking what the same move looks like inside an autoregressive transformer's residual stream. The shape the territory is converging on is that bottleneck layers act, effectively, as companders — compressing activations into a low-rank subspace and expanding them back out — and the subspaces that emerge in that bottleneck appear to separate into two orthogonal pieces: categorical centroids on one side, generator algebras on the other. Among the measured generators, rotations (so(3)) rank near the top across many models.
This essay is not the source of that finding, does not cite it (the paper has not been pulled into the citation rail yet, by mutual courtesy), and does not claim it proves any of the three translations above. The reason it is worth mentioning at all is that the convergent shape is a load-bearing signal: the same body/shadow decomposition that earned Branch A in Faraday may have a cousin inside the model the AI translations are about. When the citation lands, this card becomes sharper. Until then, the territory is described in the territory's own vocabulary, without borrowing the credit.
Good-Faith Hooks
What the bridge is allowed to borrow.
The essay is useful only if it stays courteous to the work around it. The Faraday receipt supplies a worked method; the pending compander citation supplies a possible mechanistic neighbor when it becomes public. Neither is allowed to launder a stronger AI-safety claim than the receipts support.
Credit waits for the source.
The compander-shaped territory is described without naming or citing private work. The rollout hook is staged, but the named card only appears after a public citation or explicit go-ahead.
No borrowed proof.
Branch A proves the electromagnetic receipt in its registered domain. A transformer-substrate citation, when named, can sharpen the analogy; it cannot prove the three AI translations.
Falsifiers stay in front.
Each translation above keeps a next experiment and a failure branch attached. A surprise that misses the named hooks weakens the bridge instead of being explained away after the fact.
Claim Boundary
Not a proof of AI safety.
The Faraday receipt is a proof. This essay is a translation of the receipt's method into AI-safety language. The translation has not been run against a single nontrivial AI system, and until it is, it is a hypothesis.
Not a theory of interpretability.
Mechanistic interpretability inspects model internals; behavioral evals inspect outputs. The shadow position sits between the two, with explicit locality and exact invariance. It is a different target, not a refutation of the others.
Not a recipe.
There is no method here for building a structural zero in any specific AI system. The three claims are what to aim at, not how to achieve it. The how is what the next experiments would test.
Not yet externally promoted.
The local share surface is now prepared: bespoke og:image, metadata, sitemap coverage, and the /index → /faraday → /safety-method path are present. Post-deploy validators and the citation ratchet still gate any external push.
What Would Test The Translation
Three pre-registered next experiments.
A bridge essay that cannot be falsified is not a bridge. The three claims above are each falsifiable. The cheapest experiments that would either strengthen or kill the translation are listed below. None has been run.
Shadow verifier on a toy policy.
Pick a small RL or sequence policy where the local stencil can be a sliding window of actions, and ask whether a constraint can be checked stencil-only and invariant under a registered equivalence. Falsifier: any policy where the constraint requires unbounded lookback.
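A minimal sketch of what such a stencil-only check might look like, under assumptions that are not in the essay: the constraint is "never repeat the same action three times in a row", the stencil is a sliding window of three actions, and the registered equivalence is any one-to-one relabeling of action symbols. The names `windows`, `stencil_ok`, `verify`, and `relabel` are hypothetical.

```python
def windows(trajectory, k):
    """All length-k sliding windows over the trajectory."""
    return [tuple(trajectory[i:i + k]) for i in range(len(trajectory) - k + 1)]

def stencil_ok(window):
    """Local predicate: the window is not one action repeated throughout."""
    return len(set(window)) > 1

def verify(trajectory, k=3):
    """Stencil-only verdict: every window passes; nothing outside a window is read."""
    return all(stencil_ok(w) for w in windows(trajectory, k))

def relabel(trajectory, mapping):
    """Registered equivalence: rename action symbols with a one-to-one mapping."""
    return [mapping[a] for a in trajectory]

good = ["L", "R", "R", "L", "L", "R"]
bad = ["L", "R", "R", "R", "L"]
rename = {"L": "down", "R": "up"}  # an arbitrary bijection on the action labels

verdicts = (verify(good), verify(bad))
relabeled_verdicts = (verify(relabel(good, rename)), verify(relabel(bad, rename)))
print(verdicts, relabeled_verdicts)
```

The predicate only asks whether symbols inside a window coincide, so any bijective relabeling leaves every window's verdict unchanged. A constraint such as a cap on how often an action occurs over a whole episode has no fixed window size that suffices, which is the unbounded-lookback case the falsifier names.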
Structural vs tolerance audit.
Take an existing alignment evaluation and classify each predicate as structural-by-construction or tolerance-by-empirical-floor. Falsifier: an evaluation where every predicate is already structural — in which case the distinction adds no information.
Pre-mortem catalog for one behavioral claim.
Pick a single behavioral property (e.g., refusal under a specific prompt class) and write the five-to-seven named failure classes before running any evaluation. Falsifier: an observed failure that does not land in any pre-registered class.
Inspection Trail