# Phase 3 Boundary Theorem Draft

> Phase 3 artifact for
> [`COARSE_GRAINING_PROOF_ROADMAP.md`](../COARSE_GRAINING_PROOF_ROADMAP.md).
> Status: reviewed, closed positive, 2026-05-16. Phase 2 closed positive in
> [`PHASE2_MDP.md`](PHASE2_MDP.md). This draft proves the negative direction
> (for deterministic and randomized `F_σ` policies) and maps the theorem onto the
> pushable-occluder falsification slate. The reviewer pass is complete (see
> *Exit Status*); Phase 4 is unblocked.

## Entry And Gate

Phase 3 enters because Phase 2 closed positive. It imports the Phase 2 theorem:
finite-MDP sufficiency holds exactly when every `Φ`-fiber over the occupancy
support admits a common optimal action.
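
The imported predicate can be sketched as a direct check over fibers. This is a minimal illustration, not the Phase 2 implementation: the function names, the `optimal_actions` table (standing in for `A*`), and the toy states are assumptions introduced here, and `states` is assumed to be already restricted to the occupancy support.

```python
from collections import defaultdict


def common_optimal_actions(states, phi, optimal_actions):
    """Return {signature: intersection of A*(x) over the fiber of that signature}."""
    fibers = defaultdict(list)
    for x in states:
        fibers[phi(x)].append(x)
    return {
        sigma: set.intersection(*(set(optimal_actions[x]) for x in fiber))
        for sigma, fiber in fibers.items()
    }


def is_control_sufficient(states, phi, optimal_actions):
    """True when every fiber over the given states has a common optimal action."""
    common = common_optimal_actions(states, phi, optimal_actions)
    return all(len(actions) > 0 for actions in common.values())


# Hypothetical toy instance: x0 and x1 share a signature, x2 has its own.
states = ["x0", "x1", "x2"]
signature = {"x0": "s", "x1": "s", "x2": "t"}
agree = {"x0": {"left", "right"}, "x1": {"right"}, "x2": {"stay"}}
clash = {"x0": {"left"}, "x1": {"right"}, "x2": {"stay"}}

print(is_control_sufficient(states, signature.get, agree))  # True
print(is_control_sufficient(states, signature.get, clash))  # False
```

In the `agree` case the shared fiber keeps `right` as a common optimal action; in the `clash` case the intersection is empty, which is the situation Phase 3 addresses.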

The pre-registered negative for this phase is:

> If a flat `𝓕_σ`-controller *does* solve a task whose decisive bit is provably
> not `𝓕_σ`-measurable, the boundary is wrong and Postulate 1's failure
> prediction is falsified — record, do not rescue.

Draft verdict: **negative not triggered.** The boundary is the contrapositive
of Phase 2: when a positive-support signature fiber has no common optimal
action, no policy measurable with respect to the current signature sigma-algebra
can be optimal.

## Local Symbols

| Symbol | Definition |
| --- | --- |
| `b(x)` | A control-relevant bit or latent variable. |
| `F_σ` | The sigma-algebra generated by `Φ`; equivalently, the information available to a flat signature policy at the decision point. ASCII rendering of Phase 0's `𝓕_σ` — same object, not a new one. |
| `R*` | Occupancy support from Phase 2: states reached with positive probability under an optimal policy from the admitted initial distribution. |
| `a_prep` | A preparatory world-changing action, such as pushing an occluder. |
| `π*_0` | The optimal first decision at the boundary state/fiber. |
| `A*(x)` | The set of optimal actions at state `x`: the actions attaining the maximal `Q*(x, ·)`. |

## Policy Class

"Flat `F_σ` policy" means a policy measurable with respect to the **current**
signature map `Φ` — deterministic `g : Σ → A` or randomized `g : Σ → Δ(A)`.
This is the Phase 0 randomized-policy clause carried forward. A finite-memory or
belief-augmented controller whose memory is admitted into `X` is, by the Phase 0
decision-information-state convention, a policy over an *enriched* signature: it
falls outside the current `F_σ` by construction. Such a controller is therefore
not a counterexample to the theorem below — it is exactly the sanctioned "make
the missing bit measurable / add memory" escape named in the Pushable-Occluder
Mapping, achieved by changing `Φ`/`X`, not by beating a fixed `Φ`.
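
The distinction between a flat policy and an enriched one can be illustrated with a small sketch. All names here (`flat_policy`, `phi_current`, `phi_enriched`, the `reading` and `memory` fields) are hypothetical and chosen only for illustration; the point is that the same policy constructor yields different behavior solely because the signature map changes.

```python
def flat_policy(phi, table):
    """Build a deterministic policy that sees a state only through phi(x)."""
    def act(x):
        return table[phi(x)]
    return act


def phi_current(x):
    """Current signature: the visible reading only (hypothetical)."""
    return x["reading"]


def phi_enriched(x):
    """Enriched signature: the reading plus a memory slot admitted into X."""
    return (x["reading"], x["memory"])


# Two hypothetical states that the current signature collapses.
x0 = {"reading": "dim", "memory": "saw_left"}
x1 = {"reading": "dim", "memory": "saw_right"}

flat = flat_policy(phi_current, {"dim": "push_left"})
enriched = flat_policy(
    phi_enriched,
    {("dim", "saw_left"): "push_left", ("dim", "saw_right"): "push_right"},
)

print(flat(x0), flat(x1))          # push_left push_left
print(enriched(x0), enriched(x1))  # push_left push_right
```

The enriched controller separates `x0` from `x1` only because `phi_enriched` does; under `phi_current` any lookup table returns one action for both.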

## Theorem — Nonmeasurable Decisive Bit

Let `Φ : X → Σ` be the current signature map. Suppose there exists a signature
fiber `C = {x ∈ R* : Φ(x)=σ}` with positive evaluation/occupancy mass on which
the optimal first action is not fiber-constant:

```text
∃x0, x1 ∈ C :  Φ(x0) = Φ(x1) = σ  and  A*(x0) ∩ A*(x1) = ∅.
```

Then no flat `F_σ` policy is optimal on that fiber. Write `b(x)` for any bit
that distinguishes such `x0, x1`; it is the *witness* of the conflict and is, by
construction, not `F_σ`-measurable on `C` — non-measurability of `b` is a
consequence of the two conditions above, not an independent hypothesis.

In particular, if the decisive bit becomes observable only after a preparatory
world-changing action, but the current decision already requires choosing among
different preparatory actions, or between prep and no-prep, based on that bit,
then a flat policy over the current signature cannot be Bayes-optimal.

## Proof

**Deterministic case.** Any deterministic `F_σ` policy is constant on the fiber
`C`: for some `g`, it selects `g(σ)` for every `x ∈ C`. Optimality at `x0`
requires `g(σ) ∈ A*(x0)`; optimality at `x1` requires `g(σ) ∈ A*(x1)`. Since
`A*(x0) ∩ A*(x1) = ∅`, no single action satisfies both, so no deterministic
`F_σ` policy is optimal on `C`.

**Randomized case.** A randomized `F_σ` policy is likewise fiber-constant: it
applies the same kernel `g(σ) ∈ Δ(A)` to every `x ∈ C`. In a finite MDP every
action in the support of an optimal stationary policy at a state lies in that
state's `A*` (any off-`A*` action has strictly lower `Q*` and would reduce
expected return). Optimality at `x0` therefore needs `supp g(σ) ⊆ A*(x0)`, and
at `x1` needs `supp g(σ) ⊆ A*(x1)`; with `A*(x0) ∩ A*(x1) = ∅` the support must
be empty, which is impossible. So no flat `F_σ` policy — deterministic or
randomized — is optimal on `C`.
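
The randomized case can be checked numerically on a toy fiber. The sketch below is an illustration under stated assumptions, not part of the proof: the `Q*` values and occupancy weights are invented, and the one-step advantage gap (`max Q* - Q*` of the sampled action) is used as a proxy for lost return. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction


def advantage_gap(q_row, mix):
    """Expected one-step shortfall at one state: sum_a mix[a] * (max Q - Q[a])."""
    best = max(q_row.values())
    return sum(p * (best - q_row[a]) for a, p in mix.items())


def fiber_gap(q_table, occupancy, mix):
    """Occupancy-weighted shortfall of one fiber-constant action kernel."""
    return sum(occupancy[x] * advantage_gap(q_table[x], mix) for x in q_table)


# Hypothetical Q* values: x0 needs "left", x1 needs "right".
q_table = {
    "x0": {"left": Fraction(1), "right": Fraction(0)},
    "x1": {"left": Fraction(0), "right": Fraction(1)},
}
occupancy = {"x0": Fraction(1, 2), "x1": Fraction(1, 2)}

# Sweep the mixing weight on "left" from 0 to 1 in steps of 1/10.
gaps = [
    fiber_gap(q_table, occupancy, {"left": Fraction(k, 10), "right": 1 - Fraction(k, 10)})
    for k in range(11)
]
print(min(gaps))  # 1/2
```

In this symmetric toy every mixture leaves the same gap of 1/2. With other values the gap varies with the mixture, but it stays positive whenever the two optimal-action sets are disjoint, because any weight placed on one state's optimal action is weight placed on a suboptimal action for the other.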

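This is the contrapositive of Phase 2's fiber theorem, witnessed by a pair of
states: disjoint `A*(x0)` and `A*(x1)` imply that `C` has no common optimal
action. The same argument covers any fiber whose optimal-action sets have empty
intersection, since `g(σ)` (or its support) would have to lie in that
intersection. A signature is control-sufficient only when the collapsed states
agree on at least one optimal action. If the collapsed states disagree on the
action, the signature has discarded a control-relevant bit.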

## Scope Guard

The theorem does **not** say every task with a preparatory action is impossible
for a signature policy. If the common optimal first action over the whole fiber
is `a_prep`, then choosing `a_prep` is `F_σ`-measurable and a signature policy
can select it.
After the preparatory action, the process may enter a new information state with
a richer signature, and later control may become solvable.

The boundary is sharper: the current signature is insufficient when the hidden
bit changes which preparatory action is optimal, or whether preparation is
optimal at all, while the signature remains the same.
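
The two sides of the scope guard can be contrasted in a short sketch. The two-stage state encoding `(stage, bit)`, the signature `phi`, and the action names other than `a_prep` are assumptions made for illustration only.

```python
from collections import defaultdict


def conflicted_signatures(optimal_actions, phi):
    """Return the signatures whose fiber has no common optimal action."""
    fibers = defaultdict(list)
    for state, actions in optimal_actions.items():
        fibers[phi(state)].append(set(actions))
    return {sigma for sigma, sets in fibers.items() if not set.intersection(*sets)}


def phi(state):
    """Hypothetical signature: the hidden bit is visible only after preparation."""
    stage, bit = state
    return (stage, bit) if stage == "post" else (stage, None)


# Case 1: preparation is optimal whatever the bit is; the bit matters only later.
prep_first = {
    ("pre", 0): {"a_prep"}, ("pre", 1): {"a_prep"},
    ("post", 0): {"go_left"}, ("post", 1): {"go_right"},
}
# Case 2: the bit already decides which preparatory action is optimal.
prep_depends_on_bit = {
    ("pre", 0): {"push_left"}, ("pre", 1): {"push_right"},
    ("post", 0): {"go_left"}, ("post", 1): {"go_right"},
}

print(conflicted_signatures(prep_first, phi))           # set()
print(conflicted_signatures(prep_depends_on_bit, phi))  # {('pre', None)}
```

Both cases hide the same bit before preparation. Only the second produces a fiber conflict, because only there does the hidden bit change the optimal first action.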

## Pushable-Occluder Mapping

The pushable-occluder design extends the photometric alignment task with a
block whose geometry is not in the flat controller's observation channel. The
flat controller sees detector intensities, mirror joint state, and
push-effector proprioception; the oracle sees block geometry.

For the Phase 3 theorem to apply, the diagnostic-positive seed slate must
instantiate a fiber conflict:

```text
Φ(x0) = Φ(x1)
but
oracle-required preparatory action differs:
  push direction, push/no-push decision, or clearance sequence.
```

Under that condition, the expected flat-controller failure is a
theorem-predicted boundary. It is not a surprise after the data; it is the
negative branch of the sufficiency predicate:

- the oracle can solve because it observes the missing bit;
- a flat `F_σ` policy cannot be optimal because it must choose the same
  preparatory action across states that require different optimal preparation;
- a hierarchical controller or redesigned signature can succeed only by making
  the missing bit measurable, adding memory/belief state, or changing the
  agent class.

If the flat controller succeeds on a non-trivial fraction of diagnostic-positive
seeds, one of two things happened:

1. the design did not actually create a nonmeasurable fiber conflict; some
   photometric/proprioceptive feature leaked the decisive bit into `Φ`; or
2. the theorem boundary is wrong for the admitted task.

The pre-registered disposition is to record that result, not rescue the
boundary by renaming the signature after the fact.
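
One way such a record could be produced is a per-fiber audit of the seed slate, sketched below. Everything here is hypothetical: the seed fields (`signature`, `oracle_prep`, `flat_success`), the assumption that signatures are discrete hashable keys, and the assumption that the oracle requires exactly one preparatory action per seed. A flat policy can still succeed on the part of a conflict fiber whose required action matches its single choice, so the sketch flags only fibers where flat successes span two or more required actions.

```python
from collections import defaultdict


def audit_seed_slate(seeds):
    """Group seeds by signature and flag fibers worth inspecting.

    Each seed is a dict with hypothetical keys:
      "signature"    hashable flat-controller signature,
      "oracle_prep"  the single preparatory action the oracle requires,
      "flat_success" whether the flat controller solved the seed.
    """
    fibers = defaultdict(list)
    for seed in seeds:
        fibers[seed["signature"]].append(seed)

    report = {"conflict_fibers": [], "flagged_fibers": []}
    for sigma, group in fibers.items():
        required = {s["oracle_prep"] for s in group}
        if len(required) < 2:
            continue  # no fiber conflict: the theorem does not apply here
        report["conflict_fibers"].append(sigma)
        solved = {s["oracle_prep"] for s in group if s["flat_success"]}
        if len(solved) >= 2:
            # Flat successes span different required actions within one fiber.
            report["flagged_fibers"].append(sigma)
    return report


seeds = [
    {"signature": "A", "oracle_prep": "push_left", "flat_success": True},
    {"signature": "A", "oracle_prep": "push_right", "flat_success": False},
    {"signature": "B", "oracle_prep": "push_left", "flat_success": True},
    {"signature": "B", "oracle_prep": "push_right", "flat_success": True},
    {"signature": "C", "oracle_prep": "no_push", "flat_success": True},
]
print(audit_seed_slate(seeds))
# {'conflict_fibers': ['A', 'B'], 'flagged_fibers': ['B']}
```

A flagged fiber is a prompt for inspection rather than a verdict: a randomized controller can occasionally solve both kinds of seed by chance, and the audit alone does not say whether the bit leaked into `Φ` or the boundary failed.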

## Cross-Filed Statement

Safe statement for the design and analogy docs:

> The pushable-occluder slate is Phase 3's theorem-predicted boundary: when
> matched states share the flat photometric/proprioceptive signature but require
> different preparatory block actions, no current-signature policy can be
> optimal. A success by the flat controller would mean the decisive bit leaked
> into the signature, or the boundary theorem is wrong.

## Exit Status

Phase 3 proof: **positive.**

Pre-registered negative: **not triggered.** The impossibility result is the
Phase 2 contrapositive, extended to randomized `F_σ` policies; the
pushable-occluder mapping is conditional on the design instantiating a genuine
nonmeasurable fiber conflict.

Artifacts required by the roadmap are present:

1. negative-direction proof (deterministic and randomized);
2. explicit scope guard for preparatory-action tasks;
3. pushable-occluder theorem-predicted boundary statement, cross-filed in
   [`PHASE2_BLOCKS_DESIGN.md`](../PHASE2_BLOCKS_DESIGN.md) and the river-and-dam
   entry of [`analogies.md`](../../internal/theory/analogies.md);
4. disposition branch for flat-controller success.

Reviewer pass: **complete, 2026-05-16.** The contrapositive proof, the scope
guard, the gate quote (verbatim against roadmap), and both cross-filings were
checked. The randomized-policy completeness gap and the policy-class scoping
were flagged, and both fixes are now applied (new *Policy Class* section and a
randomized case in the proof); the `b(x)` hypothesis is reframed as the conflict
witness.

Resolved review-item dispositions:

1. **Cross-filed statement is adequately hedged.** It is conditional on a
   genuine fiber conflict and pre-registers both branches without asserting the
   empirical setup has one — no overclaim before the seed audit. Confirmed as
   written; both cross-filings match.
2. **"Flat `F_σ` policy" = measurable wrt the current `Φ`** (deterministic or
   randomized), resolved in the new *Policy Class* section. A finite-memory /
   belief-augmented controller whose memory enters `X` is a policy over an
   *enriched* signature, outside the current `F_σ` by the Phase 0 convention —
   the sanctioned escape, not a counterexample.
3. **Phase 4 reuses this vocabulary — pre-committed.** Phase 4's
   measurable/off-measurable cell split uses the same fiber-conflict definition
   (fiber has / lacks a common optimal action), so the empirical gate and the
   theorem share one language (spec self-consistency).

Phase 3 closes **positive**; the Phase 4 entry condition (Phase 3 exit) is satisfied.

