Exocentric Homeostatic Deliberation — Research Evidence Memo
Proposition 1 — State Boundedness
The welfare state vector stays in [0,1]^n for all t ≥ 0 under convex-combination dynamics and projected endocentric updates.
Holds unconditionally; does not prevent boundary saturation under heavy-tailed increments.
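The boundedness claim and its caveat can be illustrated with a minimal sketch. The memo does not give the update equations, so the two step functions below are assumptions: a convex combination x' = (1 - alpha) x + alpha y with alpha in [0,1] and y in [0,1]^n, and an additive update followed by clipping onto [0,1]. The Pareto-distributed increments, the scale 0.05, and all function names are hypothetical.

```python
import random

def convex_step(x, target, alpha):
    """Convex-combination update: x' = (1 - alpha) * x + alpha * target."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]

def projected_step(x, increment):
    """Additive update followed by projection (clipping) onto [0, 1]."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, increment)]

def simulate(n=4, steps=1000, seed=0):
    """Alternate both update types; count how often a coordinate saturates."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    saturated = 0
    for _ in range(steps):
        target = [rng.random() for _ in range(n)]
        x = convex_step(x, target, alpha=rng.random())
        # Hypothetical heavy-tailed increment: signed, scaled Pareto draw.
        increment = [rng.choice((-1, 1)) * 0.05 * rng.paretovariate(1.5)
                     for _ in range(n)]
        x = projected_step(x, increment)
        # The state is checked after projection, where the bound is exact.
        assert all(0.0 <= xi <= 1.0 for xi in x)
        saturated += sum(xi in (0.0, 1.0) for xi in x)
    return x, saturated

final_state, saturation_count = simulate()
```

In this sketch the state never leaves the unit box, yet `saturation_count` is positive: projection guarantees boundedness but pins coordinates to 0 or 1 whenever a large increment arrives, which is the saturation caveat noted above.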
Proposition 2 — Monotone Passive Degradation
Aggregate welfare V(x_t) is non-increasing on intervals where no action is taken, no exogenous replenishment occurs, and W_ext is not updated.
Holds only during genuine between-episode idle phases; an out-of-band surveillance update can increase V.
Design Principle 1 — Governance-Transparent Attribution
Each utility component's contribution to any action ranking can be computed and reported independently, without access to internal operational variables.
Algebraic separability, not statistical independence; λ coefficients must be fixed by governance, not learned.
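A small sketch of what algebraic separability buys, assuming the utility is a weighted sum over the six components named in the abstract. The weight values, their signs, and the example scores are hypothetical; the memo only requires that the λ coefficients be fixed by governance rather than learned.

```python
from types import MappingProxyType

COMPONENTS = ("pragmatic", "epistemic", "operational",
              "cost", "risk", "governance")

# Hypothetical governance-fixed weights, exposed read-only so that no
# learning code can modify them. Penalty terms carry negative weights here.
LAMBDA = MappingProxyType({
    "pragmatic": 1.0, "epistemic": 0.5, "operational": 0.3,
    "cost": -0.4, "risk": -0.8, "governance": -1.0,
})

def utility(scores):
    """Additive utility: sum of weighted component scores."""
    return sum(LAMBDA[c] * scores[c] for c in COMPONENTS)

def attribution(scores_a, scores_b):
    """Per-component contribution to the utility gap U(a) - U(b)."""
    return {c: LAMBDA[c] * (scores_a[c] - scores_b[c]) for c in COMPONENTS}

# Hypothetical component scores for two candidate actions.
action_a = {"pragmatic": 0.7, "epistemic": 0.2, "operational": 0.5,
            "cost": 0.3, "risk": 0.1, "governance": 0.0}
action_b = {"pragmatic": 0.8, "epistemic": 0.1, "operational": 0.9,
            "cost": 0.2, "risk": 0.1, "governance": 0.2}

gaps = attribution(action_a, action_b)
```

Each entry of `gaps` depends only on that component's scores and its fixed weight, so the governance entry can be reported without reading the operational scores, and the entries sum to the total utility gap between the two actions.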
Proposition 4 — Robbins–Monro Consistency
The mean recalibration rule converges in mean square to the true interventional mean μ*(a) under i.i.d. zero-mean noise and standard Robbins–Monro step-size conditions.
Fixed action, i.i.d. noise, infinite visitation only. Does not cover non-stationary or policy-dependent environments.
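The recalibration rule can be sketched as the standard Robbins–Monro mean update mu_{k+1} = mu_k + kappa_k (y_k - mu_k), where y_k is the observed outcome of the fixed action. The target 0.35, the initial estimate 0.60, the step size 0.5 / k^0.75, and the count of 50 trajectories are taken from the fig_convergence caption; the Gaussian noise, its standard deviation, the seed, and the horizon are assumptions.

```python
import random

MU_STAR = 0.35   # true interventional mean (from the figure caption)
MU_INIT = 0.60   # initial estimate (from the figure caption)
NOISE_SD = 0.10  # hypothetical noise level; not stated in the memo

def step_size(k):
    """kappa_k = 0.5 / k^0.75: sum diverges, sum of squares converges."""
    return 0.5 / k ** 0.75

def run_trajectory(rng, steps=5000):
    """One Robbins-Monro trajectory for a single fixed action."""
    mu = MU_INIT
    for k in range(1, steps + 1):
        outcome = MU_STAR + rng.gauss(0.0, NOISE_SD)  # i.i.d. zero-mean noise
        mu += step_size(k) * (outcome - mu)
    return mu

rng = random.Random(42)
finals = [run_trajectory(rng) for _ in range(50)]
mse = sum((m - MU_STAR) ** 2 for m in finals) / len(finals)
```

The exponent 0.75 lies in (0.5, 1], which is what makes the step sizes sum to infinity while their squares stay summable. The sketch covers only the stationary, fixed-action setting stated in the proposition.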
Proposition 5 — Fallback Feasibility
If a designated recovery procedure satisfies all feasibility constraints at every step, the control loop is total: the agent always either selects a positive-utility action or executes the recovery procedure.
Depends on the recovery procedure remaining authorised and safety-compliant at all times by design.
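A minimal sketch of a total selection step under the proposition's premise. Picking the highest-utility feasible action is an assumption (the memo only says a positive-utility action is selected), and `select_action`, `utility`, `feasible`, and the action names are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")

def select_action(
    candidates: Iterable[A],
    utility: Callable[[A], float],
    feasible: Callable[[A], bool],
    recovery: A,
) -> A:
    """Return the best feasible action with positive utility, else recovery.

    The recovery action is assumed to be feasible at every step, so the
    function always returns something executable.
    """
    best, best_u = None, 0.0
    for action in candidates:
        if not feasible(action):
            continue
        u = utility(action)
        if u > best_u:  # strictly positive and better than the current best
            best, best_u = action, u
    return recovery if best is None else best

# Hypothetical usage with three named actions.
scores = {"sample": 0.4, "alert": 0.9, "wait": -0.1}
choice = select_action(scores, scores.get, lambda a: a != "alert", "recover")
idle = select_action(["wait"], scores.get, lambda a: True, "recover")
```

Here `choice` is "sample" because "alert" is infeasible, and `idle` is "recover" because no candidate has positive utility. Totality rests entirely on the premise that the recovery procedure stays authorised and feasible; the code does not check it.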
Proposition 6 — Restricted EFE Ranking Divergence
EHD and single-step EFE yield different action rankings in two cases: (i) a Gaussian comparator with equal means but differing variances; (ii) a governance-penalty mismatch with matched non-governance terms.
Restricted comparator result under explicit auxiliary assumptions. Not a general policy-divergence theorem. Does not apply to multi-step EFE.
fig_convergence — Proposition 4
50 independent Robbins–Monro trajectories converging to μ*(a) = 0.35 from initial estimate 0.60. Learning rate κ_k = 0.5 / k^0.75.
fig_ranking_divergence — Proposition 6, Case (i)
Actions with equal predicted means but differing variances receive identical EHD pragmatic scores but different EFE KL scores. Restricted divergence result under Gaussian–Gaussian auxiliary assumptions.
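Case (i) can be checked numerically with the closed-form KL divergence between univariate Gaussians, KL(N(m1, s1^2) || N(m2, s2^2)) = ln(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2. The sketch assumes the EFE term is the KL from the predicted outcome distribution to a Gaussian preference distribution, and uses a mean-only stand-in for the EHD pragmatic score. The preference parameters, the two actions, and both function names are hypothetical.

```python
import math

def gaussian_kl(mean_q, sd_q, mean_p, sd_p):
    """KL(N(mean_q, sd_q^2) || N(mean_p, sd_p^2)) in nats."""
    return (math.log(sd_p / sd_q)
            + (sd_q ** 2 + (mean_q - mean_p) ** 2) / (2.0 * sd_p ** 2)
            - 0.5)

def pragmatic_score(mean_q, target):
    """Stand-in for a score that depends on the predicted mean only."""
    return -(mean_q - target) ** 2

# Hypothetical preference distribution and two actions with equal means.
PREF_MEAN, PREF_SD = 0.6, 0.1
action_a = {"mean": 0.5, "sd": 0.05}
action_b = {"mean": 0.5, "sd": 0.20}

ehd_a = pragmatic_score(action_a["mean"], PREF_MEAN)
ehd_b = pragmatic_score(action_b["mean"], PREF_MEAN)
kl_a = gaussian_kl(action_a["mean"], action_a["sd"], PREF_MEAN, PREF_SD)
kl_b = gaussian_kl(action_b["mean"], action_b["sd"], PREF_MEAN, PREF_SD)
```

With these numbers the mean-only scores tie (`ehd_a == ehd_b`), while the KL terms differ (`kl_a` is about 0.818 and `kl_b` about 1.307), so a comparator that ranks by KL separates two actions that the mean-only score cannot. This illustrates only the restricted Gaussian–Gaussian case.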
fig_welfare_trajectory — Section 7
24-month AMR monitoring simulation. Exocentric trigger fires at W_ext < θ_ext = 0.45 despite satisfactory endocentric state.
LLM agents maintain state, invoke tools, and operate across extended sessions, yet the evaluative basis on which a persistent agent should judge external improvement or deterioration has not been formally characterised. We introduce exocentric homeostatic deliberation (EHD), a control framework in which the primary welfare signal is defined over an externally monitored and auditable world-state rather than internal comfort variables alone. The framework comprises four elements: a transparent external welfare signal; an intervention-based hope term on action-conditioned future states; an additive utility decomposition with pragmatic, epistemic, operational, cost, risk, and governance components; and a recalibration rule for action efficacy. The paper establishes six propositions: state boundedness; monotone passive degradation; governance-transparent marginal attribution; mean-square consistency of the mean-update rule under Robbins–Monro conditions; fallback feasibility under a designated recovery procedure; and restricted action-ranking divergence relative to a single-step EFE comparator. The paper provides no empirical evaluation and makes no claim of equivalence with active inference.