Error Geometry and Causal Robustness

“Error is not noise, but geometric boundary; robustness is not luck, but geometric invariant.”

🎯 Core Ideas

In the previous chapter, we learned how spacetime geometry serves as minimal lossless compression of causal constraints. Now we face a practical question:

When causal structure has uncertainty (measurement errors, quantum fluctuations, finite samples), are our conclusions still robust?

Traditional method: “Point estimate + error bars”

GLS New Perspective: error is a geometric region in parameter space (a credible region), and robustness means the conclusion holds at every point of that region.

Analogy:

The traditional method is like describing a route to a friend:

  • “Go straight 500 meters from here, then turn left”

What if your GPS has ±50 meter error? You might end up in a completely different place!

Geometric Method:

  • “Within a circle centered on me with radius 50 meters, no matter which point you’re at, turning left will get you to the destination”

This is true robustness!

📖 From Point Estimation to Region Estimation

Limitations of Traditional Statistical Inference

Classical Procedure:

  1. Collect data
  2. Estimate the parameter
  3. Calculate the standard error
  4. Report a confidence interval

Problems:

  • Confidence interval is one-dimensional (for scalar parameters)
  • For multi-dimensional parameters, separate confidence intervals ignore correlations
  • Causal conclusions often depend on complex combinations of parameters

Example (Linear Regression):

In a model with an interaction term, say $Y = \beta_0 + \beta_1 X + \beta_2 XZ + \varepsilon$, we might care about the conditional effect

$$\psi(z) = \beta_1 + \beta_2 z.$$

From the separate confidence intervals of $\beta_1$ and $\beta_2$ alone, we cannot accurately infer a confidence interval for $\psi(z)$, because they ignore the correlation between the two estimates!
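To make this concrete, here is a small numerical sketch (the interaction model and all numbers are illustrative assumptions, not taken from the text): it compares the naive interval obtained by combining the two separate confidence intervals with the correct interval for the linear combination, which uses the covariance between the estimates.

```python
# A minimal sketch (assumed numbers) of why separate confidence intervals are
# not enough for a combination psi = beta1 + beta2 * z: the correct interval
# uses the full covariance, including the correlation between the estimates.
import numpy as np
from scipy.stats import norm

beta_hat = np.array([0.50, -0.30])        # (beta1_hat, beta2_hat), illustrative
cov = np.array([[0.04, -0.03],
                [-0.03, 0.04]])           # estimated covariance with strong negative correlation
z_val, alpha = 1.0, 0.05
c = np.array([1.0, z_val])                # psi = beta1 + beta2 * z
zq = norm.ppf(1 - alpha / 2)

# Naive "interval arithmetic" on the separate CIs ignores the correlation:
naive_half = zq * (np.sqrt(cov[0, 0]) + abs(z_val) * np.sqrt(cov[1, 1]))
# Correct delta-method interval for the linear combination:
exact_half = zq * np.sqrt(c @ cov @ c)

psi_hat = c @ beta_hat
print(f"naive:   {psi_hat:.2f} +/- {naive_half:.2f}")
print(f"correct: {psi_hat:.2f} +/- {exact_half:.2f}")  # much narrower: the negative correlation helps
```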

Geometric Perspective: Confidence Ellipsoid

New Method: treat the error as a geometric region in parameter space

Fisher Information Metric:

At each parameter point $\theta$, define the local metric

$$g_{ij}(\theta) = I_{ij}(\theta),$$

where $I(\theta) = \mathbb{E}_\theta\!\left[\partial_i \log p(X;\theta)\,\partial_j \log p(X;\theta)\right]$ is the Fisher information matrix.

Intuition:

  • Large eigenvalue of $I(\theta)$ → the parameter is “easily identifiable” in that direction
  • Small eigenvalue of $I(\theta)$ → the parameter is “hard to identify” in that direction

Confidence Ellipsoid:

$$\mathcal{R}_n(\alpha) = \left\{\theta : n\,(\theta - \hat\theta_n)^\top I(\hat\theta_n)\,(\theta - \hat\theta_n) \le \chi^2_{d,1-\alpha}\right\}$$

graph TB
    subgraph "Parameter Space Θ"
        CENTER["Point Estimate<br/>θ̂ₙ"]
        ELLIPSE["Confidence Ellipsoid<br/>ℛₙ(α)"]
        TRUE["True Value θ₀"]

        CENTER -.->|"Semi-axis Directions Determined by I⁻¹"| ELLIPSE
        ELLIPSE -->|"Contains with Probability 1-α"| TRUE
    end

    INFO["Fisher Information I(θ)"] -->|"Determines Shape"| ELLIPSE

    style ELLIPSE fill:#fff4e1
    style TRUE fill:#ffe1e1

Key Property:

Asymptotic Coverage Theorem:

$$\lim_{n \to \infty} \mathbb{P}_{\theta_0}\big(\theta_0 \in \mathcal{R}_n(\alpha)\big) = 1 - \alpha$$

That is: the true parameter falls inside the ellipsoid with probability $1-\alpha$!
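As a concrete illustration, the following sketch (the values of $\hat\theta_n$, $\hat I$, $n$ are assumed, not from the text) builds the Wald-type ellipsoid described above: a membership test plus the semi-axes obtained from the eigendecomposition of $I^{-1}$.

```python
# A minimal sketch (illustrative numbers) of the Wald confidence ellipsoid:
# membership test and semi-axes from the estimated Fisher information.
import numpy as np
from scipy.stats import chi2

theta_hat = np.array([1.2, -0.4])         # point estimate (assumed numbers)
I_hat = np.array([[30.0, 12.0],
                  [12.0, 20.0]])          # estimated Fisher information per observation
n, alpha = 500, 0.05
r2 = chi2.ppf(1 - alpha, df=len(theta_hat))

def in_region(theta):
    """True if theta lies in the (1 - alpha) confidence ellipsoid R_n(alpha)."""
    d = theta - theta_hat
    return n * d @ I_hat @ d <= r2

# Semi-axis lengths: eigenvalues of I^{-1} scaled by r2/n; the long axis points
# in the poorly identified direction, the short axis in the well-identified one.
eigval, eigvec = np.linalg.eigh(np.linalg.inv(I_hat))
semi_axes = np.sqrt(r2 / n * eigval)
print(in_region(np.array([1.21, -0.41])), semi_axes)
```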

🎨 Geometric Operations on Credible Regions

Projection of Linear Functions

Suppose the causal effect we care about is a linear function of the parameters:

$$\psi(\theta) = c^\top \theta$$

Question: What is the range of $\psi$ over the credible region $\mathcal{R}_n(\alpha)$?

Geometric Answer:

$$[\psi_{\min}, \psi_{\max}] = \left[\min_{\theta \in \mathcal{R}_n(\alpha)} c^\top\theta,\; \max_{\theta \in \mathcal{R}_n(\alpha)} c^\top\theta\right]$$

This is the projection of the ellipsoid onto the direction $c$!

Analytic Solution:

$$\psi_{\max} = c^\top \hat\theta_n + \sqrt{\frac{\chi^2_{d,1-\alpha}}{n}\, c^\top I(\hat\theta_n)^{-1} c}, \qquad \psi_{\min} = c^\top \hat\theta_n - \sqrt{\frac{\chi^2_{d,1-\alpha}}{n}\, c^\top I(\hat\theta_n)^{-1} c}$$

Analogy:

Imagine the ellipsoid is a watermelon and $c$ is the direction of the cutting knife:

  • After cutting, the cross-section (projection) is an ellipse
  • The major and minor axes of the ellipse are determined jointly by the watermelon's shape and the knife's angle
graph LR
    ELLIPSOID["3D Confidence Ellipsoid<br/>ℛₙ(α)"] -->|"Project Along Direction c"| INTERVAL["1D Confidence Interval<br/>[ψ_min, ψ_max]"]

    DIRECTION["Projection Direction c"] -.->|"Determines"| INTERVAL

    style ELLIPSOID fill:#e1f5ff
    style INTERVAL fill:#ffe1e1
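A short sketch of the analytic projection interval (the numbers and the direction $c$ are assumed for illustration; the half-width formula is the one given above):

```python
# A minimal sketch (assumed setup) of the analytic projection interval for a
# linear effect psi = c^T theta over the Wald ellipsoid R_n(alpha).
import numpy as np
from scipy.stats import chi2

theta_hat = np.array([1.2, -0.4])
I_hat = np.array([[30.0, 12.0], [12.0, 20.0]])
n, alpha = 500, 0.05
c = np.array([1.0, 2.0])                  # direction defining psi = c^T theta

half_width = np.sqrt(chi2.ppf(1 - alpha, df=2) / n * c @ np.linalg.inv(I_hat) @ c)
psi_hat = c @ theta_hat
print(f"psi in [{psi_hat - half_width:.3f}, {psi_hat + half_width:.3f}]")
```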

Local Linearization of Nonlinear Functions

What if the causal effect is a nonlinear function $\psi = g(\theta)$?

Delta Method (first-order approximation):

Near $\hat\theta_n$:

$$g(\theta) \approx g(\hat\theta_n) + J(\hat\theta_n)\,(\theta - \hat\theta_n),$$

where $J(\theta) = \partial g / \partial \theta^\top$ is the Jacobian matrix.

Projected Ellipsoid:

$$\left\{\, g(\hat\theta_n) + J(\hat\theta_n)\,(\theta - \hat\theta_n) \;:\; \theta \in \mathcal{R}_n(\alpha) \,\right\},$$

where the shape of this image ellipsoid is governed by the delta-method covariance $J(\hat\theta_n)\, I(\hat\theta_n)^{-1} J(\hat\theta_n)^\top / n$.

Physical Meaning: the uncertainty ellipsoid of the nonlinear effect!
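A minimal delta-method sketch (the nonlinear effect $g(\theta) = \theta_1\theta_2$ and all numbers are hypothetical, chosen only to illustrate the propagation through the Jacobian):

```python
# A minimal delta-method sketch (assumed example): propagate the parameter
# uncertainty through a nonlinear effect g(theta) = theta_1 * theta_2 via its Jacobian.
import numpy as np

theta_hat = np.array([1.2, -0.4])
I_hat = np.array([[30.0, 12.0], [12.0, 20.0]])
n = 500

g = lambda th: th[0] * th[1]                # hypothetical nonlinear causal effect
J = np.array([theta_hat[1], theta_hat[0]])  # gradient (Jacobian) of g at theta_hat

var_g = J @ np.linalg.inv(I_hat) @ J / n    # first-order (delta-method) variance
print(g(theta_hat), np.sqrt(var_g))         # effect estimate and its standard error
```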

🔍 Identifiable Sets in Causal Inference

What Is Identifiable Set?

In many causal problems, even with infinite data, we cannot uniquely determine certain parameters.

Definition (Identifiable Set): the set of all parameter values consistent with the distribution of the observable data,

$$\mathcal{I} = \{\theta \in \Theta : P_\theta \text{ matches the observed distribution}\}.$$

Example 1 (Omitted Variable Bias):

True model:

$$Y = \beta X + \gamma U + \varepsilon,$$

but $U$ is unobservable!

Identifiable set:

$$\mathcal{I} = \{(\beta, \gamma) : \beta + \gamma\,\delta = \hat\beta_{\mathrm{obs}}\},$$

where $\hat\beta_{\mathrm{obs}}$ is the observed regression coefficient and $\delta$ is the regression coefficient of $U$ on $X$.

Geometry: This is a line, not a single point!
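A tiny sketch of this identifiable line (the values of $\hat\beta_{\mathrm{obs}}$ and $\delta$ are assumed for illustration; with $U$ unobserved, $\delta$ itself must come from outside knowledge or sensitivity analysis):

```python
# A small sketch (assumed numbers) of the omitted-variable identifiable set:
# the line { (beta, gamma) : beta + gamma * delta = beta_obs } traced over gamma.
import numpy as np

beta_obs = 0.6    # observed regression coefficient (illustrative)
delta = 0.5       # regression coefficient of U on X (assumed, not identified from data)

for gamma in np.linspace(-1.0, 1.0, 5):
    beta = beta_obs - gamma * delta       # every such pair fits the data equally well
    print(f"gamma={gamma:+.2f}  beta={beta:+.2f}")
```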

Example 2 (Weak Identification of Instrumental Variables):

When the instrumental variable is “weak”, the identifiable set of the structural parameter may be unbounded, or a very “flat” region.

Intersection of Credible Region and Identifiable Set

GLS Core Insight:

Causal conclusions should be based on the intersection

$$\mathcal{R}_n(\alpha) \cap \mathcal{I}_n,$$

not just the point estimate $\hat\theta_n$, where $\mathcal{I}_n$ is a data-driven estimate of the identifiable set.

graph TB
    subgraph "Parameter Space Θ"
        TRUST["Credible Region<br/>ℛₙ(α)<br/>(Statistical Uncertainty)"]
        IDENT["Identifiable Set<br/>ℐₙ<br/>(Causal Constraints)"]
        INTER["Intersection<br/>ℛₙ∩ℐₙ"]

        TRUST -.->|"Intersection"| INTER
        IDENT -.->|"Intersection"| INTER
    end

    INTER --> ROBUST["Robust Causal Conclusion<br/>(Holds Over Entire Intersection)"]

    style TRUST fill:#e1f5ff
    style IDENT fill:#ffe1e1
    style INTER fill:#fff4e1,stroke:#ff6b6b,stroke-width:3px

Definition (Geometric Robustness):

Let $\psi(\theta)$ be the causal effect. If there exists an interval $[\psi_-, \psi_+]$ such that

$$\psi(\theta) \in [\psi_-, \psi_+] \quad \text{for all } \theta \in \mathcal{R}_n(\alpha) \cap \mathcal{I}_n,$$

then we say the causal conclusion is geometrically robust at level $1-\alpha$.

In Particular: if $\psi_- > 0$ (or $\psi_+ < 0$), we can robustly assert the direction of the effect!

Convex Optimization for Linear Identifiable Sets

Common Case: the identifiable set can be represented by linear inequalities:

$$\mathcal{I}_n = \{\theta : A\theta \le b\}.$$

Then $\mathcal{R}_n(\alpha) \cap \mathcal{I}_n$ is the intersection of an ellipsoid and a polyhedron, hence a convex set.

Extrema of Causal Effect:

For a linear effect $\psi = c^\top\theta$:

$$\psi_{\min} = \min_{\theta \in \mathcal{R}_n(\alpha) \cap \mathcal{I}_n} c^\top\theta, \qquad \psi_{\max} = \max_{\theta \in \mathcal{R}_n(\alpha) \cap \mathcal{I}_n} c^\top\theta.$$

This is a small convex optimization problem (a linear objective with one quadratic ellipsoid constraint and linear constraints) and can be solved efficiently!

Geometric Robustness Criterion Theorem:

If $\psi_{\min} > 0$, then we can robustly assert

$$\psi(\theta) > 0,$$

and this conclusion holds for all $\theta \in \mathcal{R}_n(\alpha) \cap \mathcal{I}_n$!
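A minimal sketch of this extremum problem (all numbers and the sign constraint standing in for the identifiable set are assumed), using scipy's general-purpose SLSQP solver rather than a dedicated convex-programming package:

```python
# A minimal sketch (not from the source) of min/max c^T theta subject to the
# Wald ellipsoid constraint and hypothetical linear constraints A theta <= b.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

theta_hat = np.array([0.8, 0.3])          # point estimate (illustrative numbers)
I_hat = np.array([[50.0, 10.0],
                  [10.0, 40.0]])          # estimated Fisher information (assumed)
n, alpha = 200, 0.05
r2 = chi2.ppf(1 - alpha, df=len(theta_hat))  # chi-square radius of the ellipsoid

c = np.array([1.0, 0.5])                  # effect direction, psi = c^T theta
A = np.array([[0.0, -1.0]])               # example identifiable-set constraint: theta_2 >= 0
b = np.array([0.0])

constraints = [
    # inside the confidence ellipsoid: n (theta - theta_hat)^T I (theta - theta_hat) <= r2
    {"type": "ineq",
     "fun": lambda th: r2 - n * (th - theta_hat) @ I_hat @ (th - theta_hat)},
    # linear identifiable-set constraints A theta <= b
    {"type": "ineq", "fun": lambda th: b - A @ th},
]

def extremum(sign):
    res = minimize(lambda th: sign * (c @ th), theta_hat,
                   method="SLSQP", constraints=constraints)
    return sign * res.fun

psi_min, psi_max = extremum(+1.0), extremum(-1.0)
print(f"psi in [{psi_min:.3f}, {psi_max:.3f}]")  # robust sign assertion if the interval excludes 0
```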

🌐 Multi-Experiment Aggregation: Intersection and Union of Regions

Problem Scenario

In reality, we often have multiple data sources:

  • Experiment 1: Randomized controlled trial (RCT), sample size $n_1$
  • Experiment 2: Observational study, sample size $n_2$
  • Experiment 3: RCT from another region, sample size $n_3$

Traditional Meta-Analysis:

Compute the point estimate of each study, then take a weighted average.

Problems:

  • How to judge whether studies are truly consistent?
  • How to systematically identify conflicts?
  • Differences in point estimates may come from sampling error, not from real differences in the effect!

Geometric Meta-Analysis

Idea: each study $k$ gives a credible region $\mathcal{R}_k(\alpha_k)$, $k = 1, \dots, K$.

Consensus Region (intersection):

$$\mathcal{R}_{\mathrm{cons}} = \bigcap_{k=1}^{K} \mathcal{R}_k(\alpha_k)$$

Physical Meaning: the parameter range simultaneously supported by all studies

Permissible Region (union):

$$\mathcal{R}_{\mathrm{perm}} = \bigcup_{k=1}^{K} \mathcal{R}_k(\alpha_k)$$

Physical Meaning: the parameter range supported by at least one study

Conflict Region (symmetric difference):

$$\mathcal{R}_{\mathrm{conflict}} = \mathcal{R}_{\mathrm{perm}} \setminus \mathcal{R}_{\mathrm{cons}}$$

Physical Meaning: the parameter range supported by only some studies, where the controversy lies

graph TB
    subgraph "Credible Regions of Three Studies"
        R1["Study 1<br/>ℛ₁(α₁)"]
        R2["Study 2<br/>ℛ₂(α₂)"]
        R3["Study 3<br/>ℛ₃(α₃)"]
    end

    R1 -.->|"Intersection"| CONS["Consensus Region<br/>ℛ_cons"]
    R2 -.->|"Intersection"| CONS
    R3 -.->|"Intersection"| CONS

    R1 -.->|"Union"| PERM["Permissible Region<br/>ℛ_perm"]
    R2 -.->|"Union"| PERM
    R3 -.->|"Union"| PERM

    PERM -.-> CONFLICT["Conflict Region<br/>ℛ_conflict = ℛ_perm \ ℛ_cons"]
    CONS -.-> CONFLICT

    style CONS fill:#e1ffe1,stroke:#00aa00,stroke-width:3px
    style CONFLICT fill:#ffe1e1,stroke:#ff0000,stroke-width:2px

Consistency Judgment

Strong Consistency: $\mathcal{R}_{\mathrm{cons}} \neq \emptyset$ (the consensus region is non-empty)

Weak Consistency: the “volume” of the conflict region $\mathcal{R}_{\mathrm{conflict}}$ is relatively small

Significant Conflict: $\mathcal{R}_{\mathrm{cons}} = \emptyset$ (the consensus region is empty!)

In that case we can assert clearly that the studies are in fundamental contradiction, rather than vaguely saying “the results differ somewhat”.

Consensus Interval for Causal Effect

For the effect of interest $\psi = c^\top\theta$:

$$[\psi_-, \psi_+]_{\mathrm{cons}} = \left[\min_{\theta \in \mathcal{R}_{\mathrm{cons}}} c^\top\theta,\; \max_{\theta \in \mathcal{R}_{\mathrm{cons}}} c^\top\theta\right]$$

Robust Conclusion: only when a value lies in $[\psi_-, \psi_+]_{\mathrm{cons}}$ can we say that effect value is supported by all studies.

Examples:

  • $[\psi_-, \psi_+]_{\mathrm{cons}} = [0.2,\, 0.5]$: all studies consistently support an effect between 0.2 and 0.5
  • A consensus interval containing 0: we cannot robustly assert the direction!
  • $\mathcal{R}_{\mathrm{cons}} = \emptyset$: the studies conflict, there is no consensus
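A small sketch of the consensus interval for a scalar effect (the per-study estimates and standard errors are assumed). For a one-dimensional $\psi$, intersecting the per-study intervals gives exactly the consensus interval; for projections of higher-dimensional regions it is only an outer bound.

```python
# A minimal sketch (assumed example, not from the source): intersect per-study
# Wald intervals for a scalar effect psi.
from scipy.stats import norm

studies = [  # (psi_hat, standard error) -- illustrative numbers
    (0.35, 0.08),
    (0.30, 0.05),
    (0.40, 0.10),
]
alpha = 0.05
z = norm.ppf(1 - alpha / 2)

intervals = [(m - z * s, m + z * s) for m, s in studies]
lo = max(l for l, _ in intervals)   # intersection: largest lower bound
hi = min(u for _, u in intervals)   # intersection: smallest upper bound

if lo <= hi:
    print(f"consensus interval: [{lo:.3f}, {hi:.3f}]")
else:
    print("empty consensus: the studies conflict")
```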

⚙️ Experimental Design: Shaping Future Credible Regions

New Perspective

Traditional experimental design goal: Minimize variance

GLS Geometric Perspective: design the experiment so as to shape the geometry of the future credible region.

Key Insight: by choosing the experimental scheme $\xi$ (sample allocation, covariate design, etc.), we can actively shape the Fisher information matrix $I(\theta;\xi)$, and therefore the shape of the credible ellipsoid!

Fisher Information and Region Volume

Volume of the credible ellipsoid:

$$\mathrm{Vol}\big(\mathcal{R}_n(\alpha;\xi)\big) = C \cdot \det I(\theta;\xi)^{-1/2}$$

where:

  • $\xi$ is the design variable (e.g., the sample allocation scheme)
  • $C$ is a constant (depending only on the dimension, the sample size, and the confidence level)

D-Optimal Design:

$$\xi_D^{*} = \arg\max_{\xi} \det I(\theta;\xi)$$

Geometric Meaning: minimize the volume of the credible ellipsoid, making the parameter estimate “tightest overall”!

graph LR
    DESIGN["Experimental Design ξ"] -->|"Determines"| FISHER["Fisher Information<br/>I(θ;ξ)"]
    FISHER -->|"Determines Shape"| ELLIPSE["Credible Ellipsoid<br/>ℛₙ(α;ξ)"]

    OPT["Optimization<br/>max det I"] -.->|"Shrinks"| ELLIPSE

    style ELLIPSE fill:#fff4e1
    style OPT fill:#e1ffe1

Directional Distinguishability: c-Optimal Design

If we only care about a specific causal effect $\psi = c^\top\theta$, we don't need overall optimality!

c-Optimal Design:

$$\xi_c^{*} = \arg\min_{\xi} \; c^\top I(\theta;\xi)^{-1} c$$

Geometric Meaning:

  • Don't pursue the minimum overall ellipsoid volume
  • Specifically compress the semi-axis in the direction $c$
  • Concentrate resources on improving the resolution of this specific causal effect

Analogy:

  • D-optimal = All-round development (all subjects must be good)
  • c-optimal = Specialization (only math needs to be good, e.g., when applying to a math department)

Example: Sample Allocation in Linear Regression

Model:

$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$

Design Problem: how should we allocate samples between two levels $x \in \{x_{\min}, x_{\max}\}$?

Let the allocation proportion be $p$, i.e., $np$ samples at $x_{\max}$ and $n(1-p)$ samples at $x_{\min}$.

Fisher Information Matrix:

$$I(p) = \frac{n}{\sigma^2}\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix},$$

where $\bar{x} = p\,x_{\max} + (1-p)\,x_{\min}$ and $\overline{x^2} = p\,x_{\max}^2 + (1-p)\,x_{\min}^2$ are the design averages.

D-Optimal Design (maximize the determinant, i.e., minimize the determinant of the inverse):

$$\det I(p) \propto \overline{x^2} - \bar{x}^2 = \mathrm{Var}_p(x)$$

Maximizing this design variance gives equal allocation at the two extremes: $p^{*} = \tfrac12$.

c-Optimal Design (we only care about the slope $\beta_1$, i.e., $c = (0,1)^\top$):

$$\mathrm{Var}(\hat\beta_1) \propto \frac{\sigma^2}{n\,\mathrm{Var}_p(x)}$$

We again get $p^{*} = \tfrac12$ (equal allocation at the extremes maximizes the design variance of $x$).
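A quick numerical check of this example (the values of $n$, $\sigma^2$, and the levels $x \in \{-1, +1\}$ are assumptions for illustration): both the D-criterion and the c-criterion for the slope are optimized at $p = 1/2$.

```python
# A minimal numerical check (assumed setup, not from the source) of the
# two-level design example: allocate proportion p of n samples at x = +1 and
# 1-p at x = -1, then compare det I(p) and Var(beta1_hat) across p.
import numpy as np

def fisher_info(p, n=100, sigma2=1.0, x_lo=-1.0, x_hi=+1.0):
    """Fisher information of (beta0, beta1) for a two-point design."""
    x_bar = p * x_hi + (1 - p) * x_lo
    x2_bar = p * x_hi**2 + (1 - p) * x_lo**2
    return (n / sigma2) * np.array([[1.0, x_bar], [x_bar, x2_bar]])

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    I = fisher_info(p)
    var_slope = np.linalg.inv(I)[1, 1]     # c-optimality criterion, c = (0, 1)
    print(f"p={p:.1f}  det I={np.linalg.det(I):10.1f}  Var(beta1)={var_slope:.4f}")
# Both criteria are optimized at p = 0.5 (equal allocation at the extremes).
```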

🔗 Connections with GLS Theory

Connection with Causal Diamond

In GLS theory, the boundary of the causal diamond encodes the complete information.

Analogy:

  • Causal Diamond ↔ Credible Region
  • Diamond Boundary ↔ Ellipsoid Boundary
  • Bulk Reconstruction ↔ Inferring Interior Parameters from the Boundary

All reflect the idea that the boundary encodes the complete information!

Connection with Time Scale

The uncertainty of the unified time scale can itself be geometrized as a credible interval for the scale.

Robust Causal Conclusion: only when the conclusions based on all scale values in that interval are consistent are they robust!

Connection with Null-Modular Double Cover

The estimation error of the modulation function can be represented as a confidence region.

Robustness: the range of the modular Hamiltonian over that entire confidence region.

🌟 Core Formula Summary

Fisher Information Metric

$$g_{ij}(\theta) = I_{ij}(\theta)$$

Confidence Ellipsoid (Credible Region)

$$\mathcal{R}_n(\alpha) = \left\{\theta : n\,(\theta - \hat\theta_n)^\top I(\hat\theta_n)\,(\theta - \hat\theta_n) \le \chi^2_{d,1-\alpha}\right\}$$

Projection Interval for Linear Effect

$$\psi = c^\top\theta \in \; c^\top\hat\theta_n \pm \sqrt{\frac{\chi^2_{d,1-\alpha}}{n}\, c^\top I(\hat\theta_n)^{-1} c}$$

Geometric Robustness

$$\psi(\theta) \in [\psi_-, \psi_+] \quad \text{for all } \theta \in \mathcal{R}_n(\alpha) \cap \mathcal{I}_n$$

Multi-Experiment Consensus Region

$$\mathcal{R}_{\mathrm{cons}} = \bigcap_{k=1}^{K} \mathcal{R}_k(\alpha_k)$$

D-Optimal Design

$$\xi_D^{*} = \arg\max_{\xi} \det I(\theta;\xi)$$

💭 Thinking Questions

Question 1: Why Ellipsoid Instead of Sphere?

Hint: Consider correlations between parameters

Answer:

If the parameters are completely independent and equally well identified ($I(\theta)$ proportional to the identity matrix), the confidence region is a sphere (same uncertainty in all directions).

But in practice, parameters are often correlated:

  • The intercept and the slope are usually negatively correlated (a seesaw effect)
  • The Fisher information matrix is non-diagonal
  • The confidence region is an ellipsoid (different uncertainty in different directions)

Geometric Intuition:

  • Long axis of the ellipsoid → direction in which the parameter is “hard to identify”
  • Short axis of the ellipsoid → direction in which the parameter is “easy to identify”

Question 2: What Does Empty Consensus Region Mean?

Hint: Think of quantum uncertainty principle

Answer:

Physical Meaning:

  1. Significant Conflict: the studies are in fundamental contradiction; no parameter value can simultaneously satisfy all of their constraints
  2. Model Mismatch: Assumptions of some studies may be wrong
  3. Heterogeneity: Different studies may measure different parameters (e.g., effects in different populations)

Quantum Analogy:

It is like trying to measure position and momentum precisely at the same time → the uncertainty principle forbids it!

An empty intersection of the studies' credible regions → “geometric incompatibility” in parameter space!

Question 3: How to Define Error Geometry in Quantum Gravity?

Hint: Recall quantum fluctuations of causal diamond

Answer:

In quantum gravity, spacetime geometry itself has quantum fluctuations!

Classical GLS:

Quantum GLS (path integral):

Geometric Uncertainty:

  • At the Planck scale $\ell_P$, spacetime has “foam” fluctuations
  • The credible region becomes a measure on a function space
  • Robustness → Topological invariance under quantum fluctuations

Example:

Consider the correction terms to the Bekenstein-Hawking entropy: the credible region of their coefficients determines which quantum gravitational corrections can be predicted robustly!

🎯 Core Insights

  1. Error = Geometric Boundary

    Traditional: Error = Supplementary information

    GLS: Error = Geometric region in parameter space (credible region)

  2. Robustness = Geometric Invariance

    Traditional: Robustness = “Results similar”

    GLS: Robustness = Conclusion holds over entire credible region

  3. Causal Inference = Credible Region ∩ Identifiable Set

  4. Meta-Analysis = Intersection and Union of Regions

    • Consensus = Intersection
    • Conflict = Symmetric difference
  5. Experimental Design = Shaping Geometric Shape

    Actively shape the credible ellipsoid through the Fisher information

📚 Connections with Other Chapters

With Causal Geometrization (Chapter 8)

  • Chapter 8: Spacetime geometry = Compression of causal constraints
  • This chapter: Parameter geometry = Compression of statistical constraints

Unified Perspective: geometry is the minimal lossless compression of constraints, whether causal or statistical.

With Boundary Theory (Chapter 6)

  • Boundary encodes bulk information ↔ Ellipsoid boundary encodes parameter uncertainty
  • Uncertainty of Brown-York energy ↔ Ellipsoid shape determined by Fisher information

With Unified Time (Chapter 5)

The uncertainty of the time scale can itself be expressed as a credible interval.

Robust Causal Arrow: only when all scale values in that interval give the same causal direction is the arrow robust!

📖 Further Reading

Classical Statistics:

  • van der Vaart (1998): Asymptotic Statistics (asymptotic theory)
  • Bickel & Doksum (2015): Mathematical Statistics (confidence regions)

Causal Inference:

  • Manski (2003): Partial Identification (identifiable sets)
  • Imbens & Rubin (2015): Causal Inference (robustness analysis)

Experimental Design:

  • Pukelsheim (2006): Optimal Design of Experiments (Fisher information and design)

GLS Theory Source Documents:

  • error-geometry-causal-robustness.md (source of this chapter)

Next Chapter Preview: 10-Unified Theorem Complete Proof of Causality-Time-Entropy

We will see how causality, time, and entropy are unified through rigorous mathematical proof!

Return: Causal Structure Overview

Previous Chapter: 08-Causal Geometrization