Information Geometry: Metric Structure of Probability
“Probability distributions form a manifold, Fisher information is its metric.” — Shun-ichi Amari
🎯 Core Idea
We usually think of probability distributions as mere lists of numbers.
Information geometry offers a different, geometric perspective:
Families of probability distributions can be viewed as differentiable manifolds, on which the Fisher information matrix defines a Riemannian metric!
- Points: probability distributions
- Distance: relative entropy (KL divergence) / Fisher-Rao distance
- Metric: Fisher information matrix
- Geodesics: optimal inference paths or exponential families
This constitutes one of the mathematical foundations of the IGVP (Information Geometric Variational Principle).
🗺️ Space of Probability Distributions
Simple Example: Coin Toss
Consider a biased coin with probability of heads $p$:

$$P(\text{heads}) = p, \qquad P(\text{tails}) = 1 - p, \qquad p \in (0, 1)$$

All possible probability distributions of this coin form a one-dimensional manifold: the open interval $(0, 1)$.
```mermaid
graph LR
    P0["p -> 0<br/>Tends to Tails"] --> P25["p=0.25"]
    P25 --> P50["p=0.5<br/>Fair Coin"]
    P50 --> P75["p=0.75"]
    P75 --> P1["p -> 1<br/>Tends to Heads"]
    style P50 fill:#fff4e1,stroke:#ff6b6b,stroke-width:2px
```
Question: How can we naturally define the “distance” between two distributions $p_1$ and $p_2$?
📏 Kullback-Leibler Divergence (Relative Entropy)
Definition
The Kullback-Leibler (KL) divergence is a standard measure of the difference between two probability distributions $p$ and $q$:

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$

or, in the continuous case:

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
Physical and Information-Theoretic Meaning
- Information Gain: The amount of information gained when updating a prior distribution $q$ to a posterior distribution $p$.
- Encoding Cost: The expected extra bits required to encode data distributed according to $p$ using a code optimized for $q$.
Properties
- Non-negativity: $D_{\mathrm{KL}}(p \,\|\, q) \geq 0$ (Gibbs’ inequality).
- Identity: $D_{\mathrm{KL}}(p \,\|\, q) = 0$ if and only if $p = q$ (almost everywhere).
- Asymmetry: In general $D_{\mathrm{KL}}(p \,\|\, q) \neq D_{\mathrm{KL}}(q \,\|\, p)$, so KL divergence is not a metric in the strict sense!
```mermaid
graph LR
    P["Distribution p"] --> |"D_KL(p||q)"| Q["Distribution q"]
    Q --> |"D_KL(q||p) ≠ D_KL(p||q)"| P
    style P fill:#e1f5ff
    style Q fill:#ffe1e1
```
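As a quick numerical check of these properties, here is a minimal Python sketch (our own illustration, not from the original text; the helper name `kl_divergence` is ours) that computes the discrete KL divergence and exhibits its asymmetry on two biased coins:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) log(p(x)/q(x)), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two coins over {tails, heads}: a fair one and a heavily biased one
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # ~0.511 nats
print(kl_divergence(q, p))  # ~0.368 nats -> D_KL(p||q) != D_KL(q||p)
```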
🧮 Fisher Information Matrix
From KL Divergence to Fisher Metric
Consider a parameterized family of distributions $p_\theta(x)$, where $\theta = (\theta^1, \ldots, \theta^n)$ are coordinates on the manifold.

The Fisher information matrix can be defined as the second-order expansion term of the KL divergence near a point:

$$D_{\mathrm{KL}}(p_\theta \,\|\, p_{\theta + d\theta}) = \frac{1}{2}\, g_{ij}(\theta)\, d\theta^i\, d\theta^j + O(d\theta^3)$$

where:

$$g_{ij}(\theta) = \mathbb{E}_{p_\theta}\!\left[ \frac{\partial \log p_\theta}{\partial \theta^i}\, \frac{\partial \log p_\theta}{\partial \theta^j} \right]$$
Geometric Meaning
The Fisher information matrix defines a Riemannian metric (Fisher-Rao metric)!
By Chentsov’s theorem, it is (up to an overall scale) the unique metric invariant under sufficient statistics, and it endows the probability manifold with a curved geometric structure.
Line element:

$$ds^2 = g_{ij}(\theta)\, d\theta^i\, d\theta^j$$

This means that in information geometry, “distance” is determined by how difficult it is to distinguish two distributions from samples.
```mermaid
graph TB
    M["Probability Distribution Manifold 𝓜"] --> G["Fisher Metric g_ij"]
    G --> D["Geodesics = Optimal Inference Paths"]
    G --> C["Curvature = Non-trivial Correlations"]
    style M fill:#fff4e1,stroke:#ff6b6b,stroke-width:2px
    style G fill:#e1ffe1
```
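To make the “metric from divergence” statement concrete, the following sketch (our own illustration, using the Bernoulli result $g(\theta) = 1/(\theta(1-\theta))$ derived in the next section) compares the exact KL divergence for a small parameter step against the quadratic prediction $\frac{1}{2} g(\theta)\, d\theta^2$:

```python
import numpy as np

def kl_bernoulli(t1, t2):
    """Exact KL divergence between Bernoulli(t1) and Bernoulli(t2), in nats."""
    return t1 * np.log(t1 / t2) + (1 - t1) * np.log((1 - t1) / (1 - t2))

def fisher_bernoulli(theta):
    """Fisher information of the Bernoulli family: g(theta) = 1/(theta(1-theta))."""
    return 1.0 / (theta * (1.0 - theta))

theta, dtheta = 0.3, 1e-3
exact = kl_bernoulli(theta, theta + dtheta)
quadratic = 0.5 * fisher_bernoulli(theta) * dtheta**2  # (1/2) g(θ) dθ² prediction
print(exact, quadratic)  # both ~2.38e-06: KL is quadratic to leading order
```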
🌀 Simple Example: Bernoulli Distribution
Parameterization
A Bernoulli distribution with parameter $\theta \in (0, 1)$:

$$p(x \mid \theta) = \theta^x (1-\theta)^{1-x}, \qquad x \in \{0, 1\}$$

Log-likelihood:

$$\log p(x \mid \theta) = x \log \theta + (1-x) \log(1-\theta)$$
Fisher Information
The Fisher information is the variance of the score function $\partial_\theta \log p(x \mid \theta)$:

$$g(\theta) = \mathbb{E}\!\left[\left( \frac{x}{\theta} - \frac{1-x}{1-\theta} \right)^{\!2}\right] = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}$$
Fisher-Rao Distance
The geodesic distance between two Bernoulli distributions $p_{\theta_1}$ and $p_{\theta_2}$ is obtained by integrating the line element $ds = d\theta / \sqrt{\theta(1-\theta)}$:

$$d(\theta_1, \theta_2) = 2 \left| \arcsin\sqrt{\theta_1} - \arcsin\sqrt{\theta_2} \right| = 2 \arccos\!\left( \sqrt{\theta_1 \theta_2} + \sqrt{(1-\theta_1)(1-\theta_2)} \right)$$
The argument of the arccosine is the Bhattacharyya coefficient, and the resulting angle is known as the Bhattacharyya angle; geometrically, this is a great-circle distance, since the substitution $\theta = \sin^2 u$ maps the Bernoulli family isometrically onto a circular arc.
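A short numerical sketch (the function names are ours, not from the text) confirming that the two closed forms above agree:

```python
import numpy as np

def fisher_rao_bernoulli(t1, t2):
    """Geodesic distance via the arccos (Bhattacharyya-coefficient) form."""
    return 2.0 * np.arccos(np.sqrt(t1 * t2) + np.sqrt((1 - t1) * (1 - t2)))

def fisher_rao_arcsin(t1, t2):
    """Equivalent arcsin form, from integrating ds = dθ / sqrt(θ(1-θ))."""
    return 2.0 * abs(np.arcsin(np.sqrt(t1)) - np.arcsin(np.sqrt(t2)))

print(fisher_rao_bernoulli(0.25, 0.75))  # ~1.0472 = π/3
print(fisher_rao_arcsin(0.25, 0.75))     # same value
```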
🔄 Quantum Relative Entropy
Definition
For quantum states (density operators) $\rho$ and $\sigma$, the quantum relative entropy is defined as:

$$S(\rho \,\|\, \sigma) = \mathrm{Tr}\!\left[ \rho \left( \log \rho - \log \sigma \right) \right]$$

(finite only when $\mathrm{supp}\,\rho \subseteq \mathrm{supp}\,\sigma$).
Properties
- Non-negativity: $S(\rho \,\|\, \sigma) \geq 0$ (Klein’s inequality).
- Monotonicity: For any completely positive trace-preserving (CPTP) map $\mathcal{N}$, $S(\mathcal{N}(\rho) \,\|\, \mathcal{N}(\sigma)) \leq S(\rho \,\|\, \sigma)$. This is the Data Processing Inequality: information processing cannot increase distinguishability.
- Joint Convexity: $S(\rho \,\|\, \sigma)$ is jointly convex in the pair $(\rho, \sigma)$.
Physical Connection
In thermodynamics, if $\rho_\beta = e^{-\beta H}/Z$ is a Gibbs state at inverse temperature $\beta$, the relative entropy is proportional to the free energy difference:

$$S(\rho \,\|\, \rho_\beta) = \beta \left( F(\rho) - F(\rho_\beta) \right)$$

where $F(\rho) = \mathrm{Tr}(\rho H) - \beta^{-1} S(\rho)$ is the free energy. This gives relative entropy a clear thermodynamic interpretation: it measures the degree of deviation from equilibrium.
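As an illustration (our own sketch, not from the original text), the following computes $S(\rho \,\|\, \sigma)$ for two full-rank qubit states via `scipy.linalg.logm`; for commuting (here diagonal) states it reduces to the classical KL divergence of the eigenvalue distributions:

```python
import numpy as np
from scipy.linalg import logm

def quantum_relative_entropy(rho, sigma):
    """S(rho||sigma) = Tr[rho (log rho - log sigma)], in nats (full-rank states)."""
    return float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))

rho = np.diag([0.9, 0.1])    # a qubit density matrix (diagonal for simplicity)
sigma = np.diag([0.5, 0.5])  # the maximally mixed state

print(quantum_relative_entropy(rho, sigma))  # ~0.368, equals classical KL here
```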
🎓 Application Models in IGVP
Variation of Generalized Entropy
In the IGVP framework, we postulate that spacetime dynamics follow a variational principle of generalized entropy. The first-order condition is:

$$\delta S_{\mathrm{gen}} = 0$$

where the generalized entropy

$$S_{\mathrm{gen}} = \frac{A}{4 G \hbar} + S_{\mathrm{matter}}$$

includes an area term (the Bekenstein-Hawking entropy) and a matter entropy term.
Second-Order Condition: Stability
The second-order variation involves the second derivative of the relative entropy. The stability condition requires:

$$\delta^2 S_{\mathrm{rel}} \geq 0$$

Physically, this corresponds to the thermodynamic stability of the system; mathematically, it is tied to the positive definiteness of the Fisher information.
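As a toy illustration of the positive-definiteness check (our own sketch, not the IGVP second-order variation itself), one can verify that a Fisher matrix has strictly positive eigenvalues; here we use the standard Fisher matrix of the Gaussian family $\mathcal{N}(\mu, \sigma^2)$ in coordinates $(\mu, \sigma)$:

```python
import numpy as np

def fisher_gaussian(sigma):
    """Standard Fisher matrix of N(mu, sigma^2) in coordinates (mu, sigma)."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

g = fisher_gaussian(sigma=1.5)
eigenvalues = np.linalg.eigvalsh(g)
print(eigenvalues, bool(np.all(eigenvalues > 0)))  # all positive -> positive definite
```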
```mermaid
graph TB
    I["IGVP Framework"] --> F["First-Order: δS_gen = 0"]
    I --> S["Second-Order: δ²S_rel ≥ 0"]
    F --> E["Derives Einstein Equation<br/>(Theoretical Conjecture)"]
    S --> H["Corresponds to Hollands-Wald<br/>Stability Condition"]
    style I fill:#fff4e1,stroke:#ff6b6b,stroke-width:3px
    style E fill:#e1ffe1
    style H fill:#ffe1e1
```
Fisher Metric and Spacetime Metric
From an information geometry perspective, there may be a deep connection between the Fisher metric $g_{ij}$ on the probability manifold and the spacetime metric $g_{\mu\nu}$. IGVP attempts to establish this holographic correspondence.
📝 Key Concepts Summary
| Concept | Definition/Formula | Meaning |
|---|---|---|
| KL Divergence | $D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$ | Relative entropy |
| Fisher Information | $g_{ij} = \mathbb{E}[\partial_i \log p \; \partial_j \log p]$ | Metric on the probability manifold |
| Fisher-Rao Metric | $ds^2 = g_{ij}\, d\theta^i\, d\theta^j$ | Metric on distribution space |
| Quantum Relative Entropy | $S(\rho \,\|\, \sigma) = \mathrm{Tr}[\rho(\log\rho - \log\sigma)]$ | Quantum version of KL divergence |
| Cramér-Rao Bound | $\mathrm{Var}(\hat{\theta}) \geq \frac{1}{n\, g(\theta)}$ | Lower bound on estimation precision |
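The Cramér-Rao row can be checked empirically. The following sketch (an assumed setup, not from the original text) simulates the maximum-likelihood estimator of a Bernoulli parameter and compares its variance with the bound $1/(n\, g(\theta)) = \theta(1-\theta)/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 0.3, 1000, 2000

# MLE of a Bernoulli parameter = sample mean of n flips; repeat over many trials
estimates = rng.binomial(n, theta, size=trials) / n

print(estimates.var())          # empirical variance of the estimator
print(theta * (1 - theta) / n)  # Cramér-Rao bound: both ~2.1e-04
```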
🎓 Further Reading
- Classic textbook: S. Amari, Information Geometry and Its Applications (Springer, 2016)
- Quantum information: M. Hayashi, Quantum Information Theory (Springer, 2017)
- GLS application: igvp-einstein-complete.md
- Next: 06-category-theory_en.md - Category Theory Basics
🤔 Exercises
1. Conceptual Understanding:
   - Why is KL divergence asymmetric?
   - Why does the Fisher information define a metric?
   - What is the physical meaning of the monotonicity of quantum relative entropy?
2. Calculation Exercises:
   - Calculate the KL divergence between two normal distributions $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$
   - Verify the Fisher information formula for the Bernoulli distribution
   - For a given pair of qubit density matrices, calculate the quantum relative entropy
3. Physical Applications:
   - Applications of the Cramér-Rao bound in quantum measurement
   - What is the relationship between Fisher information and quantum Fisher information?
   - The role of relative entropy in black hole thermodynamics
4. Advanced Thinking:
   - Can a symmetric “distance” be defined? (Hint: Bhattacharyya distance)
   - What is the meaning of the curvature of the Fisher metric?
   - What is the connection between information geometry and thermodynamic geometry?
Next Step: Finally, we will learn Category Theory Basics, the “mathematics of mathematics” and a key to understanding the QCA universe and the matrix universe!