Information Geometry: Metric Structure of Probability

“Probability distributions form a manifold, Fisher information is its metric.” — Shun-ichi Amari

🎯 Core Idea

We usually think of probability distributions as just sets of numbers.

Information Geometry offers a geometric perspective:

Families of probability distributions can be viewed as differential manifolds, where the Fisher information matrix defines a Riemannian metric!

  • Points ↔ Probability distributions
  • Distance ↔ Relative entropy (KL divergence) / Fisher-Rao distance
  • Metric ↔ Fisher information matrix
  • Geodesics ↔ Optimal inference paths or exponential families

This constitutes one of the mathematical foundations of the IGVP (Information Geometric Variational Principle).

🗺️ Space of Probability Distributions

Simple Example: Coin Toss

Consider a biased coin with probability of heads p:

$$P(X = 1) = p, \qquad P(X = 0) = 1 - p, \qquad p \in (0, 1)$$

All possible probability distributions form a one-dimensional manifold (the open interval (0, 1)).

```mermaid
graph LR
    P0["p -> 0<br/>Tends to Tails"] --> P25["p=0.25"]
    P25 --> P50["p=0.5<br/>Fair Coin"]
    P50 --> P75["p=0.75"]
    P75 --> P1["p -> 1<br/>Tends to Heads"]

    style P50 fill:#fff4e1,stroke:#ff6b6b,stroke-width:2px
```

Question: How can we naturally define the “distance” between two distributions p₁ and p₂?

📏 Kullback-Leibler Divergence (Relative Entropy)

Definition

KL divergence (Kullback-Leibler divergence) is a standard measure of the difference between two probability distributions:

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}$$

Or, in the continuous case:

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$

Physical and Information-Theoretic Meaning

  • Information Gain: The amount of information gained when updating a prior distribution q to a posterior distribution p.
  • Encoding Cost: The expected number of extra bits required to encode data distributed according to p using a code optimized for q.

Properties

  1. Non-negativity: D_KL(p‖q) ≥ 0 (Gibbs’ inequality).
  2. Identity: D_KL(p‖q) = 0 if and only if p = q (almost everywhere).
  3. Asymmetry: Generally D_KL(p‖q) ≠ D_KL(q‖p) (thus it is not a strict metric distance!).

```mermaid
graph LR
    P["Distribution p"] --> |"D_KL(p||q)"| Q["Distribution q"]
    Q --> |"D_KL(q||p) ≠ D_KL(p||q)"| P

    style P fill:#e1f5ff
    style Q fill:#ffe1e1
```
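
As a quick sanity check of these properties, here is a minimal Python sketch (not from the original text; the distributions p and q are arbitrary illustrative choices) that computes the discrete KL divergence and demonstrates its asymmetry:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two biased coins, written as [P(tails), P(heads)]
p = [0.5, 0.5]   # fair coin
q = [0.1, 0.9]   # heavily biased coin

print(kl_divergence(p, q))  # D_KL(p || q)
print(kl_divergence(q, p))  # D_KL(q || p): a different value, since KL is asymmetric
```

The two printed values differ, illustrating property 3 above.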

🧮 Fisher Information Matrix

From KL Divergence to Fisher Metric

Consider a parameterized family of distributions p(x|θ), where θ = (θ¹, …, θⁿ) ranges over a parameter space Θ ⊂ ℝⁿ.

The Fisher information matrix can be defined as the second-order expansion term of the KL divergence near a point:

$$D_{\mathrm{KL}}\big(p_\theta \,\|\, p_{\theta + \delta\theta}\big) \approx \frac{1}{2} \sum_{i,j} g_{ij}(\theta)\, \delta\theta^i \delta\theta^j$$

Where:

$$g_{ij}(\theta) = \mathbb{E}_\theta\!\left[\frac{\partial \log p(x|\theta)}{\partial \theta^i}\, \frac{\partial \log p(x|\theta)}{\partial \theta^j}\right] = -\,\mathbb{E}_\theta\!\left[\frac{\partial^2 \log p(x|\theta)}{\partial \theta^i \,\partial \theta^j}\right]$$
Geometric Meaning

The Fisher information matrix defines a Riemannian metric (Fisher-Rao metric)!

It is, up to an overall scale, the unique metric invariant under sufficient statistics (Chentsov’s theorem), and it endows the probability manifold with a curved geometric structure.

Line element:

$$ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta^i\, d\theta^j$$

This means that in information geometry, “distance” is determined by the difficulty of distinguishing between two distributions.

```mermaid
graph TB
    M["Probability Distribution Manifold 𝓜"] --> G["Fisher Metric g_ij"]
    G --> D["Geodesics = Optimal Inference Paths"]
    G --> C["Curvature = Non-trivial Correlations"]

    style M fill:#fff4e1,stroke:#ff6b6b,stroke-width:2px
    style G fill:#e1ffe1
```
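
The relation between KL divergence and the Fisher metric can be checked numerically. The following sketch (an illustration added here, using the Bernoulli family treated in the next section) compares D_KL(p_θ ‖ p_{θ+δθ}) with ½ g(θ) δθ² as δθ shrinks:

```python
import numpy as np

def kl_bernoulli(p, q):
    """D_KL(Ber(p) || Ber(q)) in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

theta = 0.3
g = 1.0 / (theta * (1.0 - theta))   # Fisher information of the Bernoulli family

for delta in [1e-1, 1e-2, 1e-3]:
    exact = kl_bernoulli(theta, theta + delta)
    quadratic = 0.5 * g * delta**2   # second-order approximation (1/2) g(θ) δθ²
    print(f"δθ={delta:.0e}  D_KL={exact:.3e}  ½·g·δθ²={quadratic:.3e}")
```

As δθ decreases, the exact divergence and the quadratic approximation agree to higher and higher relative precision.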

🌀 Simple Example: Bernoulli Distribution

Parameterization

Bernoulli distribution:

$$P(x \mid p) = p^x (1-p)^{1-x}, \qquad x \in \{0, 1\}$$

Log-likelihood:

$$\log P(x \mid p) = x \log p + (1 - x) \log(1 - p)$$

Fisher Information

Calculate the variance of the score function:

$$I(p) = \mathbb{E}\!\left[\left(\frac{\partial \log P(x \mid p)}{\partial p}\right)^{\!2}\right] = \frac{1}{p(1-p)}$$

Fisher-Rao Distance

The geodesic distance between two Bernoulli distributions p₁ and p₂ is obtained by integrating the line element ds = dp / √(p(1−p)):

$$d_{\mathrm{FR}}(p_1, p_2) = \int_{p_1}^{p_2} \frac{dp}{\sqrt{p(1-p)}}$$

Calculating:

$$d_{\mathrm{FR}}(p_1, p_2) = 2 \left| \arcsin\sqrt{p_1} - \arcsin\sqrt{p_2} \right|$$

This is closely related to the Bhattacharyya angle (it equals twice that angle), and it corresponds to a great-circle distance on a sphere under the embedding p ↦ (√p, √(1−p)).
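
A small numerical sketch (added here for illustration) compares the closed-form Fisher-Rao distance with a direct numerical integration of the line element ds = dp/√(p(1−p)):

```python
import numpy as np

def fisher_rao_bernoulli(p1, p2):
    """Closed-form geodesic distance 2 |arcsin(sqrt(p1)) - arcsin(sqrt(p2))|."""
    return 2.0 * abs(np.arcsin(np.sqrt(p1)) - np.arcsin(np.sqrt(p2)))

def fisher_rao_numeric(p1, p2, n=100_000):
    """Numerical arc length: trapezoidal integration of dp / sqrt(p (1 - p))."""
    lo, hi = min(p1, p2), max(p1, p2)
    p = np.linspace(lo, hi, n)
    integrand = 1.0 / np.sqrt(p * (1.0 - p))
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(p)))

p1, p2 = 0.2, 0.7
print(fisher_rao_bernoulli(p1, p2))  # closed form
print(fisher_rao_numeric(p1, p2))    # numerical integral, agrees to several decimals
```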

🔄 Quantum Relative Entropy

Definition

For quantum states (density operators) ρ and σ, the quantum relative entropy is defined as:

$$S(\rho \,\|\, \sigma) = \mathrm{Tr}\big[\rho\, (\log \rho - \log \sigma)\big]$$

Properties

  1. Non-negativity: S(ρ‖σ) ≥ 0 (Klein’s inequality).
  2. Monotonicity: For any completely positive trace-preserving (CPTP) map ℰ, S(ℰ(ρ)‖ℰ(σ)) ≤ S(ρ‖σ). This reflects the Data Processing Inequality: information processing cannot increase distinguishability.
  3. Joint Convexity: S(ρ‖σ) is jointly convex in the pair (ρ, σ).

Physical Connection

In thermodynamics, if σ_β = e^{−βH}/Z is a Gibbs state, the relative entropy is proportional to the free energy difference:

$$S(\rho \,\|\, \sigma_\beta) = \beta \big( F(\rho) - F(\sigma_\beta) \big)$$

This gives relative entropy a clear thermodynamic interpretation: the degree of deviation from equilibrium.
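
For finite-dimensional states, S(ρ‖σ) can be evaluated by eigendecomposition. Below is a minimal sketch (the qubit matrices are arbitrary illustrative choices, assumed full-rank so the logarithms stay finite):

```python
import numpy as np

def quantum_relative_entropy(rho, sigma):
    """S(rho || sigma) = Tr[rho (log rho - log sigma)], natural log, via eigendecomposition."""
    def logm_psd(a):
        w, v = np.linalg.eigh(a)
        w = np.clip(w, 1e-12, None)      # guard against log(0) for near-zero eigenvalues
        return v @ np.diag(np.log(w)) @ v.conj().T
    return float(np.real(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma)))))

# Example: a slightly coherent qubit state vs. a diagonal (thermal-like) state
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
sigma = np.array([[0.6, 0.0], [0.0, 0.4]], dtype=complex)

print(quantum_relative_entropy(rho, sigma))   # >= 0 (Klein's inequality)
print(quantum_relative_entropy(rho, rho))     # = 0 for identical states
```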

🎓 Application Models in IGVP

Variation of Generalized Entropy

In the IGVP framework, we postulate that spacetime dynamics follow a variational principle of generalized entropy. First-order condition:

$$\delta S_{\mathrm{gen}} = 0$$

Where the generalized entropy includes an area term (Bekenstein-Hawking entropy) and a matter entropy term:

$$S_{\mathrm{gen}} = \frac{A}{4 G \hbar} + S_{\mathrm{matter}}$$

Second-Order Condition: Stability

The second-order variation involves the second derivative of relative entropy. The stability condition requires:

$$\delta^2 S_{\mathrm{rel}} \geq 0$$

Physically, this corresponds to thermodynamic stability of the system; mathematically, it relates to the positive definiteness of Fisher information.

```mermaid
graph TB
    I["IGVP Framework"] --> F["First-Order: δS_gen = 0"]
    I --> S["Second-Order: δ²S_rel ≥ 0"]

    F --> E["Derives Einstein Equation<br/>(Theoretical Conjecture)"]
    S --> H["Corresponds to Hollands-Wald<br/>Stability Condition"]

    style I fill:#fff4e1,stroke:#ff6b6b,stroke-width:3px
    style E fill:#e1ffe1
    style H fill:#ffe1e1
```

Fisher Metric and Spacetime Metric

From an information geometry perspective, there may be a deep connection between the Fisher metric g_ij on the probability manifold and the spacetime metric g_μν. IGVP attempts to establish this holographic correspondence.

📝 Key Concepts Summary

| Concept | Definition/Formula | Meaning |
|---|---|---|
| KL Divergence | D_KL(p‖q) = Σᵢ pᵢ log(pᵢ/qᵢ) | Relative entropy |
| Fisher Information | g_ij = E[∂ᵢ log p · ∂ⱼ log p] | Probability metric |
| Fisher-Rao Metric | ds² = g_ij dθⁱ dθʲ | Metric on distribution space |
| Quantum Relative Entropy | S(ρ‖σ) = Tr[ρ(log ρ − log σ)] | Quantum version of KL divergence |
| Cramér-Rao Bound | Var(θ̂) ≥ 1/(n I(θ)) | Lower bound on estimation precision |
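
To illustrate the Cramér-Rao entry in the table, the following sketch (an added illustration; the sample sizes are arbitrary) estimates the Bernoulli parameter by maximum likelihood and compares the empirical variance of the estimator with the bound 1/(n I(p)) = p(1−p)/n:

```python
import numpy as np

rng = np.random.default_rng(0)

p_true = 0.3
n = 200            # samples per experiment
trials = 20_000    # number of repeated experiments

# Maximum-likelihood estimate of p is the sample mean of the Bernoulli draws
samples = rng.binomial(1, p_true, size=(trials, n))
p_hat = samples.mean(axis=1)

empirical_var = p_hat.var()
cramer_rao = p_true * (1 - p_true) / n   # 1 / (n I(p)), with I(p) = 1/(p(1-p))

print(f"empirical Var(p̂) = {empirical_var:.6f}")
print(f"Cramér-Rao bound  = {cramer_rao:.6f}")   # the Bernoulli MLE saturates the bound
```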

🎓 Further Reading

  • Classic textbook: S. Amari, Information Geometry and Its Applications (Springer, 2016)
  • Quantum information: M. Hayashi, Quantum Information Theory (Springer, 2017)
  • IGVP application: igvp-einstein-complete.md
  • Next: 06-category-theory_en.md - Category Theory Basics

🤔 Exercises

  1. Conceptual Understanding:

    • Why is KL divergence asymmetric?
    • Why is Fisher information a metric?
    • What is the physical meaning of monotonicity of quantum relative entropy?
  2. Calculation Exercises:

    • Calculate the KL divergence between two normal distributions 𝒩(μ₁, σ₁²) and 𝒩(μ₂, σ₂²)
    • Verify the Fisher information formula for the Bernoulli distribution
    • For a pair of qubit density matrices ρ and σ, calculate the quantum relative entropy S(ρ‖σ)
  3. Physical Applications:

    • Application of Cramér-Rao bound in quantum measurement
    • What is the relationship between Fisher information and quantum Fisher information?
    • Role of relative entropy in black hole thermodynamics
  4. Advanced Thinking:

    • Can we define a symmetric “distance”? (Hint: Bhattacharyya distance)
    • What is the meaning of curvature of Fisher metric?
    • What is the connection between information geometry and thermodynamic geometry?

Next Step: Finally, we will learn Category Theory Basics, the “mathematics of mathematics” and a key to understanding the QCA universe and the matrix universe!