Chapter 4: Attention–Time–Knowledge Graph—Information Accumulation Geometry of Time Selection
Introduction: Time Scale of Attention
When you focus on reading in a crowded café, the surrounding noise "disappears"; when you are highly alert in an emergency, time seems to "slow down". Both phenomena involve how attention shapes the experience of time.
In previous chapters, we established the mathematical framework of observer sections: an observer accesses local information about the universe through "slices" along its worldline. But a key question remains unanswered: how does the observer choose what to look at?
This chapter constructs a unified theory of attention, time, and the knowledge graph, revealing that:
- Attention is the observer's time-dependent weight distribution on the information manifold
- Time selection implements geometric constraints on information accumulation through attention operators
- The knowledge graph is a discrete skeleton of the information manifold, spectrally converging to the true geometry in the long-time limit
This theory turns the scarce-resource allocation problem of "attention economics" into a variational optimization problem on information geometry.
```mermaid
graph TB
subgraph "Observer Three-Layer Structure"
A["Attention Operator A<sub>t</sub><br/>Choose 'What to See'"]
B["Knowledge Graph G<sub>t</sub><br/>Encode 'What to Know'"]
C["Eigen Time Scale τ(t)<br/>Experience 'How Long Passed'"]
end
subgraph "Information Geometric Constraints"
D["Task Information Manifold (S<sub>Q</sub>,g<sub>Q</sub>)"]
E["Complexity Manifold (M,G)"]
end
A --> B
B --> C
D --> A
E --> C
A --> D
C --> E
style A fill:#e1f5ff
style B fill:#fff4e1
style C fill:#ffe1f5
style D fill:#e1ffe1
```
Core Insight: Triple Constraints of Information Accumulation
Imagine an explorer mapping unknown terrain:
- Attention Bandwidth: he can only explore a limited area each day (the support of the attention operator)
- Complexity Budget: the total length of his walking path is limited by his stamina (the worldline length on the complexity manifold)
- Knowledge Graph: the map he draws consists of discrete sampling points (nodes) and paths (edges) that gradually approximate the true terrain
This chapter proves that, under these triple constraints, the amount of information the observer can accumulate has a linear upper bound in the complexity budget, and that the spectral dimension of the knowledge graph converges to the true dimension of the information manifold in the long-time limit.
Part One: Attention Operator—Observer’s “Spotlight”
1.1 Dual Formalization of Attention
In the computational universe framework, the observer cannot simultaneously access all configurations. Attention characterizes the observer's "visible window" at each moment.
Definition 1.1 (Discrete Attention Operator)
At time step , observer’s attention operator is function:
satisfying normalization:
or weak constraint (total attention bandwidth).
Visible configuration subset defined as:
Physical Picture: $A_k(x)$ is the weight of "cognitive resources" the observer allocates to configuration $x$. Like a spotlight on a stage, $A_k$ highlights certain regions and leaves the others in "shadow".
Definition 1.2 (Continuous Attention Density)
On the task information manifold $(S_Q, g_Q)$, attention can be represented as a probability density $\rho_t : S_Q \to [0, \infty)$ with
$$\int_{S_Q} \rho_t(\varphi)\, dV_g(\varphi) = 1,$$
where $dV_g$ is the Riemannian volume element.
```mermaid
graph LR
subgraph "Discrete Attention"
A["Configuration Space X"]
B["Weight A<sub>k</sub>(x)"]
C["Visible Subset X<sub>k</sub><sup>att</sup>"]
end
subgraph "Continuous Attention"
D["Information Manifold S<sub>Q</sub>"]
E["Density ρ<sub>t</sub>(φ)"]
F["High Attention Region supp(ρ<sub>t</sub>)"]
end
A --> D
B --> E
C --> F
style B fill:#e1f5ff
style E fill:#fff4e1
```
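To make Definition 1.1 concrete, here is a minimal numerical sketch (Python with NumPy assumed); the softmax parameterization of the weights and the threshold defining the visible subset are illustrative choices, not part of the definition.

```python
import numpy as np

def attention_operator(scores, bandwidth=None):
    """Turn raw relevance scores over configurations x in X into attention weights A_k(x).

    If `bandwidth` is None, enforce the normalization  sum_x A_k(x) = 1;
    otherwise enforce the weak constraint  sum_x A_k(x) <= bandwidth (B_att).
    """
    weights = np.exp(scores - scores.max())   # softmax-style positive weights (illustrative)
    weights /= weights.sum()                  # normalization: total attention = 1
    if bandwidth is not None:
        weights *= min(1.0, bandwidth)        # scale down so the total respects B_att
    return weights

def visible_subset(weights, eps=1e-3):
    """X_k^att: configurations receiving non-negligible attention."""
    return np.flatnonzero(weights > eps)

# Example: 6 configurations; the observer's scores favor configurations 2 and 3.
scores = np.array([0.1, 0.2, 2.0, 1.5, 0.0, -1.0])
A_k = attention_operator(scores)
print(A_k.sum())            # ~1.0  (bandwidth constraint)
print(visible_subset(A_k))  # indices of the "spotlit" configurations
```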
1.2 Attention and Observer Internal State
Observer’s attention determined by internal memory state . Formalized as:
Observer Object:
where:
- : Internal memory state space (finite or countable)
- : Observation symbol space
- : Action space
- : Attention–observation strategy
- : Internal update operator
Dynamics Loop:
- The observer is in internal state $m_k$; the universe is in configuration $x_k$
- According to its strategy, the observer selects an action (deciding "where to look")
- The universe returns an observation symbol
- The internal state is updated to $m_{k+1}$
The attention operator $A_k$ is jointly determined by the internal state $m_k$ and the current orbit position.
Analogy: Imagine a miner exploring a dark cave with a headlamp. His "memory map" tells him where he has already explored, his "decision strategy" tells him which direction to point the headlamp next, and each observation updates the map.
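A minimal sketch of this observation loop, assuming a toy finite configuration space; the visit-count memory, the `policy`, `universe_observation`, and `update` functions are hypothetical stand-ins for the strategy and the internal update operator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_configs = 10

# Internal memory m: here simply a visit-count table over configurations (a toy M_int).
memory = np.zeros(n_configs)

def policy(memory, x):
    """Attention-observation strategy: look toward the least-visited neighboring configuration."""
    neighbors = [(x - 1) % n_configs, (x + 1) % n_configs]
    return min(neighbors, key=lambda u: memory[u])

def universe_observation(action):
    """The universe returns an observation symbol for the queried configuration (toy model)."""
    return np.sin(action)  # stand-in observable

def update(memory, action, observation):
    """Internal update operator: incorporate the new observation into memory."""
    memory[action] += 1
    return memory

x = 0  # current configuration index
for k in range(20):
    a = policy(memory, x)           # decide "where to look"
    o = universe_observation(a)     # universe returns an observation symbol
    memory = update(memory, a, o)   # update internal state
    x = a                           # the orbit moves to the attended configuration
print(memory)  # visit counts: the miner's gradually filled-in "memory map"
```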
1.3 Resource Constraints of Attention
Constraint 1 (Bandwidth Constraint):
$$\int_{S_Q} \rho_t(\varphi)\, dV_g(\varphi) = 1 \qquad \text{or} \qquad \sum_{x \in X} A_k(x) \le B_{\mathrm{att}}.$$
The normalization condition means total attention is conserved: focusing somewhere necessarily means ignoring somewhere else.
Constraint 2 (Second Moment Constraint):
$$\int_{S_Q} d_g(\varphi, \varphi_0)^2\, \rho_t(\varphi)\, dV_g(\varphi) \le B_{\mathrm{att}}^2.$$
For a fixed reference point $\varphi_0$, the "spatial dispersion" of attention is bounded. This prevents attention from being over-dispersed (unable to focus) or over-concentrated (too narrow a field of view).
Physical Analogy: In quantum mechanics, wave-function normalization corresponds to probability conservation; here, the normalization of $\rho_t$ corresponds to "conservation of total cognitive resources". The second-moment constraint is similar in spirit to the Heisenberg uncertainty principle: one cannot simultaneously achieve high focus and wide coverage.
Part Two: Knowledge Graph—Discrete Skeleton of Information Manifold
2.1 Four-Tuple Definition of Knowledge Graph
The observer cannot access every point of the information manifold through finite-time exploration. The knowledge representation he constructs is a discrete graph:
Definition 2.1 (Knowledge Graph at Time $t$)
$$G_t = (V_t, E_t, w_t, \Phi_t),$$
where:
- $V_t$: a finite node set; each node represents a "concept" or "abstract state"
- $E_t$: a directed or undirected edge set, representing relationships between concepts (causal, implication, similarity, etc.)
- $w_t$: edge weights, representing relationship strength
- Embedding map $\Phi_t : V_t \to S_Q$: embeds each node at a point of the information manifold
```mermaid
graph TB
subgraph "Knowledge Graph G<sub>t</sub>"
V1["Concept 1"]
V2["Concept 2"]
V3["Concept 3"]
V4["Concept 4"]
V1 -->|w<sub>12</sub>| V2
V2 -->|w<sub>23</sub>| V3
V1 -->|w<sub>14</sub>| V4
V3 -->|w<sub>34</sub>| V4
end
subgraph "Information Manifold S<sub>Q</sub>"
P1["Φ<sub>t</sub>(V1)"]
P2["Φ<sub>t</sub>(V2)"]
P3["Φ<sub>t</sub>(V3)"]
P4["Φ<sub>t</sub>(V4)"]
end
V1 -.embedding.-> P1
V2 -.embedding.-> P2
V3 -.embedding.-> P3
V4 -.embedding.-> P4
style V1 fill:#e1f5ff
style V2 fill:#fff4e1
style V3 fill:#ffe1f5
style V4 fill:#e1ffe1
```
Physical Picture: The knowledge graph is like "sampling points on a map". The explorer cannot measure every inch of terrain; he can only drive stakes at key positions (nodes $V_t$), mark paths (edges $E_t$), and record relative positions (the embedding $\Phi_t$).
2.2 Graph Laplace Operator and Spectral Structure
Define the graph Laplace operator on the knowledge graph $G_t$:
$$(L_t f)(v) = \sum_{u : (v,u) \in E_t} w_t(v,u)\,\bigl(f(v) - f(u)\bigr)$$
for any function $f : V_t \to \mathbb{R}$. This is the discrete version of the Laplace–Beltrami operator.
Spectral Properties:
- Eigenvalues $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{|V_t|-1}$
- The smallest nonzero eigenvalue $\lambda_1$ (the Fiedler value) characterizes the graph's "connectivity"
- The distribution of the large eigenvalues characterizes the graph's "local geometry"
Definition 2.2 (Spectral Dimension of Knowledge Graph)
$$d_{\mathrm{spec}}(G_t) = -2 \lim_{s \to 0^+} \frac{\ln \operatorname{Tr} e^{-s L_t}}{\ln s},$$
if this limit exists. $d_{\mathrm{spec}}$ describes the graph's "effective dimension" at small scales.
2.3 Spectral Approximation: From Discrete to Continuous
Key Question: To what extent does the knowledge graph $G_t$ "faithfully" represent the information manifold $(S_Q, g_Q)$?
Definition 2.3 (Spectral Approximation Condition)
We say that $G_t$ spectrally approximates $(S_Q, g_Q)$ if:
- The embedded points $\Phi_t(V_t)$ become dense in $S_Q$ as $t \to \infty$
- The kernel weights constructed from the embedding,
$$w_t(v_i, v_j) = \exp\!\left(-\frac{d_g\bigl(\Phi_t(v_i), \Phi_t(v_j)\bigr)^2}{4\varepsilon_t}\right),$$
make the graph Laplacian converge, under appropriate scaling, to the continuous Laplace–Beltrami operator.
Theorem 2.1 (Spectral Dimension Convergence)
If $G_t$ spectrally approximates $(S_Q, g_Q)$, and the local information dimension $d$ of the information manifold is constant on a compact region, then
$$\lim_{t \to \infty} d_{\mathrm{spec}}(G_t) = d.$$
Meaning: In the course of long-term learning, the spectral dimension of the observer's knowledge graph tends to the true dimension of the information manifold; geometrically, the knowledge graph gradually becomes a high-fidelity skeleton of the information manifold.
Analogy: Imagine approximating a sphere with a triangular mesh. As the number of mesh nodes increases and their spacing shrinks, the spectrum of the discrete Laplace operator approximates the spectrum of the continuous Laplace operator on the sphere (whose eigenfunctions are the spherical harmonics). Here, the observer's knowledge graph plays the role of the "triangular mesh".
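A minimal numerical sketch of this analogy and of Definition 2.2 / Theorem 2.1: sample points on the unit 2-sphere (standing in for the embedded nodes $\Phi_t(V_t)$), build Gaussian kernel weights as in Definition 2.3, and read the effective dimension off the Weyl-type scaling of the low Laplacian eigenvalues (the small-scale content of the heat-trace definition). The sample size, kernel bandwidth, and fitting range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 1000, 0.02   # number of "concept" nodes and kernel bandwidth (illustrative)

# Embedded nodes Phi_t(V_t): points sampled uniformly on the unit 2-sphere (true dimension 2).
pts = rng.normal(size=(n, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

# Gaussian kernel weights w_ij (Definition 2.3 style) and graph Laplacian L = D - W.
sq_dists = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (4 * eps))
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Low eigenvalues of L approximate (up to an overall scale) the Laplace-Beltrami spectrum.
lam = np.linalg.eigvalsh(L)

# Read off the effective dimension from the Weyl-type scaling N(lambda) ~ lambda^(d/2),
# i.e. lambda_k ~ k^(2/d) for small k -- the small-scale content of Definition 2.2.
k = np.arange(4, 40)                      # fitting range inside the resolved low spectrum
slope = np.polyfit(np.log(k), np.log(lam[k]), 1)[0]
print("estimated spectral dimension:", 2.0 / slope)   # tends toward 2 as n grows and eps shrinks
```

The same estimator can in principle be applied to the empirical concept graphs of Part Six.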
Part Three: Extended Worldline—Joint Dynamics of Attention and Knowledge
3.1 Joint State Space of Observer–Universe
In previous chapters, we defined the observer's worldline on the joint manifold $E_Q$:
$$\gamma(t) = (\theta(t), \varphi(t)) \in E_Q,$$
where $\theta$ is the control-manifold coordinate and $\varphi$ is the information-manifold coordinate.
Now we introduce the extended worldline, which also includes the internal state, the knowledge graph, and attention:
Definition 3.1 (Extended Worldline)
$$\tilde\gamma(t) = \bigl(\theta(t), \varphi(t), m(t), G_t, A_t\bigr),$$
where:
- $(\theta, \varphi) \in E_Q$: control–information state
- $m \in M_{\mathrm{int}}$: internal memory state
- $G_t$: knowledge graph
- $A_t$ (or $\rho_t$): attention operator
```mermaid
graph TB
subgraph "Extended State Space"
A["(θ,φ)∈E<sub>Q</sub><br/>Physical–Information Position"]
B["m∈M<sub>int</sub><br/>Internal Memory"]
C["G<sub>t</sub><br/>Knowledge Graph"]
D["A<sub>t</sub><br/>Attention"]
end
E["Complexity Manifold M"]
F["Information Manifold S<sub>Q</sub>"]
E --> A
F --> A
A --> B
B --> C
B --> D
C --> D
style A fill:#e1f5ff
style C fill:#fff4e1
style D fill:#ffe1f5
```
3.2 Observation–Computation Action
Building on the time–information–complexity joint action, we add the observer's internal costs:
Definition 3.2 (Observer–Computation Joint Action)
where:
- Complexity Kinetic Energy: the kinetic term of the worldline on the complexity manifold
- Information Kinetic Energy: the kinetic term of the worldline on the information manifold
- Knowledge Potential Energy: determined by the task information quality function $I_Q$ (e.g., relative entropy, mutual information)
- Knowledge Graph Update Cost: measured by a distance between graphs (e.g., spectral distance, Gromov–Wasserstein distance)
- Attention Configuration Cost: for example, an entropy regularization term, or a bandwidth constraint
Physical Interpretation:
- The first three terms reproduce the earlier time–information–complexity variational principle
- Knowledge Graph Update Cost: frequently updating the knowledge graph (adding new nodes, adjusting edge weights) incurs a "cognitive cost"
- Attention Configuration Cost: changing the attention configuration (e.g., switching the focus region) incurs a "switching cost"
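For orientation, a minimal notational sketch of how the five terms might be assembled into a single functional; the quadratic kinetic forms, the penalty weights $\lambda_G, \lambda_A$, and the discretization of graph-update times are assumptions of this sketch, not fixed by the text.

```latex
% Sketch of the observer-computation joint action (assumed quadratic kinetic terms;
% \lambda_G, \lambda_A and the update times t_k are introduced here for illustration):
S[\tilde\gamma] \;=\; \int_0^T \Big[
      \tfrac{1}{2}\,\|\dot\theta(t)\|_{G}^{2}          % complexity kinetic energy
   \;+\; \tfrac{1}{2}\,\|\dot\varphi(t)\|_{g_Q}^{2}    % information kinetic energy
   \;-\; I_Q\bigl(\varphi(t)\bigr)                     % knowledge potential energy
\Big]\,dt
   \;+\; \lambda_G \sum_{k} d_{\mathrm{graph}}\bigl(G_{t_{k+1}}, G_{t_k}\bigr)  % graph update cost
   \;+\; \lambda_A \int_0^T C_{\mathrm{att}}\bigl(A_t\bigr)\,dt .               % attention configuration cost
```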
3.3 Euler–Lagrange Conditions and Optimal Strategy
Minimizing the joint action yields the optimal observation–computation–learning strategy. The Euler–Lagrange conditions for the control and information coordinates give geodesic-type equations; the stationarity conditions for the knowledge graph and the attention give:
Optimal Knowledge Graph Update: at each moment, balance "information gain" against "graph update cost" and choose the optimal node-addition/edge-adjustment strategy.
Optimal Attention Configuration: under the given bandwidth constraint, choose the attention distribution that maximizes the short-term information gain.
Analogy: Like a mountaineer planning a route with limited stamina and time, the observer must balance:
- Moving economically (low complexity kinetic energy) vs. reaching high-value regions (low information potential energy)
- Frequently updating the map (high graph update cost) vs. making do with a rough map (low cost)
- Scanning widely (high attention cost) vs. focusing locally (low cost)
Part Four: Information Accumulation Inequality—Upper Bound Under Resource Constraints
4.1 Measure of Knowledge Amount
Define observer’s knowledge amount for task :
where is weight distribution on knowledge graph nodes (like visit frequency, importance score).
Information Accumulation Rate:
Physical Meaning: measures how much observer “knows” about task . measures “learning speed”.
4.2 Fisher Information Acquisition Rate
Assume the observer samples the information manifold through the attention density $\rho_t$; its single-step Fisher information acquisition rate is
$$J(t) = \int_{S_Q} \rho_t(\varphi)\, \bigl\|\nabla I_Q(\varphi)\bigr\|_g^2\, dV_g(\varphi).$$
Meaning: $J(t)$ is the expected squared gradient of the information quality function under the current attention configuration; it measures, in effect, the "steepness of the current learning direction".
In the complexity–information joint geometry, $J(t)$ and the information kinetic energy are related through a Lipschitz bound, where $C_I$ is the Lipschitz constant of $I_Q$.
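A minimal sketch of estimating the Fisher information acquisition rate $J(t)$ by sampling from the attention density; the Gaussian attention density, the quadratic toy information quality function $I_Q$, and the flat metric are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
target = np.array([1.0, 0.5])   # location of highest information quality (toy model)

def grad_I_Q(phi):
    """Gradient of the toy quality function I_Q(phi) = -|phi - target|^2."""
    return -2.0 * (phi - target)

def fisher_acquisition_rate(center, width, n_samples=10_000):
    """J(t) = E_{phi ~ rho_t}[ ||grad I_Q(phi)||^2 ] under a Gaussian attention density
    rho_t centered at the current orbit position (flat metric assumed)."""
    phi = center + width * rng.standard_normal((n_samples, 2))
    return np.mean(np.sum(grad_I_Q(phi) ** 2, axis=1))

# Focused vs. dispersed attention around the same orbit position:
center = np.array([0.0, 0.0])
print("narrow attention :", fisher_acquisition_rate(center, width=0.1))
print("broad attention  :", fisher_acquisition_rate(center, width=1.0))
```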
4.3 Information Accumulation Inequality
Theorem 4.1 (Observer Information Accumulation Upper Bound)
Assume:
- The task information quality function $I_Q$ is Lipschitz on $S_Q$, with bounded gradient: $\|\nabla I_Q\|_g \le C_I$
- The second moment of the attention density is bounded by the attention bandwidth $B_{\mathrm{att}}$
- The observer's complexity budget satisfies $\int_0^T \|\dot\theta(t)\|_G\, dt \le C_{\max}$
Then there exists a constant $\kappa$ (depending only on $C_I$, $B_{\mathrm{att}}$, and the geometric structure) such that
$$H_Q(T) - H_Q(0) \le \kappa\, C_{\max}.$$
Proof Sketch:
Differentiate $H_Q(t)$. The term driven by the motion of the embedded nodes on $S_Q$ is estimated with the Cauchy–Schwarz inequality in terms of $\sqrt{J(t)}$ and the information velocity. In the joint action, the information velocity and the complexity velocity are coupled through the weights; using the variational optimality conditions, the information velocity can be bounded by the complexity velocity. Integrating over $[0, T]$ and using the budget $C_{\max}$ yields the linear bound.
```mermaid
graph LR
A["Complexity Budget C<sub>max</sub>"] --> B["Information Velocity v<sub>S<sub>Q</sub></sub><sup>2</sup>(t)"]
B --> C["Fisher Acquisition J(t)"]
C --> D["Knowledge Growth Ḣ<sub>Q</sub>(t)"]
D --> E["Total Knowledge H<sub>Q</sub>(T)"]
F["Attention Bandwidth B<sub>att</sub>"] --> C
G["Gradient Bound C<sub>I</sub>"] --> C
style A fill:#e1f5ff
style E fill:#ffe1e1
```
Physical Meaning:
- The accumulated information has a linear upper bound in the complexity resources $C_{\max}$
- The attention bandwidth $B_{\mathrm{att}}$ and the gradient bound $C_I$ only change the proportionality constant $\kappa$; they do not change the linear form
- This is the geometric expression of "cognitive resource scarcity": unlimited learning is impossible, and the information acquisition rate is constrained by physics
Analogy: Like drawing water from a river with a bucket. The bucket size (attention bandwidth), the river's flow speed (information gradient), and the walking speed (complexity velocity) together determine how much water is collected per unit time. But no matter how the process is optimized, the total amount of water cannot exceed the total walking distance ($C_{\max}$) times some constant.
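A small simulation in the spirit of Theorem 4.1: a toy learner whose knowledge growth rate is bounded by a constant $\kappa$ times its instantaneous complexity velocity, so that integrating over the trajectory keeps the accumulated knowledge below $\kappa$ times the complexity spent. The constants, the choice $\kappa = C_I \cdot B_{\mathrm{att}}$, and the dynamics are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

C_I = 2.0             # gradient bound  ||grad I_Q|| <= C_I
B_att = 1.0           # attention bandwidth (second-moment bound)
kappa = C_I * B_att   # proportionality constant; kappa = C_I * B_att is an illustrative choice

dt, T = 0.01, 50.0
steps = int(T / dt)

v = np.abs(rng.normal(1.0, 0.3, steps))   # complexity velocity along the worldline
H = 0.0                                   # accumulated knowledge H_Q
C = 0.0                                   # consumed complexity budget (worldline length)
for k in range(steps):
    efficiency = rng.uniform(0.2, 1.0)    # the learner is not always efficient
    H += efficiency * kappa * v[k] * dt   # learning rate is at most kappa * v
    C += v[k] * dt

print(f"accumulated knowledge H_Q(T) = {H:.2f}")
print(f"complexity spent             = {C:.2f}")
print(f"linear bound kappa * C       = {kappa * C:.2f}")
assert H <= kappa * C + 1e-9   # the accumulation inequality holds for this toy learner
```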
4.4 Optimality Conditions for Time Selection
From the information accumulation inequality we can derive the optimal strategy for time selection:
Corollary 4.2 (Optimal Attention Configuration)
Under a fixed complexity budget and time window $[0, T]$, the attention strategy maximizing $H_Q(T)$ satisfies
$$\rho_t^*(\varphi) \;\propto\; \exp\!\left(\frac{\|\nabla I_Q(\varphi)\|_g^2 - \lambda\, d_g\bigl(\varphi, \varphi(t)\bigr)^2}{T_{\mathrm{att}}}\right),$$
where $\varphi(t)$ is the current orbit position, $T_{\mathrm{att}}$ is a temperature parameter (determined by the bandwidth constraint $B_{\mathrm{att}}$), and $\lambda$ weights the distance penalty.
Meaning: The optimal attention combines "squared-gradient weighting" with a "distance penalty":
- Prioritize regions with a large information gradient (the "learning frontier")
- But do not stray too far from the current position (limited by the attention bandwidth)
Physical Analogy: In quantum mechanics, the choice of measurement operator affects the information acquisition rate (the Cramér–Rao lower bound); in classical information theory, channel capacity limits the information transmission rate. Here, the attention operator plays the role of the "measurement operator" or the "channel", optimizing information flow under geometric constraints.
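A minimal sketch of the Gibbs-type attention rule of Corollary 4.2 on a discretized one-dimensional chart: scores combine the squared information gradient with a distance penalty to the current orbit position and are passed through a softmax. The toy gradient, temperature, and penalty weight are illustrative assumptions.

```python
import numpy as np

# Discretized 1-D chart of the information manifold (illustrative).
phi_grid = np.linspace(-3.0, 3.0, 61)

def grad_I_Q(phi):
    """Toy information-quality gradient: the steepest learning frontier sits near phi = 2."""
    return np.exp(-(phi - 2.0) ** 2)

def optimal_attention(phi_now, temperature=0.5, penalty=0.1):
    """rho*(phi) proportional to exp( [|grad I_Q|^2 - penalty * d(phi, phi_now)^2] / temperature )."""
    score = grad_I_Q(phi_grid) ** 2 - penalty * (phi_grid - phi_now) ** 2
    rho = np.exp((score - score.max()) / temperature)
    return rho / rho.sum()

rho = optimal_attention(phi_now=0.0)
print("attention peak at phi =", phi_grid[np.argmax(rho)])
# The peak lies between the current position (0) and the learning frontier (2):
# pulled toward the high-gradient region, held back by the distance penalty.
```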
Part Five: Time Sense and Attention Modulation
5.1 From Fisher Information to Subjective Duration
Recall the subjective duration definition from Chapter 3,
$$\tau(T) = \int_0^T \sqrt{F_Q(t)}\, dt,$$
where $F_Q(t)$ is the local quantum Fisher information. We now connect it with the attention operator:
Proposition 5.1 (Attention-Modulated Subjective Duration)
If observer’s quantum Fisher information and attention bandwidth satisfy:
(larger bandwidth, lower discriminability), then subjective duration can be written as:
Meaning: Attention dispersion ( large) causes subjective duration extension.
Experimental Predictions:
- In multi-task situations (attention spread across multiple objects), time experience "slows down"
- In single-focus tasks (attention concentrated on a single object), time experience "speeds up"
This is consistent with the classic "attention–time distortion" phenomenon: complex tasks make time "pass slowly", while simple repetitive tasks make it "pass quickly".
5.2 Knowledge Graph and Time Depth
The concept of time depth: the observer's sense of "how far in the past" an event lies, which can be characterized by path lengths in the knowledge graph.
Definition 5.1 (Time Depth)
On the knowledge graph $G_t$, the time depth is the shortest path length from the current concept node $v_{\mathrm{now}}$ back to the "origin" node $v_0$:
$$D_t = d_{G_t}(v_{\mathrm{now}}, v_0),$$
where $d_{G_t}$ is the geodesic (shortest-path) distance on the graph.
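A minimal sketch of Definition 5.1 using `networkx` (assumed available): time depth as the weighted shortest-path distance from the current concept node back to the origin node, on an illustrative toy memory graph.

```python
import networkx as nx

# Toy memory graph: nodes are remembered events/concepts, edge weights are "recall distances".
G = nx.Graph()
G.add_weighted_edges_from([
    ("origin", "breakfast", 1.0),
    ("breakfast", "meeting", 1.0),
    ("meeting", "lunch", 1.0),
    ("meeting", "deadline", 0.5),
    ("deadline", "now", 0.5),
    ("lunch", "now", 1.0),
])

# Time depth D_t = graph geodesic distance from the current node back to the origin.
depth = nx.shortest_path_length(G, source="now", target="origin", weight="weight")
print("time depth:", depth)   # 3.0 via now -> deadline -> meeting -> breakfast -> origin
```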
Proposition 5.2 (Time Depth Growth Law)
Under the spectral approximation condition, the growth rate of the time depth is consistent with the growth rate of geodesic distance on the information manifold.
Meaning: when the observer "moves fast" on the information manifold, the time depth of the knowledge graph also grows fast; this corresponds to the phenomenon that information-dense experience extends the experienced duration.
Analogy: Imagine a memoir. The relationships (edges) between chapters (nodes) constitute a "memory graph". The path length from the "starting point" to "now" is a psychological measure of how long it feels since then. Dense experiences (high information velocity) make the "length" of the memoir grow faster.
```mermaid
graph TB
subgraph "Sparse Experience"
A1["Event 1"] --> A2["Event 2"] --> A3["Event 3"]
end
subgraph "Dense Experience"
B1["Event 1"] --> B2["Event 2"]
B2 --> B3["Event 3"]
B2 --> B4["Event 4"]
B3 --> B5["Event 5"]
B4 --> B5
B5 --> B6["Event 6"]
end
C["Short Time Depth"] -.corresponds to.- A3
D["Long Time Depth"] -.corresponds to.- B6
style A3 fill:#fff4e1
style B6 fill:#ffe1f5
```
Part Six: Engineering Path—Attention Tracking and Knowledge Graph Visualization
6.1 Eye Movement Tracking and Attention Estimation
Experimental Design:
- Present visual stimuli (e.g., complex images, text paragraphs)
- Record eye-movement trajectories and fixation durations
- Construct a spatial attention heatmap from the fixation positions and dwell times (see the sketch after this list)
- Map to the information manifold: if an image region corresponds to a feature vector in $S_Q$, the spatial heatmap pushes forward to an attention density on the manifold
Verify Predictions:
- Test whether attention concentrates in regions with a large information gradient
- Test the relationship between attention bandwidth and task complexity
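A minimal sketch of turning fixation data into a spatial attention heatmap via duration-weighted Gaussian kernel density estimation; the fixation coordinates, dwell times, kernel width, and grid are illustrative assumptions.

```python
import numpy as np

# Fixations: (x, y) positions in normalized screen coordinates and dwell durations in seconds.
fixations = np.array([[0.20, 0.30], [0.22, 0.31], [0.70, 0.65], [0.71, 0.66], [0.50, 0.50]])
durations = np.array([0.40, 0.35, 0.90, 0.80, 0.15])

def attention_heatmap(fixations, durations, grid_size=64, sigma=0.05):
    """Duration-weighted Gaussian KDE over the screen: an empirical estimate of rho_t."""
    xs = np.linspace(0.0, 1.0, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="xy")
    heat = np.zeros_like(gx)
    for (fx, fy), w in zip(fixations, durations):
        heat += w * np.exp(-((gx - fx) ** 2 + (gy - fy) ** 2) / (2 * sigma ** 2))
    return heat / heat.sum()          # normalize: total attention = 1

heat = attention_heatmap(fixations, durations)
iy, ix = np.unravel_index(np.argmax(heat), heat.shape)
print("attention peak near:", (round(ix / 63, 2), round(iy / 63, 2)))   # close to the long-dwell region (0.70, 0.65)
```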
6.2 Neural Embedding of Concept Graph
Method:
- Collect the subject's semantic association network: given a concept word, ask the subject to list related concepts and assign similarity scores
- Construct the knowledge graph, where the nodes are the concepts and the edges are the association relations
- Use graph embedding algorithms (e.g., Node2Vec, GCN) to learn the embedding
- Compute the graph Laplacian spectrum and the heat kernel trace
- Estimate the spectral dimension $d_{\mathrm{spec}}$
Verify Theorem 2.1:
- Compare $d_{\mathrm{spec}}$ across subjects at different learning stages
- Expectation: after long-term learning, $d_{\mathrm{spec}}$ tends to the true dimension of the task information manifold
6.3 Active Learning and Information Accumulation
Algorithm Framework:
- Initialize the knowledge graph and the attention distribution
- At each time step:
- Select a query action according to the current attention (active sampling)
- Observe the result and update the internal state
- Update the knowledge graph (add nodes / adjust edge weights)
- Update the attention (reallocate according to information gain)
- Record the information accumulation curve and the complexity consumption
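A minimal sketch of this loop on a toy one-dimensional task: the learner queries where attention is high, records what it learns as knowledge-graph nodes, reallocates attention toward unexplored regions, and logs the knowledge and complexity curves. All modeling choices (the hidden signal, the uncertainty-based gain, the unit complexity cost, the node-count proxy for $H_Q$) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 50)           # discretized task information manifold
truth = np.sin(6 * grid)                   # hidden structure the observer is learning

attention = np.full_like(grid, 1.0 / len(grid))   # rho_0: uniform attention
knowledge = {}                                    # knowledge-graph nodes: index -> observed value
H_curve, C_curve = [], []
complexity = 0.0

for t in range(200):
    # 1. Sample a query where attention is high (active sampling).
    i = rng.choice(len(grid), p=attention)
    # 2. Observe (noisily) and update the internal state / knowledge graph.
    knowledge[i] = truth[i] + 0.05 * rng.standard_normal()
    # 3. Reallocate attention toward unexplored, high-uncertainty regions (toy information gain).
    gain = np.ones_like(grid)
    gain[list(knowledge)] = 0.1                   # already-known nodes are less interesting
    attention = gain / gain.sum()
    # 4. Record knowledge amount and complexity consumption.
    complexity += 1.0                             # unit complexity cost per query (toy budget)
    H_curve.append(len(knowledge))                # crude H_Q proxy: number of distinct known nodes
    C_curve.append(complexity)

print("final knowledge H_Q:", H_curve[-1], "of", len(grid), "nodes;",
      "complexity spent:", C_curve[-1])
```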
Verify Theorem 4.1:
- Test whether $H_Q(T)$ grows linearly with $C_{\max}$
- Estimate the proportionality constant $\kappa$ and its dependence on the attention bandwidth
6.4 Cross-Modal Knowledge Graph
Extension: In multi-modal tasks (vision + language + audition), the nodes of the knowledge graph come from different modalities.
Cross-modal edges connect concepts from different modalities (e.g., the image "dog" and the word "dog").
Embedding Alignment:
- Visual features
- Language features
- Cross-modal mapping (e.g., a CLIP-style model)
Research Question: How does the spectral dimension of the cross-modal knowledge graph relate to the dimensions of the individual single-modal manifolds?
Part Seven: Dialogue with Existing Theories
7.1 Attention Economics
Classic Theory (Simon, 1971; Kahneman, 1973):
- Attention is a scarce resource that must be allocated across multiple tasks
- Dual-task interference measures attention capacity
Extension of This Theory:
- Geometrize "attention capacity" as a bandwidth constraint
- Express "task interference" as the spatial separation of multiple high-gradient regions on the information manifold
- Provide a quantitative bridge from geometric constraints to behavioral predictions
7.2 Manifold Learning and Representation Learning
Classic Theory (Tenenbaum, 2000; Belkin & Niyogi, 2003):
- High-dimensional data are embedded in a low-dimensional manifold
- The graph Laplacian converges to the manifold Laplacian
Innovation of This Theory:
- Interpret the "data manifold" as the "task information manifold"
- Formalize the "learner" as an observer object with limited memory
- The graph convergence theorem (Theorem 2.1) gives a geometric guarantee of "cognitive convergence"
7.3 Active Inference and Bayesian Brain
Classic Theory (Friston, 2010; Active Inference):
- The brain minimizes "free energy" (a variational bound)
- Action selection minimizes prediction error
Connection of This Theory:
- The joint action can be viewed as a "generalized free energy"
- The attention operator corresponds to "precision weighting"
- The knowledge graph update corresponds to an implementation of "Bayesian filtering" on a discrete graph
7.4 Scalar Expectancy Theory of Time Perception
Classic Theory (Scalar Expectancy Theory, Gibbon, 1977):
- Internal clock model: pacemaker–switch–accumulator
- Attention modulates the switch
Geometric Reconstruction of This Theory:
- The "pacemaker frequency" corresponds to $\sqrt{F_Q}$, the square root of the quantum Fisher information
- The "accumulator" corresponds to the subjective duration integral
- The "attention switch" corresponds to the attention bandwidth modulating $F_Q$
Part Eight: Discussion—Geometric Constraints of Cognitive Resources and Emergent Intelligence
8.1 Applicability Domain and Assumption Strength
Lipschitz and Bounded Gradient Assumption:
- The Lipschitz property of the information quality function guarantees the linear upper bound in the information accumulation inequality
- In practice, $I_Q$ may have singular points (e.g., phase transition boundaries)
- Localization or regularization is then needed
Spectral Approximation Assumption:
- The knowledge graph spectrally approximating the information manifold requires the long-time limit
- In finite time, the spectral dimension may oscillate or deviate
- A quantitative estimate of the "approximation rate" needs to be introduced
Finite Memory Assumption:
- A finite internal state space limits the observer's "working memory capacity"
- Actual cognitive systems may have hierarchical memory (short-term vs. long-term)
- An extension to a multi-scale memory model is needed
8.2 Tightness of Information Accumulation Upper Bound
Question: Is the upper bound given by Theorem 4.1 tight?
Analysis:
- Under “uniform exploration” strategy ( constant distribution), upper bound nearly achieved
- Under “greedy exploration” strategy ( concentrated on current optimum), upper bound may be loose
- Optimal strategy (Corollary 4.2) can achieve asymptotic tightness of upper bound in some cases
Engineering Meaning: Tight upper bound means “cannot break through linear growth rate by optimizing attention strategy”—this is fundamental limitation of cognitive resource scarcity.
8.3 From Single Observer to Multi-Observer
This chapter focuses on a single observer. Chapter 6 will extend the framework to multi-observer consensus geometry, where:
- Knowledge graphs are heterogeneous across observers
- Attention is influenced by the social network topology
- A consensus energy couples the individual knowledge graphs
Expected Phenomena:
- The spectral convergence speed of the knowledge graph is positively correlated with the connectivity of the social network
- The joint information accumulation of multiple observers can exceed the linear upper bound of a single observer (the emergence of "collective intelligence")
8.4 Time Selection and Free Will
Philosophical Question: Does the observer "freely choose" its attention configuration?
Answer of This Theory:
- The attention operator is determined by the internal state and the strategy; in this sense it is "deterministic"
- But the strategy itself may be "emergent" (optimized through long-term learning)
- "Free will" can be understood as freedom of choice within the space of optimal strategies allowed by the geometric constraints
Chapter 5 will explore Empowerment (causal control) in depth, giving a geometric characterization of free will.
Conclusion: Geometric Characterization of Attention and Information-Theoretic Foundation of Time Selection
This chapter constructed a unified theory of attention, time, and the knowledge graph, turning the classic psychological notion of "cognitive resource scarcity" into rigorous theorems of information geometry.
Core Results Review:
- Attention Operator Formalization: a discrete operator $A_k$ or a continuous density $\rho_t$ on the information manifold, limited by the bandwidth constraint
- Knowledge Graph Spectral Convergence (Theorem 2.1): the observer's knowledge graph spectrally approximates the true geometry of the information manifold in the long-time limit
- Information Accumulation Upper Bound (Theorem 4.1): under the complexity budget and attention bandwidth constraints, the amount of information acquired has a linear upper bound in the physical resources
- Optimal Attention Strategy (Corollary 4.2): optimal attention concentrates on regions with a large information gradient that lie close to the current position
Engineering Path:
- Eye-movement tracking → spatial attention heatmap → information manifold embedding
- Semantic association network → knowledge graph → spectral dimension estimation
- Active learning algorithm → information accumulation curve → upper bound verification
Philosophical Significance:
- Attention is not an "arbitrary choice" but an optimization process under geometric constraints
- Time selection ("what to look at" + "how long to look") determines the information accumulation path
- The knowledge graph "approximates truth" (spectral convergence) through long-term learning, but can never "fully arrive" because of cognitive resource limitations
The next chapter (Chapter 5) explores the geometric characterization of free will, introducing Empowerment as a measure of "causal control" and revealing the deep connection between "freedom of choice" and "information geometry".
References
Attention Theory
- Simon, H. A. (1971). Designing organizations for an information-rich world. In Computers, Communications, and the Public Interest (pp. 37-72).
- Kahneman, D. (1973). Attention and Effort. Prentice-Hall.
- Posner, M. I., & Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13(1), 25-42.
Manifold Learning
- Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323.
- Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373-1396.
Graph Geometry
- Chung, F. R. (1997). Spectral Graph Theory. AMS.
- von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416.
Active Inference
- Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
- Parr, T., & Friston, K. J. (2019). Generalised free energy and active inference. Biological Cybernetics, 113(5-6), 495-513.
Knowledge Representation
- Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407.
- Borge-Holthoefer, J., & Arenas, A. (2010). Semantic networks: Structure and dynamics. Entropy, 12(5), 1264-1302.
Time Perception
- Gibbon, J. (1977). Scalar expectancy theory and Weber’s law in animal timing. Psychological Review, 84(3), 279.
- Block, R. A., & Zakay, D. (1997). Prospective and retrospective duration judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4(2), 184-197.
This Collection
- This collection: Observer–World Section Structure: Causality and Conditionalization (Chapter 1)
- This collection: Structural Definition of Consciousness: Five Conditions and Temporal Causality (Chapter 2)
- This collection: Entanglement–Time–Consciousness: Unified Delay Scale (Chapter 3)
- This collection: Unified Theory of Observer–Attention–Knowledge Graph in Computational Universe (Source theory document)