Chapter 4: Attention–Time–Knowledge Graph—Information Accumulation Geometry of Time Selection
Introduction: Time Scale of Attention
When you focus on reading in a crowded café, the surrounding noise "disappears"; when you are highly alert in an emergency, time seems to "slow down". Both phenomena involve how attention shapes the experience of time.
In previous chapters, we established the mathematical framework of observer sections: an observer accesses local information about the universe through "slices" along its worldline. But a key question remains unanswered: how does the observer choose what to look at?
This chapter constructs a unified theory of attention, time, and the knowledge graph, revealing that:
- Attention is the observer's time-dependent weight distribution on the information manifold
- Time selection implements geometric constraints on information accumulation through attention operators
- The knowledge graph is a discrete skeleton of the information manifold, spectrally converging to the true geometry in the long-time limit
This theory turns the scarce-resource allocation problem of "attention economics" into a variational optimization problem on information geometry.
```mermaid
graph TB
subgraph "Observer Three-Layer Structure"
A["Attention Operator A<sub>t</sub><br/>Choose 'What to See'"]
B["Knowledge Graph G<sub>t</sub><br/>Encode 'What to Know'"]
C["Eigen Time Scale τ(t)<br/>Experience 'How Long Passed'"]
end
subgraph "Information Geometric Constraints"
D["Task Information Manifold (S<sub>Q</sub>,g<sub>Q</sub>)"]
E["Complexity Manifold (M,G)"]
end
A --> B
B --> C
D --> A
E --> C
A --> D
C --> E
style A fill:#e1f5ff
style B fill:#fff4e1
style C fill:#ffe1f5
style D fill:#e1ffe1
```
Core Insight: Triple Constraints of Information Accumulation
Imagine an explorer mapping unknown terrain:
- Attention Bandwidth: he can only explore a limited area each day (the support of the attention operator)
- Complexity Budget: the total length of his walking path is limited by his stamina (the worldline length on the complexity manifold)
- Knowledge Graph: the map he draws consists of discrete sampling points (nodes) and paths (edges) that gradually approximate the true terrain
This chapter proves that, under these triple constraints, the amount of information the observer can accumulate has a linear upper bound in the complexity budget, and that the spectral dimension of the knowledge graph converges to the true dimension of the information manifold in the long-time limit.
Part One: Attention Operator—Observer’s “Spotlight”
1.1 Dual Formalization of Attention
In the computational universe framework, the observer cannot simultaneously access all configurations. Attention characterizes the observer's "visible window" at each moment.
Definition 1.1 (Discrete Attention Operator)
At time step , observer’s attention operator is function:
satisfying normalization:
or weak constraint (total attention bandwidth).
Visible configuration subset defined as:
Physical Picture: $A_k(x)$ is the weight of "cognitive resources" the observer allocates to configuration $x$. Like a spotlight on a stage, $A_k$ highlights certain regions and leaves the others in "shadow".
Definition 1.2 (Continuous Attention Density)
On the task information manifold $(S_Q, g_Q)$, attention can be represented as a probability density $\rho_t : S_Q \to [0, \infty)$ with
$$\int_{S_Q} \rho_t(\varphi)\, dV_g(\varphi) = 1,$$
where $dV_g$ is the Riemannian volume element.
```mermaid
graph LR
subgraph "Discrete Attention"
A["Configuration Space X"]
B["Weight A<sub>k</sub>(x)"]
C["Visible Subset X<sub>k</sub><sup>att</sup>"]
end
subgraph "Continuous Attention"
D["Information Manifold S<sub>Q</sub>"]
E["Density ρ<sub>t</sub>(φ)"]
F["High Attention Region supp(ρ<sub>t</sub>)"]
end
A --> D
B --> E
C --> F
style B fill:#e1f5ff
style E fill:#fff4e1
```
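To make Definition 1.1 concrete, here is a minimal numerical sketch (Python with NumPy assumed); the softmax parameterization of the weights and the threshold defining the visible subset are illustrative choices, not part of the definition.

```python
import numpy as np

def attention_operator(scores, bandwidth=None):
    """Turn raw relevance scores over configurations x in X into attention weights A_k(x).

    If `bandwidth` is None, enforce the normalization  sum_x A_k(x) = 1;
    otherwise enforce the weak constraint  sum_x A_k(x) <= bandwidth (B_att).
    """
    weights = np.exp(scores - scores.max())   # softmax-style positive weights (illustrative)
    weights /= weights.sum()                  # normalization: total attention = 1
    if bandwidth is not None:
        weights *= min(1.0, bandwidth)        # scale down so the total respects B_att
    return weights

def visible_subset(weights, eps=1e-3):
    """X_k^att: configurations receiving non-negligible attention."""
    return np.flatnonzero(weights > eps)

# Example: 6 configurations; the observer's scores favor configurations 2 and 3.
scores = np.array([0.1, 0.2, 2.0, 1.5, 0.0, -1.0])
A_k = attention_operator(scores)
print(A_k.sum())            # ~1.0  (bandwidth constraint)
print(visible_subset(A_k))  # indices of the "spotlit" configurations
```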
1.2 Attention and Observer Internal State
Observer’s attention determined by internal memory state . Formalized as:
Observer Object:
where:
- : Internal memory state space (finite or countable)
- : Observation symbol space
- : Action space
- : Attention–observation strategy
- : Internal update operator
Dynamics Loop:
- The observer is in internal state $m_k$; the universe is in configuration $x_k$
- According to its strategy, the observer selects an action (deciding "where to look")
- The universe returns an observation symbol
- The internal state is updated to $m_{k+1}$
The attention operator $A_k$ is jointly determined by the internal state $m_k$ and the current orbit position.
Analogy: Imagine a miner exploring a dark cave with a headlamp. His "memory map" tells him where he has already explored, his "decision strategy" tells him which direction to point the headlamp next, and each observation updates the map.
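A minimal sketch of this observation loop, assuming a toy finite configuration space; the visit-count memory, the `policy`, `universe_observation`, and `update` functions are hypothetical stand-ins for the strategy and the internal update operator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_configs = 10

# Internal memory m: here simply a visit-count table over configurations (a toy M_int).
memory = np.zeros(n_configs)

def policy(memory, x):
    """Attention-observation strategy: look toward the least-visited neighboring configuration."""
    neighbors = [(x - 1) % n_configs, (x + 1) % n_configs]
    return min(neighbors, key=lambda u: memory[u])

def universe_observation(action):
    """The universe returns an observation symbol for the queried configuration (toy model)."""
    return np.sin(action)  # stand-in observable

def update(memory, action, observation):
    """Internal update operator: incorporate the new observation into memory."""
    memory[action] += 1
    return memory

x = 0  # current configuration index
for k in range(20):
    a = policy(memory, x)           # decide "where to look"
    o = universe_observation(a)     # universe returns an observation symbol
    memory = update(memory, a, o)   # update internal state
    x = a                           # the orbit moves to the attended configuration
print(memory)  # visit counts: the miner's gradually filled-in "memory map"
```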
1.3 Resource Constraints of Attention
Constraint 1 (Bandwidth Constraint):
$$\int_{S_Q} \rho_t(\varphi)\, dV_g(\varphi) = 1 \qquad \text{or} \qquad \sum_{x \in X} A_k(x) \le B_{\mathrm{att}}.$$
The normalization condition means total attention is conserved: focusing somewhere necessarily means ignoring somewhere else.
Constraint 2 (Second Moment Constraint):
$$\int_{S_Q} d_g(\varphi, \varphi_0)^2\, \rho_t(\varphi)\, dV_g(\varphi) \le B_{\mathrm{att}}^2.$$
For a fixed reference point $\varphi_0$, the "spatial dispersion" of attention is bounded. This prevents attention from being over-dispersed (unable to focus) or over-concentrated (too narrow a field of view).
Physical Analogy: In quantum mechanics, wave-function normalization corresponds to probability conservation; here, the normalization of $\rho_t$ corresponds to "conservation of total cognitive resources". The second-moment constraint is similar in spirit to the Heisenberg uncertainty principle: one cannot simultaneously achieve high focus and wide coverage.
Part Two: Knowledge Graph—Discrete Skeleton of Information Manifold
2.1 Four-Tuple Definition of Knowledge Graph
The observer cannot access every point of the information manifold through finite-time exploration. The knowledge representation he constructs is a discrete graph:
Definition 2.1 (Knowledge Graph at Time $t$)
$$G_t = (V_t, E_t, w_t, \Phi_t),$$
where:
- $V_t$: a finite node set; each node represents a "concept" or "abstract state"
- $E_t$: a directed or undirected edge set, representing relationships between concepts (causal, implication, similarity, etc.)
- $w_t$: edge weights, representing relationship strength
- Embedding map $\Phi_t : V_t \to S_Q$: embeds each node at a point of the information manifold
```mermaid
graph TB
subgraph "Knowledge Graph G<sub>t</sub>"
V1["Concept 1"]
V2["Concept 2"]
V3["Concept 3"]
V4["Concept 4"]
V1 -->|w<sub>12</sub>| V2
V2 -->|w<sub>23</sub>| V3
V1 -->|w<sub>14</sub>| V4
V3 -->|w<sub>34</sub>| V4
end
subgraph "Information Manifold S<sub>Q</sub>"
P1["Φ<sub>t</sub>(V1)"]
P2["Φ<sub>t</sub>(V2)"]
P3["Φ<sub>t</sub>(V3)"]
P4["Φ<sub>t</sub>(V4)"]
end
V1 -.embedding.-> P1
V2 -.embedding.-> P2
V3 -.embedding.-> P3
V4 -.embedding.-> P4
style V1 fill:#e1f5ff
style V2 fill:#fff4e1
style V3 fill:#ffe1f5
style V4 fill:#e1ffe1
```
Physical Picture: The knowledge graph is like "sampling points on a map". The explorer cannot measure every inch of terrain; he can only drive stakes at key positions (nodes $V_t$), mark paths (edges $E_t$), and record relative positions (the embedding $\Phi_t$).
2.2 Graph Laplace Operator and Spectral Structure
Define the graph Laplace operator on the knowledge graph $G_t$:
$$(L_t f)(v) = \sum_{u : (v,u) \in E_t} w_t(v,u)\,\bigl(f(v) - f(u)\bigr)$$
for any function $f : V_t \to \mathbb{R}$. This is the discrete version of the Laplace–Beltrami operator.
Spectral Properties:
- Eigenvalues $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{|V_t|-1}$
- The smallest nonzero eigenvalue $\lambda_1$ (the Fiedler value) characterizes the graph's "connectivity"
- The distribution of the large eigenvalues characterizes the graph's "local geometry"
Definition 2.2 (Spectral Dimension of Knowledge Graph)
$$d_{\mathrm{spec}}(G_t) = -2 \lim_{s \to 0^+} \frac{\ln \operatorname{Tr} e^{-s L_t}}{\ln s},$$
if this limit exists. $d_{\mathrm{spec}}$ describes the graph's "effective dimension" at small scales.
2.3 Spectral Approximation: From Discrete to Continuous
Key Question: To what extent does the knowledge graph $G_t$ "faithfully" represent the information manifold $(S_Q, g_Q)$?
Definition 2.3 (Spectral Approximation Condition)
We say that $G_t$ spectrally approximates $(S_Q, g_Q)$ if:
- The embedded points $\Phi_t(V_t)$ become dense in $S_Q$ as $t \to \infty$
- The kernel weights constructed from the embedding,
$$w_t(v_i, v_j) = \exp\!\left(-\frac{d_g\bigl(\Phi_t(v_i), \Phi_t(v_j)\bigr)^2}{4\varepsilon_t}\right),$$
make the graph Laplacian converge, under appropriate scaling, to the continuous Laplace–Beltrami operator.
Theorem 2.1 (Spectral Dimension Convergence)
If $G_t$ spectrally approximates $(S_Q, g_Q)$, and the local information dimension $d$ of the information manifold is constant on a compact region, then
$$\lim_{t \to \infty} d_{\mathrm{spec}}(G_t) = d.$$
Meaning: In the course of long-term learning, the spectral dimension of the observer's knowledge graph tends to the true dimension of the information manifold; geometrically, the knowledge graph gradually becomes a high-fidelity skeleton of the information manifold.
Analogy: Imagine approximating a sphere with a triangular mesh. As the number of mesh nodes increases and their spacing shrinks, the spectrum of the discrete Laplace operator approximates the spectrum of the continuous Laplace operator on the sphere (whose eigenfunctions are the spherical harmonics). Here, the observer's knowledge graph plays the role of the "triangular mesh".
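A minimal numerical sketch of this analogy and of Definition 2.2 / Theorem 2.1: sample points on the unit 2-sphere (standing in for the embedded nodes $\Phi_t(V_t)$), build Gaussian kernel weights as in Definition 2.3, and read the effective dimension off the Weyl-type scaling of the low Laplacian eigenvalues (the small-scale content of the heat-trace definition). The sample size, kernel bandwidth, and fitting range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 1000, 0.02   # number of "concept" nodes and kernel bandwidth (illustrative)

# Embedded nodes Phi_t(V_t): points sampled uniformly on the unit 2-sphere (true dimension 2).
pts = rng.normal(size=(n, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

# Gaussian kernel weights w_ij (Definition 2.3 style) and graph Laplacian L = D - W.
sq_dists = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (4 * eps))
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Low eigenvalues of L approximate (up to an overall scale) the Laplace-Beltrami spectrum.
lam = np.linalg.eigvalsh(L)

# Read off the effective dimension from the Weyl-type scaling N(lambda) ~ lambda^(d/2),
# i.e. lambda_k ~ k^(2/d) for small k -- the small-scale content of Definition 2.2.
k = np.arange(4, 40)                      # fitting range inside the resolved low spectrum
slope = np.polyfit(np.log(k), np.log(lam[k]), 1)[0]
print("estimated spectral dimension:", 2.0 / slope)   # tends toward 2 as n grows and eps shrinks
```

The same estimator can in principle be applied to the empirical concept graphs of Part Six.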
Part Three: Extended Worldline—Joint Dynamics of Attention and Knowledge
3.1 Joint State Space of Observer–Universe
In previous chapters, we defined the observer's worldline on the joint manifold $E_Q$:
$$\gamma(t) = (\theta(t), \varphi(t)) \in E_Q,$$
where $\theta$ is the control-manifold coordinate and $\varphi$ is the information-manifold coordinate.
Now we introduce the extended worldline, which also includes the internal state, the knowledge graph, and attention:
Definition 3.1 (Extended Worldline)
$$\tilde\gamma(t) = \bigl(\theta(t), \varphi(t), m(t), G_t, A_t\bigr),$$
where:
- $(\theta, \varphi) \in E_Q$: control–information state
- $m \in M_{\mathrm{int}}$: internal memory state
- $G_t$: knowledge graph
- $A_t$ (or $\rho_t$): attention operator
```mermaid
graph TB
subgraph "Extended State Space"
A["(θ,φ)∈E<sub>Q</sub><br/>Physical–Information Position"]
B["m∈M<sub>int</sub><br/>Internal Memory"]
C["G<sub>t</sub><br/>Knowledge Graph"]
D["A<sub>t</sub><br/>Attention"]
end
E["Complexity Manifold M"]
F["Information Manifold S<sub>Q</sub>"]
E --> A
F --> A
A --> B
B --> C
B --> D
C --> D
style A fill:#e1f5ff
style C fill:#fff4e1
style D fill:#ffe1f5
```
3.2 Observation–Computation Action
Building on the time–information–complexity joint action, we add the observer's internal costs:
Definition 3.2 (Observer–Computation Joint Action)
where:
- Complexity Kinetic Energy: the kinetic term of the worldline on the complexity manifold
- Information Kinetic Energy: the kinetic term of the worldline on the information manifold
- Knowledge Potential Energy: determined by the task information quality function $I_Q$ (e.g., relative entropy, mutual information)
- Knowledge Graph Update Cost: measured by a distance between graphs (e.g., spectral distance, Gromov–Wasserstein distance)
- Attention Configuration Cost: for example, an entropy regularization term, or a bandwidth constraint
Physical Interpretation:
- The first three terms reproduce the earlier time–information–complexity variational principle
- Knowledge Graph Update Cost: frequently updating the knowledge graph (adding new nodes, adjusting edge weights) incurs a "cognitive cost"
- Attention Configuration Cost: changing the attention configuration (e.g., switching the focus region) incurs a "switching cost"
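For orientation, a minimal notational sketch of how the five terms might be assembled into a single functional; the quadratic kinetic forms, the penalty weights $\lambda_G, \lambda_A$, and the discretization of graph-update times are assumptions of this sketch, not fixed by the text.

```latex
% Sketch of the observer-computation joint action (assumed quadratic kinetic terms;
% \lambda_G, \lambda_A and the update times t_k are introduced here for illustration):
S[\tilde\gamma] \;=\; \int_0^T \Big[
      \tfrac{1}{2}\,\|\dot\theta(t)\|_{G}^{2}          % complexity kinetic energy
   \;+\; \tfrac{1}{2}\,\|\dot\varphi(t)\|_{g_Q}^{2}    % information kinetic energy
   \;-\; I_Q\bigl(\varphi(t)\bigr)                     % knowledge potential energy
\Big]\,dt
   \;+\; \lambda_G \sum_{k} d_{\mathrm{graph}}\bigl(G_{t_{k+1}}, G_{t_k}\bigr)  % graph update cost
   \;+\; \lambda_A \int_0^T C_{\mathrm{att}}\bigl(A_t\bigr)\,dt .               % attention configuration cost
```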
3.3 Euler–Lagrange Conditions and Optimal Strategy
Minimizing the joint action yields the optimal observation–computation–learning strategy. The Euler–Lagrange conditions for the control and information coordinates give geodesic-type equations; the stationarity conditions for the knowledge graph and the attention give:
Optimal Knowledge Graph Update: at each moment, balance "information gain" against "graph update cost" and choose the optimal node-addition/edge-adjustment strategy.
Optimal Attention Configuration: under the given bandwidth constraint, choose the attention distribution that maximizes the short-term information gain.
Analogy: Like a mountaineer planning a route with limited stamina and time, the observer must balance:
- Moving economically (low complexity kinetic energy) vs. reaching high-value regions (low information potential energy)
- Frequently updating the map (high graph update cost) vs. making do with a rough map (low cost)
- Scanning widely (high attention cost) vs. focusing locally (low cost)
Part Four: Information Accumulation Inequality—Upper Bound Under Resource Constraints
4.1 Measure of Knowledge Amount
Define observer’s knowledge amount for task :
where is weight distribution on knowledge graph nodes (like visit frequency, importance score).
Information Accumulation Rate:
Physical Meaning: measures how much observer “knows” about task . measures “learning speed”.
4.2 Fisher Information Acquisition Rate
Assume the observer samples the information manifold through the attention density $\rho_t$; its single-step Fisher information acquisition rate is
$$J(t) = \int_{S_Q} \rho_t(\varphi)\, \bigl\|\nabla I_Q(\varphi)\bigr\|_g^2\, dV_g(\varphi).$$
Meaning: $J(t)$ is the expected squared gradient of the information quality function under the current attention configuration; it measures, in effect, the "steepness of the current learning direction".
In the complexity–information joint geometry, $J(t)$ and the information kinetic energy are related through a Lipschitz bound, where $C_I$ is the Lipschitz constant of $I_Q$.
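A minimal sketch of estimating the Fisher information acquisition rate $J(t)$ by sampling from the attention density; the Gaussian attention density, the quadratic toy information quality function $I_Q$, and the flat metric are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
target = np.array([1.0, 0.5])   # location of highest information quality (toy model)

def grad_I_Q(phi):
    """Gradient of the toy quality function I_Q(phi) = -|phi - target|^2."""
    return -2.0 * (phi - target)

def fisher_acquisition_rate(center, width, n_samples=10_000):
    """J(t) = E_{phi ~ rho_t}[ ||grad I_Q(phi)||^2 ] under a Gaussian attention density
    rho_t centered at the current orbit position (flat metric assumed)."""
    phi = center + width * rng.standard_normal((n_samples, 2))
    return np.mean(np.sum(grad_I_Q(phi) ** 2, axis=1))

# Focused vs. dispersed attention around the same orbit position:
center = np.array([0.0, 0.0])
print("narrow attention :", fisher_acquisition_rate(center, width=0.1))
print("broad attention  :", fisher_acquisition_rate(center, width=1.0))
```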
4.3 Information Accumulation Inequality
Theorem 4.1 (Observer Information Accumulation Upper Bound)
Assume:
- The task information quality function $I_Q$ is Lipschitz on $S_Q$, with bounded gradient: $\|\nabla I_Q\|_g \le C_I$
- The second moment of the attention density is bounded by the attention bandwidth $B_{\mathrm{att}}$
- The observer's complexity budget satisfies $\int_0^T \|\dot\theta(t)\|_G\, dt \le C_{\max}$
Then there exists a constant $\kappa$ (depending only on $C_I$, $B_{\mathrm{att}}$, and the geometric structure) such that
$$H_Q(T) - H_Q(0) \le \kappa\, C_{\max}.$$
Proof Sketch:
Differentiate $H_Q(t)$. The term driven by the motion of the embedded nodes on $S_Q$ is estimated with the Cauchy–Schwarz inequality in terms of $\sqrt{J(t)}$ and the information velocity. In the joint action, the information velocity and the complexity velocity are coupled through the weights; using the variational optimality conditions, the information velocity can be bounded by the complexity velocity. Integrating over $[0, T]$ and using the budget $C_{\max}$ yields the linear bound.
```mermaid
graph LR
A["Complexity Budget C<sub>max</sub>"] --> B["Information Velocity v<sub>S<sub>Q</sub></sub><sup>2</sup>(t)"]
B --> C["Fisher Acquisition J(t)"]
C --> D["Knowledge Growth Ḣ<sub>Q</sub>(t)"]
D --> E["Total Knowledge H<sub>Q</sub>(T)"]
F["Attention Bandwidth B<sub>att</sub>"] --> C
G["Gradient Bound C<sub>I</sub>"] --> C
style A fill:#e1f5ff
style E fill:#ffe1e1
```
Physical Meaning:
- The accumulated information has a linear upper bound in the complexity resources $C_{\max}$
- The attention bandwidth $B_{\mathrm{att}}$ and the gradient bound $C_I$ only change the proportionality constant $\kappa$; they do not change the linear form
- This is the geometric expression of "cognitive resource scarcity": unlimited learning is impossible, and the information acquisition rate is constrained by physics
Analogy: Like drawing water from a river with a bucket. The bucket size (attention bandwidth), the river's flow speed (information gradient), and the walking speed (complexity velocity) together determine how much water is collected per unit time. But no matter how the process is optimized, the total amount of water cannot exceed the total walking distance ($C_{\max}$) times some constant.
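A small simulation in the spirit of Theorem 4.1: a toy learner whose knowledge growth rate is bounded by a constant $\kappa$ times its instantaneous complexity velocity, so that integrating over the trajectory keeps the accumulated knowledge below $\kappa$ times the complexity spent. The constants, the choice $\kappa = C_I \cdot B_{\mathrm{att}}$, and the dynamics are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

C_I = 2.0             # gradient bound  ||grad I_Q|| <= C_I
B_att = 1.0           # attention bandwidth (second-moment bound)
kappa = C_I * B_att   # proportionality constant; kappa = C_I * B_att is an illustrative choice

dt, T = 0.01, 50.0
steps = int(T / dt)

v = np.abs(rng.normal(1.0, 0.3, steps))   # complexity velocity along the worldline
H = 0.0                                   # accumulated knowledge H_Q
C = 0.0                                   # consumed complexity budget (worldline length)
for k in range(steps):
    efficiency = rng.uniform(0.2, 1.0)    # the learner is not always efficient
    H += efficiency * kappa * v[k] * dt   # learning rate is at most kappa * v
    C += v[k] * dt

print(f"accumulated knowledge H_Q(T) = {H:.2f}")
print(f"complexity spent             = {C:.2f}")
print(f"linear bound kappa * C       = {kappa * C:.2f}")
assert H <= kappa * C + 1e-9   # the accumulation inequality holds for this toy learner
```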
4.4 Optimality Conditions for Time Selection
From the information accumulation inequality we can derive the optimal strategy for time selection:
Corollary 4.2 (Optimal Attention Configuration)
Under a fixed complexity budget and time window $[0, T]$, the attention strategy maximizing $H_Q(T)$ satisfies
$$\rho_t^*(\varphi) \;\propto\; \exp\!\left(\frac{\|\nabla I_Q(\varphi)\|_g^2 - \lambda\, d_g\bigl(\varphi, \varphi(t)\bigr)^2}{T_{\mathrm{att}}}\right),$$
where $\varphi(t)$ is the current orbit position, $T_{\mathrm{att}}$ is a temperature parameter (determined by the bandwidth constraint $B_{\mathrm{att}}$), and $\lambda$ weights the distance penalty.
Meaning: The optimal attention combines "squared-gradient weighting" with a "distance penalty":
- Prioritize regions with a large information gradient (the "learning frontier")
- But do not stray too far from the current position (limited by the attention bandwidth)
Physical Analogy: In quantum mechanics, the choice of measurement operator affects the information acquisition rate (the Cramér–Rao lower bound); in classical information theory, channel capacity limits the information transmission rate. Here, the attention operator plays the role of the "measurement operator" or the "channel", optimizing information flow under geometric constraints.
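A minimal sketch of the Gibbs-type attention rule of Corollary 4.2 on a discretized one-dimensional chart: scores combine the squared information gradient with a distance penalty to the current orbit position and are passed through a softmax. The toy gradient, temperature, and penalty weight are illustrative assumptions.

```python
import numpy as np

# Discretized 1-D chart of the information manifold (illustrative).
phi_grid = np.linspace(-3.0, 3.0, 61)

def grad_I_Q(phi):
    """Toy information-quality gradient: the steepest learning frontier sits near phi = 2."""
    return np.exp(-(phi - 2.0) ** 2)

def optimal_attention(phi_now, temperature=0.5, penalty=0.1):
    """rho*(phi) proportional to exp( [|grad I_Q|^2 - penalty * d(phi, phi_now)^2] / temperature )."""
    score = grad_I_Q(phi_grid) ** 2 - penalty * (phi_grid - phi_now) ** 2
    rho = np.exp((score - score.max()) / temperature)
    return rho / rho.sum()

rho = optimal_attention(phi_now=0.0)
print("attention peak at phi =", phi_grid[np.argmax(rho)])
# The peak lies between the current position (0) and the learning frontier (2):
# pulled toward the high-gradient region, held back by the distance penalty.
```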
Part Five: Time Sense and Attention Modulation
5.1 From Fisher Information to Subjective Duration
Recall the subjective duration definition from Chapter 3,
$$\tau(T) = \int_0^T \sqrt{F_Q(t)}\, dt,$$
where $F_Q(t)$ is the local quantum Fisher information. We now connect it with the attention operator:
Proposition 5.1 (Attention-Modulated Subjective Duration)
If observer’s quantum Fisher information and attention bandwidth satisfy:
(larger bandwidth, lower discriminability), then subjective duration can be written as:
Meaning: Attention dispersion ( large) causes subjective duration extension.
Experimental Predictions:
- In multi-task situations (attention spread across multiple objects), time experience "slows down"
- In single-focus tasks (attention concentrated on a single object), time experience "speeds up"
This is consistent with the classic "attention–time distortion" phenomenon: complex tasks make time "pass slowly", while simple repetitive tasks make it "pass quickly".
5.2 Knowledge Graph and Time Depth
The concept of time depth: the observer's sense of "how far in the past" an event lies, which can be characterized by path lengths in the knowledge graph.
Definition 5.1 (Time Depth)
On the knowledge graph $G_t$, the time depth is the shortest path length from the current concept node $v_{\mathrm{now}}$ back to the "origin" node $v_0$:
$$D_t = d_{G_t}(v_{\mathrm{now}}, v_0),$$
where $d_{G_t}$ is the geodesic (shortest-path) distance on the graph.
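A minimal sketch of Definition 5.1 using `networkx` (assumed available): time depth as the weighted shortest-path distance from the current concept node back to the origin node, on an illustrative toy memory graph.

```python
import networkx as nx

# Toy memory graph: nodes are remembered events/concepts, edge weights are "recall distances".
G = nx.Graph()
G.add_weighted_edges_from([
    ("origin", "breakfast", 1.0),
    ("breakfast", "meeting", 1.0),
    ("meeting", "lunch", 1.0),
    ("meeting", "deadline", 0.5),
    ("deadline", "now", 0.5),
    ("lunch", "now", 1.0),
])

# Time depth D_t = graph geodesic distance from the current node back to the origin.
depth = nx.shortest_path_length(G, source="now", target="origin", weight="weight")
print("time depth:", depth)   # 3.0 via now -> deadline -> meeting -> breakfast -> origin
```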
Proposition 5.2 (Time Depth Growth Law)
Under the spectral approximation condition, the growth rate of the time depth is consistent with the growth rate of geodesic distance on the information manifold.
Meaning: when the observer "moves fast" on the information manifold, the time depth of the knowledge graph also grows fast; this corresponds to the phenomenon that information-dense experience extends the experienced duration.
Analogy: Imagine a memoir. The relationships (edges) between chapters (nodes) constitute a "memory graph". The path length from the "starting point" to "now" is a psychological measure of how long it feels since then. Dense experiences (high information velocity) make the "length" of the memoir grow faster.
```mermaid
graph TB
subgraph "Sparse Experience"
A1["Event 1"] --> A2["Event 2"] --> A3["Event 3"]
end
subgraph "Dense Experience"
B1["Event 1"] --> B2["Event 2"]
B2 --> B3["Event 3"]
B2 --> B4["Event 4"]
B3 --> B5["Event 5"]
B4 --> B5
B5 --> B6["Event 6"]
end
C["Short Time Depth"] -.corresponds to.- A3
D["Long Time Depth"] -.corresponds to.- B6
style A3 fill:#fff4e1
style B6 fill:#ffe1f5
```
Part Six: Engineering Path—Attention Tracking and Knowledge Graph Visualization
6.1 Eye Movement Tracking and Attention Estimation
Experimental Design:
- Present visual stimuli (e.g., complex images, text paragraphs)
- Record eye-movement trajectories and fixation durations
- Construct a spatial attention heatmap from the fixation positions and dwell times (see the sketch after this list)
- Map to the information manifold: if an image region corresponds to a feature vector in $S_Q$, the spatial heatmap pushes forward to an attention density on the manifold
Verify Predictions:
- Test whether attention concentrates in regions with a large information gradient
- Test the relationship between attention bandwidth and task complexity
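A minimal sketch of turning fixation data into a spatial attention heatmap via duration-weighted Gaussian kernel density estimation; the fixation coordinates, dwell times, kernel width, and grid are illustrative assumptions.

```python
import numpy as np

# Fixations: (x, y) positions in normalized screen coordinates and dwell durations in seconds.
fixations = np.array([[0.20, 0.30], [0.22, 0.31], [0.70, 0.65], [0.71, 0.66], [0.50, 0.50]])
durations = np.array([0.40, 0.35, 0.90, 0.80, 0.15])

def attention_heatmap(fixations, durations, grid_size=64, sigma=0.05):
    """Duration-weighted Gaussian KDE over the screen: an empirical estimate of rho_t."""
    xs = np.linspace(0.0, 1.0, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="xy")
    heat = np.zeros_like(gx)
    for (fx, fy), w in zip(fixations, durations):
        heat += w * np.exp(-((gx - fx) ** 2 + (gy - fy) ** 2) / (2 * sigma ** 2))
    return heat / heat.sum()          # normalize: total attention = 1

heat = attention_heatmap(fixations, durations)
iy, ix = np.unravel_index(np.argmax(heat), heat.shape)
print("attention peak near:", (round(ix / 63, 2), round(iy / 63, 2)))   # close to the long-dwell region (0.70, 0.65)
```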
6.2 Neural Embedding of Concept Graph
Method:
- Collect the subject's semantic association network: given a concept word, ask the subject to list related concepts and assign similarity scores
- Construct the knowledge graph, where the nodes are the concepts and the edges are the association relations
- Use graph embedding algorithms (e.g., Node2Vec, GCN) to learn the embedding
- Compute the graph Laplacian spectrum and the heat kernel trace
- Estimate the spectral dimension $d_{\mathrm{spec}}$
Verify Theorem 2.1:
- Compare $d_{\mathrm{spec}}$ across subjects at different learning stages
- Expectation: after long-term learning, $d_{\mathrm{spec}}$ tends to the true dimension of the task information manifold
6.3 Active Learning and Information Accumulation
Algorithm Framework:
- Initialize the knowledge graph and the attention distribution
- At each time step:
- Select a query action according to the current attention (active sampling)
- Observe the result and update the internal state
- Update the knowledge graph (add nodes / adjust edge weights)
- Update the attention (reallocate according to information gain)
- Record the information accumulation curve and the complexity consumption
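A minimal sketch of this loop on a toy one-dimensional task: the learner queries where attention is high, records what it learns as knowledge-graph nodes, reallocates attention toward unexplored regions, and logs the knowledge and complexity curves. All modeling choices (the hidden signal, the uncertainty-based gain, the unit complexity cost, the node-count proxy for $H_Q$) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 50)           # discretized task information manifold
truth = np.sin(6 * grid)                   # hidden structure the observer is learning

attention = np.full_like(grid, 1.0 / len(grid))   # rho_0: uniform attention
knowledge = {}                                    # knowledge-graph nodes: index -> observed value
H_curve, C_curve = [], []
complexity = 0.0

for t in range(200):
    # 1. Sample a query where attention is high (active sampling).
    i = rng.choice(len(grid), p=attention)
    # 2. Observe (noisily) and update the internal state / knowledge graph.
    knowledge[i] = truth[i] + 0.05 * rng.standard_normal()
    # 3. Reallocate attention toward unexplored, high-uncertainty regions (toy information gain).
    gain = np.ones_like(grid)
    gain[list(knowledge)] = 0.1                   # already-known nodes are less interesting
    attention = gain / gain.sum()
    # 4. Record knowledge amount and complexity consumption.
    complexity += 1.0                             # unit complexity cost per query (toy budget)
    H_curve.append(len(knowledge))                # crude H_Q proxy: number of distinct known nodes
    C_curve.append(complexity)

print("final knowledge H_Q:", H_curve[-1], "of", len(grid), "nodes;",
      "complexity spent:", C_curve[-1])
```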
Verify Theorem 4.1:
- Test whether $H_Q(T)$ grows linearly with $C_{\max}$
- Estimate the proportionality constant $\kappa$ and its dependence on the attention bandwidth
6.4 Cross-Modal Knowledge Graph
Extension: In multi-modal tasks (vision + language + audition), the nodes of the knowledge graph come from different modalities.
Cross-modal edges connect concepts from different modalities (e.g., the image "dog" and the word "dog").
Embedding Alignment:
- Visual features
- Language features
- Cross-modal mapping (e.g., a CLIP-style model)
Research Question: How does the spectral dimension of the cross-modal knowledge graph relate to the dimensions of the individual single-modal manifolds?
Part Seven: Dialogue with Existing Theories
7.1 Attention Economics
Classic Theory (Simon, 1971; Kahneman, 1973):
- Attention is a scarce resource that must be allocated across multiple tasks
- Dual-task interference measures attention capacity
Extension of This Theory:
- Geometrize "attention capacity" as a bandwidth constraint
- Express "task interference" as the spatial separation of multiple high-gradient regions on the information manifold
- Provide a quantitative bridge from geometric constraints to behavioral predictions
7.2 Manifold Learning and Representation Learning
Classic Theory (Tenenbaum, 2000; Belkin & Niyogi, 2003):
- High-dimensional data are embedded in a low-dimensional manifold
- The graph Laplacian converges to the manifold Laplacian
Innovation of This Theory:
- Interpret the "data manifold" as the "task information manifold"
- Formalize the "learner" as an observer object with limited memory
- The graph convergence theorem (Theorem 2.1) gives a geometric guarantee of "cognitive convergence"
7.3 Active Inference and Bayesian Brain
Classic Theory (Friston, 2010; Active Inference):
- The brain minimizes "free energy" (a variational bound)
- Action selection minimizes prediction error
Connection of This Theory:
- The joint action can be viewed as a "generalized free energy"
- The attention operator corresponds to "precision weighting"
- The knowledge graph update corresponds to an implementation of "Bayesian filtering" on a discrete graph
7.4 Scalar Expectancy Theory of Time Perception
Classic Theory (Scalar Expectancy Theory, Gibbon, 1977):
- Internal clock model: pacemaker–switch–accumulator
- Attention modulates the switch
Geometric Reconstruction of This Theory:
- The "pacemaker frequency" corresponds to $\sqrt{F_Q}$, the square root of the quantum Fisher information
- The "accumulator" corresponds to the subjective duration integral
- The "attention switch" corresponds to the attention bandwidth modulating $F_Q$
Part Eight: Discussion—Geometric Constraints of Cognitive Resources and Emergent Intelligence
8.1 Applicability Domain and Assumption Strength
Lipschitz and Bounded Gradient Assumption:
- The Lipschitz property of the information quality function guarantees the linear upper bound in the information accumulation inequality
- In practice, $I_Q$ may have singular points (e.g., phase transition boundaries)
- Localization or regularization is then needed
Spectral Approximation Assumption:
- The knowledge graph spectrally approximating the information manifold requires the long-time limit
- In finite time, the spectral dimension may oscillate or deviate
- A quantitative estimate of the "approximation rate" needs to be introduced
Finite Memory Assumption:
- A finite internal state space limits the observer's "working memory capacity"
- Actual cognitive systems may have hierarchical memory (short-term vs. long-term)
- An extension to a multi-scale memory model is needed
8.2 Tightness of Information Accumulation Upper Bound
Question: Is the upper bound given by Theorem 4.1 tight?
Analysis:
- Under “uniform exploration” strategy ( constant distribution), upper bound nearly achieved
- Under “greedy exploration” strategy ( concentrated on current optimum), upper bound may be loose
- Optimal strategy (Corollary 4.2) can achieve asymptotic tightness of upper bound in some cases
Engineering Meaning: Tight upper bound means “cannot break through linear growth rate by optimizing attention strategy”—this is fundamental limitation of cognitive resource scarcity.
8.3 From Single Observer to Multi-Observer
This chapter focuses on a single observer. Chapter 6 will extend the framework to multi-observer consensus geometry, where:
- Knowledge graphs are heterogeneous across observers
- Attention is influenced by the social network topology
- A consensus energy couples the individual knowledge graphs
Expected Phenomena:
- The spectral convergence speed of the knowledge graph is positively correlated with the connectivity of the social network
- The joint information accumulation of multiple observers can exceed the linear upper bound of a single observer (the emergence of "collective intelligence")
8.4 Time Selection and Free Will
Philosophical Question: Does the observer "freely choose" its attention configuration?
Answer of This Theory:
- The attention operator is determined by the internal state and the strategy; in this sense it is "deterministic"
- But the strategy itself may be "emergent" (optimized through long-term learning)
- "Free will" can be understood as freedom of choice within the space of optimal strategies allowed by the geometric constraints
Chapter 5 will explore Empowerment (causal control) in depth, giving a geometric characterization of free will.
Conclusion: Geometric Characterization of Attention and Information-Theoretic Foundation of Time Selection
This chapter constructed a unified theory of attention, time, and the knowledge graph, turning the classic psychological notion of "cognitive resource scarcity" into rigorous theorems of information geometry.
Core Results Review:
- Attention Operator Formalization: a discrete operator $A_k$ or a continuous density $\rho_t$ on the information manifold, limited by the bandwidth constraint
- Knowledge Graph Spectral Convergence (Theorem 2.1): the observer's knowledge graph spectrally approximates the true geometry of the information manifold in the long-time limit
- Information Accumulation Upper Bound (Theorem 4.1): under the complexity budget and attention bandwidth constraints, the amount of information acquired has a linear upper bound in the physical resources
- Optimal Attention Strategy (Corollary 4.2): optimal attention concentrates on regions with a large information gradient that lie close to the current position
Engineering Path:
- Eye-movement tracking → spatial attention heatmap → information manifold embedding
- Semantic association network → knowledge graph → spectral dimension estimation
- Active learning algorithm → information accumulation curve → upper bound verification
Philosophical Significance:
- Attention is not an "arbitrary choice" but an optimization process under geometric constraints
- Time selection ("what to look at" + "how long to look") determines the information accumulation path
- The knowledge graph "approximates truth" (spectral convergence) through long-term learning, but can never "fully arrive" because of cognitive resource limitations
The next chapter (Chapter 5) explores the geometric characterization of free will, introducing Empowerment as a measure of "causal control" and revealing the deep connection between "freedom of choice" and "information geometry".
References
Attention Theory
- Simon, H. A. (1971). Designing organizations for an information-rich world. In Computers, Communications, and the Public Interest (pp. 37-72).
- Kahneman, D. (1973). Attention and Effort. Prentice-Hall.
- Posner, M. I., & Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13(1), 25-42.
Manifold Learning
- Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323.
- Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373-1396.
Graph Geometry
- Chung, F. R. (1997). Spectral Graph Theory. AMS.
- von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416.
Active Inference
- Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
- Parr, T., & Friston, K. J. (2019). Generalised free energy and active inference. Biological Cybernetics, 113(5-6), 495-513.
Knowledge Representation
- Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407.
- Borge-Holthoefer, J., & Arenas, A. (2010). Semantic networks: Structure and dynamics. Entropy, 12(5), 1264-1302.
Time Perception
- Gibbon, J. (1977). Scalar expectancy theory and Weber’s law in animal timing. Psychological Review, 84(3), 279.
- Block, R. A., & Zakay, D. (1997). Prospective and retrospective duration judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4(2), 184-197.
This Collection
- This collection: Observer–World Section Structure: Causality and Conditionalization (Chapter 1)
- This collection: Structural Definition of Consciousness: Five Conditions and Temporal Causality (Chapter 2)
- This collection: Entanglement–Time–Consciousness: Unified Delay Scale (Chapter 3)
- This collection: Unified Theory of Observer–Attention–Knowledge Graph in Computational Universe (Source theory document)