Understanding Understanding
working draft, synthesized
A metacognitive inquiry into understanding itself: what it is, how it is built, what makes it efficient, and what ultimately limits it. The literature is deep but vague and inconclusive, because "understanding" sits awkwardly between knowledge (facts) and capability (action), and most disciplines study only the slice that fits their tools. The thread below is an attempt to consolidate those fragments into a single picture.
The more you understand, the more you understand that you don't understand.
This is not a paradox but a mechanism. If K is the set of understood concepts and U the unknown, learning enlarges the interface ∂K between them. As K grows, the number of adjacent unexplored questions grows with it, so ignorance becomes more visible, not less. Beginners see a small, seemingly complete map; intermediates hit contradictions and edge cases; experts perceive large networks of open problems. The same intuition recurs as Socrates ("I know that I know nothing"), the Dunning–Kruger effect, and Feynman's comfort with uncertainty.
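The boundary mechanism can be checked with a toy model. The sketch below assumes, purely for illustration, that concepts are cells on a 2D grid and that K is every cell within Manhattan distance r of a starting point; the helper names `known_region` and `frontier` are hypothetical. It counts the cells adjacent to K that are not yet in K.

```python
# Toy model (an assumption, not a claim about real concept spaces):
# concepts are grid cells, "understood" K is a Manhattan ball of radius r,
# and the frontier is every cell next to K that is not itself in K.

def known_region(r):
    """All grid cells within Manhattan distance r of the origin."""
    return {(x, y)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            if abs(x) + abs(y) <= r}


def frontier(known):
    """Cells adjacent to the known region that are not themselves known."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return {(x + dx, y + dy)
            for (x, y) in known
            for (dx, dy) in steps
            if (x + dx, y + dy) not in known}


for r in range(5):
    k = known_region(r)
    print(f"r={r}: |K|={len(k):3d}, |frontier|={len(frontier(k))}")
```

In this toy setting the frontier grows as 4(r + 1) while |K| grows as 2r² + 2r + 1, so the absolute number of visible open questions keeps rising even as the unknown fraction of the explored map shrinks.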
A working definition
You understand something if you can build, manipulate, and apply a model of it across contexts.
Understanding, on this view, is not a static mental state but the possession of a manipulable internal model. Operationally it decomposes into five abilities:
| Ability | Test |
|---|---|
| Explanation | Explain the phenomenon in your own words |
| Prediction | Anticipate outcomes under new conditions |
| Intervention | Change a variable and reason about the effect |
| Compression | Represent the idea in simpler form without losing function |
| Transfer | Apply it in a domain where it was never learned |
A compact personal test follows from this: for any concept, ask (1) Can I derive it? (2) Can I modify it? (3) Can I apply it somewhere unexpected? If all three hold, the understanding is probably real. This aligns with Feynman's framing — you understand something when you can reconstruct and teach it — and with the mathematician's standard that you understand a theorem when you can recreate it without looking.
Levels of depth
Understanding is graded, not binary. A rough ladder:
| Level | Description |
|---|---|
| Recognition | "I've seen this before" |
| Recall | Can restate definitions or formulas |
| Procedural | Can apply the method when prompted |
| Structural | Grasps relationships between concepts |
| Generative | Can derive new results and variations |
| Transfer | Can deploy the idea in an entirely different domain |
Mastery generally begins around the structural → generative boundary, where knowledge stops being a list and becomes a network capable of producing new results.
Understanding as a process, not a state
The most durable structural insight is that understanding is a dynamical loop, not a fixed possession:
model → prediction → comparison with reality → error → revision → improved model
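As a minimal sketch of the loop, assume the "model" is a single parameter w in y = w·x, that "reality" is a hidden rule y = 3x, and that revision uses the delta (least-mean-squares) rule. The function names `observe` and `learn` are illustrative only.

```python
def observe(x):
    """Stand-in for reality: a hidden rule the learner does not know."""
    return 3.0 * x


def learn(w=0.0, lr=0.1, steps=50):
    xs = [0.5, 1.0, 1.5, 2.0]
    for t in range(steps):
        x = xs[t % len(xs)]
        prediction = w * x                 # model -> prediction
        error = observe(x) - prediction    # comparison with reality -> error
        w += lr * error * x                # revision -> improved model
    return w


print(learn())  # very close to 3.0
```

When the prediction is already correct the error is exactly zero and w does not move, which is the loop's version of the claim that revision happens only where predictions fail.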
This same loop reappears, renamed, across nearly every field that studies adaptive systems:
| Domain | Form of the loop |
|---|---|
| Scientific method | hypothesis → experiment → revision |
| Bayesian inference | prior → evidence → posterior |
| Machine learning | model → loss → gradient update |
| Neuroscience | prediction → sensory input → prediction error → update |
| Reinforcement learning | policy → reward → adjustment |
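The Bayesian row of the table can be made concrete with the textbook Beta-Bernoulli case, chosen here because the update reduces to counting. The coin-flip data below are invented for the example, and `update` and `mean` are hypothetical helper names.

```python
from fractions import Fraction


def update(alpha, beta, observations):
    """Beta(alpha, beta) prior + 0/1 outcomes -> Beta posterior parameters."""
    heads = sum(observations)
    tails = len(observations) - heads
    return alpha + heads, beta + tails


def mean(alpha, beta):
    """Posterior mean of the coin's bias."""
    return Fraction(alpha, alpha + beta)


prior = (1, 1)                      # uniform prior over the bias
data = [1, 1, 0, 1, 1, 1, 0, 1]     # illustrative evidence: 6 heads, 2 tails
posterior = update(*prior, data)
print(posterior, mean(*posterior))  # (7, 3) 7/10
```

Feeding the posterior back in as the next prior gives the same result as processing all the evidence at once, which is the loop running one batch of observations at a time.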
Karl Friston's Free Energy Principle casts biological cognition as the continuous minimization of variational free energy, which under common modeling assumptions amounts to minimizing prediction error; on this account the brain is a prediction machine, and understanding corresponds to better predictive models. Because the loops share one structure, independent frameworks keep converging on the same architecture: they all describe a system that improves a model of reality through iterative error correction. Error is therefore essential rather than incidental, since progress happens precisely where predictions fail.
The consequence is a sharper definition:
Understanding is the capacity to construct and refine predictive models of reality through iterative, error-driven optimization.
And mastery is not measured by what you currently know, but by how quickly your models improve when confronted with new information.
The understanding stack
Knowledge can be arranged as a hierarchy where each level compresses and organizes the one below it:
reality → observations / data → patterns → concepts → models → predictions → theories
Lower levels ground knowledge in reality; higher levels provide explanatory power. Understanding emerges when the concept → model → prediction segment works reliably, and mastery is the ability to move fluidly up and down this stack — descending to check a theory against data, ascending to compress many observations into a single law.
Why understanding resists clean formalization
Concepts like mass, probability, and entropy became precise because each maps to a single measurable quantity. Understanding does not, for several compounding reasons:
- It is not a single object. It emerges from the interaction of perception, representation, abstraction, modeling, prediction, and revision. A process composed of processes is hard to pin to one variable.
- It spans multiple levels. David Marr's three levels — computational (what problem is solved), algorithmic (how), and implementation (the physical mechanism) — must align for understanding to feel complete. A theory that describes only one level inevitably feels partial.
- It is dynamical, not static. It is a loop that evolves, so defining it as a fixed state misses the point.
- It is representation-dependent. The same knowledge can be opaque as a symbolic equation and obvious as a geometric picture.
- It is context-relative. One can understand gradient descent in convex optimization yet not in high-dimensional stochastic settings.
- It involves compression and generative capability — both harder to measure than recall.
- It is networked. Per Herbert Simon, expertise is organized into structured clusters; because networks differ between individuals, understanding is hard to measure comparably.
- It is self-referential. Modeling understanding with understanding invites the recursive difficulties familiar from logic.
The predictable result is a fragmented literature, where each discipline captures one face of the same phenomenon:
| Field | Focus |
|---|---|
| Philosophy of science | explanation and causality (Hempel: explain via general laws) |
| Cognitive science | mental representation (Marr's levels) |
| Expertise research | pattern recognition and chunking (Simon) |
| Machine learning | model learning from data |
| Education theory | conceptual change |
| Neuroscience | neural mechanisms |
The cognitive operations of understanding
Stripped of repetition, expert thinking across mathematics, physics, and cognitive science reduces to a small toolkit of operations applied recursively. This is the core consolidated set:
- Composition — combine simpler structures into complex ones (words → sentences, functions → composition, modules → systems).
- Decomposition — break a system into modules and interfaces; understand each, then the whole (e.g. embedding → attention → feed-forward → output).
- Abstraction — remove irrelevant detail to expose shared structure (dog/cat/horse → animal; circuits, fluids, and neural nets as the same flow-through-resistance pattern).
- Generalization — extend a pattern from observed examples to unseen cases; in ML this is the literal definition of a working model.
- Analogy — map structure between domains (fluid flow ↔ electrical current; evolution ↔ optimization). Douglas Hofstadter treats analogy as the core mechanism of high-level reasoning.
- Representation change — re-express a problem in a more tractable form; the operation that most often unlocks difficulty.
- Compression — reduce an idea to its minimal generative core. Backprop becomes "propagate error gradients through the computation graph via the chain rule"; gradient descent becomes "move parameters in the direction that most reduces loss."
- Invariant detection — find what stays fixed under transformation (conservation laws, convexity, entropy). Invariants are anchors that make a system easier to reason about.
- Mental simulation — run the model internally (parameters sliding across a loss surface) to reason about behavior without formal calculation.
- Generative thinking — store procedures that produce answers, not answers: proof strategies (induction, symmetry, contradiction, transformation) rather than memorized proofs.
- Iterative refinement — the model → error → revision loop applied to the concept itself.
- Recursion — apply an operation to its own output (concept → concept of concepts → concept of conceptual systems), letting rich structure emerge from few rules.
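The compressed description of backprop in the list above can be unpacked into a few lines. This is a minimal reverse-mode autodiff sketch; `Value` is a hypothetical class name, not a library API, and it supports only addition, multiplication, and tanh.

```python
import math


class Value:
    """A scalar node in a computation graph (hypothetical, for illustration)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents  # tuples of (parent node, local derivative)

    def __add__(self, other):
        # d(a + b)/da = 1, d(a + b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b, d(a * b)/db = a
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, ((self, 1.0 - t * t),))  # d tanh(z)/dz = 1 - tanh(z)^2

    def backward(self):
        # Post-order DFS: every node is appended after all of its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule:
        # each parent accumulates (local derivative) * (gradient of the child).
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


# A one-neuron example: y = tanh(w * x + b)
w, x, b = Value(0.5), Value(2.0), Value(-0.3)
y = (w * x + b).tanh()
y.backward()
print(w.grad, (1 - y.data ** 2) * x.data)  # autodiff vs. hand-derived gradient
```

Nothing here is specific to neurons: once each operation knows its local derivative, the chain rule over the graph does the rest, which is the "minimal generative core" the compression bullet describes.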
What separates experts from novices is not the operations themselves but richer internal models, faster model updating, and stronger abstraction. Simon's work showed expert knowledge is stored as large networks of structured "chunks" that enable rapid inference and transfer. The felt quality of expert intuition is largely this: an internalized map of a domain's structure that lets one jump directly toward promising regions.
A useful unifying image: deep understanding is the construction of internal simulators. A physicist runs a simulator of physical systems; an ML researcher, of learning dynamics; an engineer, of built systems. The better the simulator, the deeper the understanding.
Breakthrough moves: transformations of concept space
Major conceptual breakthroughs are not usually the product of more computation; they come from finding the right structure. A concept space is the set of all models capable of describing a system, and the breakthrough operations are transformations of that space that make useful models easier to locate.
| Operation | What it does to the space |
|---|---|
| Representation transformation | Moves to a different coordinate system (Fourier: time → frequency; combinatorics → linear algebra) |
| Duality | Exposes a complementary description (primal ↔ dual via Lagrangian duality; position ↔ momentum) |
| Symmetry detection | Identifies equivalent regions, removing redundancy (Noether: continuous symmetries ↔ conservation laws) |
| Dimensional reduction | Removes irrelevant degrees of freedom (PCA: keep directions of largest variance) |
| Constraint relaxation | Temporarily widens the feasible region (integer → continuous optimization, then recover) |
| Limiting-case analysis | Simplifies via extremes (learning rate → 0 gives gradient flow; width → ∞ gives the neural tangent kernel regime) |
| Compositional construction | Builds complex systems from simple, scalable parts |
| Abstraction | Collapses many specific models into one general form |
Geometrically: a representation change can turn a jagged search landscape into a smooth one; symmetry deletes redundant regions (if A ≡ B, reasoning about A explains B for free); abstraction and dimensional reduction shrink the space; constraint manipulation reshapes its topology. Breakthroughs often chain several moves — identify symmetry → change representation → relax constraints → analyze a limiting case → discover an invariant. The recurring deep pattern:
observe phenomenon → build initial model → detect hidden structure
→ change representation → reveal a simpler underlying law
This is why understanding can be framed as efficient navigation of concept space: construct representations, identify structure, move toward compact predictive models. Experts appear intuitive because they have internalized the geometry of their domain's space.
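Representation transformation is easiest to see with a Fourier transform. The sketch assumes a naive O(N²) discrete Fourier transform written with the standard library (real code would use an FFT) and a synthetic signal made of two cosines; `dft` is a hypothetical helper name.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform: time domain -> frequency domain."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


N = 32
signal = [math.cos(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
          for t in range(N)]
spectrum = dft(signal)
peaks = [k for k, c in enumerate(spectrum) if abs(c) > 1.0]
print(peaks)  # [3, 5, 27, 29]
```

Thirty-two samples that look irregular in time reduce to four nonzero frequency bins (each real cosine appears at k and N − k), which is the "simpler underlying law" revealed by changing coordinates.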
What makes understanding efficient
If understanding is a model-improvement loop, why do some systems learn far faster than others? The efficiency of the loop depends on its surrounding architecture:
- Representation quality — the single biggest factor. Raw pixels make vision hard; object-level features make it easy. Good representations reshape the problem so solutions become easy to find (the goal of representation learning).
- Search efficiency in model space — the space of explanations is enormous, so priors, symmetry assumptions, and modularity prune it. Exact Bayesian reasoning over hypotheses is the ideal but is computationally intractable, so real systems approximate.
- Hierarchy — layering data → features → concepts → models → theories compresses patterns and improves generalization.
- Abstraction reuse — learning one general rule instead of separate solutions for A, B, C cuts learning complexity sharply.
- Error-signal quality — weak or delayed feedback slows everything; strong, immediate signals accelerate learning.
- Modularity — decomposing reduces dimensionality and lets modules be optimized, then reused across domains.
- Compression ability — better compression (cf. Kolmogorov complexity, the length of the shortest program that generates the data) yields simpler, more powerful models.
- Exploration / exploitation balance — too little exploration stagnates; too much is inefficient.
- Knowledge reuse and cognitive architecture — reusable abstractions (learn optimization once, apply to ML, control, economics) and the right substrate (symbolic, neural, probabilistic, or hybrid) set the ceiling.
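Kolmogorov complexity itself is uncomputable, but compressed length gives a rough, computable proxy for description length. The sketch assumes zlib as the compressor and compares a rule-generated byte string with pseudo-random bytes of the same length; `description_length` is a hypothetical helper name.

```python
import random
import zlib


def description_length(data: bytes) -> int:
    """Compressed size in bytes: a crude upper-bound proxy for description length."""
    return len(zlib.compress(data, 9))


structured = b"ab" * 500                      # 1000 bytes produced by a tiny rule
rng = random.Random(0)                        # seeded so the example is repeatable
noise = bytes(rng.randrange(256) for _ in range(1000))  # no exploitable pattern

print(description_length(structured), description_length(noise))
```

The structured string shrinks to a few dozen bytes at most, while the noise does not compress at all; a representation that captures the generating rule is what makes the short description possible.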
A training system for understanding
Treating understanding as a trainable skill, the operations above can be drilled in a deliberate sequence. For any concept:
- Structural mapping — identify the core primitives, their relationships, and the constraints/invariants. Build a concept graph, not a linear list.
- Mechanistic modeling — explain the concept across Marr's three levels. For attention: computational = select relevant information; algorithmic = similarity-based weighting; implementation = matrix multiplication on a GPU.
- Derivation — reconstruct results instead of memorizing them (derive backprop from the chain rule + computation graph; derive attention from similarity search). Derivation reveals assumptions, exposes hidden constraints, and compresses.
- Multi-representation — express the concept as equation, diagram, code, analogy, and physical intuition. Understanding that survives a change of representation is the real thing.
- Perturbation testing — stress the model: what if a parameter → 0? scale → ∞? noise increases? a constraint is removed? (Without positional encoding, a Transformer's self-attention is permutation-equivariant: reordering the input tokens merely reorders the outputs, which reveals that positional encoding exists to supply order.)
- Compression — reduce the idea to its minimal core. Per David Deutsch, a good explanation is hard to vary without breaking its predictive power.
- Transfer — find the concept's structural twin in another domain (gradient descent ↔ energy minimization; feedback loops in biology, cybernetics, and ML).
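The positional-encoding perturbation can be run directly. This sketch assumes the simplest possible self-attention, with the raw token vectors used as queries, keys, and values (one head, no learned projections, no positional encoding); the token values are arbitrary.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head attention with Q = K = V = tokens and no positional encoding."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


tokens = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
perm = [2, 0, 1]  # a reordering of the three tokens

original_out = self_attention(tokens)
permuted_out = self_attention([tokens[p] for p in perm])
reordered_out = [original_out[p] for p in perm]

max_diff = max(abs(a - b)
               for row_a, row_b in zip(permuted_out, reordered_out)
               for a, b in zip(row_a, row_b))
print(max_diff)  # zero up to floating-point rounding
```

Attending to the permuted sequence gives the permuted outputs, so the model cannot tell word order apart. Adding a position-dependent vector to each token before attention would break this symmetry, which is the job positional encoding does.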
Run as a daily loop (encounter concept → map → model → derive → stress-test → compress → transfer) across hundreds of concepts, this is what builds mastery. Early on it feels vague, nonlinear, and slow because the meta-models for learning don't yet exist; once the operations are internalized, learning accelerates noticeably.
This corresponds to a predictable arc — the "complexity valley":
beginner → illusion of understanding → complexity shock
→ structured mental models → expert intuition
The disorienting middle, where a domain reveals its real depth, is not a failure of the method; it is a necessary stage of it.
The architecture of a maximally understanding system
Push the question further — what design maximizes the rate at which understanding improves? — and the same structural properties recur across cognitive science, AI, and scientific practice:
- Hierarchical representations that compress patterns level by level (cf. hierarchical Bayesian models, deep nets).
- Generative world models that can simulate possible observations, enabling prediction, planning, and counterfactual reasoning (cf. variational inference).
- Active information acquisition — choosing experiments/actions that most reduce uncertainty rather than observing passively (cf. active learning; the hypothesis → experiment cycle).
- Modular structure allowing parallel improvement and cross-domain transfer.
- Structured memory — relational networks supporting analogy, clustering, and fast recall (cf. Collins's semantic networks).
- Meta-learning — improving one's own learning strategies, not just one's models.
- Abstraction engines that turn repeated patterns into reusable concepts.
- Internal simulation for planning and counterfactuals (cf. model-based RL).
- Recursive self-improvement — refining not only models but the representations, search strategies, and abstraction mechanisms used to build them, so the system improves how it improves.
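Active information acquisition can be sketched as choosing the question whose answer is most uncertain. The example assumes 16 equally likely hypotheses and yes/no questions of the form "is h < t?"; because each answer is deterministic given the hypothesis, a question's expected information gain equals the entropy of its answer. The function names are hypothetical.

```python
import math


def entropy(p):
    """Entropy in bits of a yes/no answer that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_question(hypotheses):
    """Pick the threshold t whose question 'is h < t?' is most informative."""
    n = len(hypotheses)
    gains = {}
    for t in range(1, n):
        p_yes = sum(1 for h in hypotheses if h < t) / n
        gains[t] = entropy(p_yes)
    best = max(gains, key=gains.get)
    return best, gains[best]


print(best_question(list(range(16))))  # (8, 1.0)
```

The best question splits the hypotheses in half and yields one full bit; repeating the choice is binary search, which identifies one of 16 hypotheses in four questions instead of up to fifteen passive checks.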
The convergence is striking: predictive brains, deep generative models, hypothesis–experiment science, and hierarchical concept systems are independent discoveries of the same architecture, which suggests common structural requirements for efficient understanding. The frontier question this raises — what representations achieve the largest compression of reality while preserving predictive power? — is essentially the search for fundamental explanatory frameworks in science.
The limits of understanding
No system reaches perfect understanding. The ceilings are structural, not merely practical:
| Limit | Source |
|---|---|
| Information | Finite, incomplete observations admit multiple explanations — understanding is irreducibly probabilistic (Shannon) |
| Computational | The right model may exist but cost exponential search to find (complexity theory) |
| Representation | A system can only model what its internal language can express (irrationals before the reals; quantum phenomena before quantum theory) |
| Approximation | All models simplify — "all models are wrong, but some are useful" (Box) |
| Observational | Some states (internal biology, hidden social variables) can't be observed directly |
| Self-reference | Any consistent, effectively axiomatized system expressive enough for arithmetic has true statements it cannot prove and cannot prove its own consistency (Gödel), which sets a ceiling on self-understanding |
| Environmental complexity | Sensitivity to initial conditions limits long-range prediction (chaos) |
| Cognitive resources | Finite memory, attention, and speed force reliance on heuristics that sometimes misfire |
| Generality vs. precision | Broad theories lose precision; precise ones lose scope (Newtonian vs. quantum field theory) |
| The expanding unknown | Better models reveal new anomalies, so the frontier of ignorance keeps growing |
Understanding is therefore best seen as an asymptotic process: systems approach ever-more-accurate models without ever reaching a final, complete description. Scientific progress is the steady pushing-outward of these limits: better measurement expands available data, better mathematics expands the representation language, better computation expands the searchable model space. This is also why inquiry stays open-ended across centuries, and why the opening observation holds.
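The environmental-complexity limit can be demonstrated with the logistic map at r = 4, a standard chaotic example. The sketch assumes two starting points that differ by 10⁻¹⁰, standing in for a tiny measurement error; `logistic_orbit` is a hypothetical helper name.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)   # same exact model, slightly different start
for n in (0, 10, 20, 30, 40):
    print(n, abs(a[n] - b[n]))
```

The gap roughly doubles each step, so within a few dozen iterations the two predictions are unrelated even though the model is exact; only the initial condition was uncertain.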
The open frontier: cognitive primitives
A recurring hypothesis is that a small set of primitive operations — composition, abstraction, analogy, decomposition, transformation, generalization — generates most conceptual knowledge through recursion, the way few axioms generate a vast theory space, or simple cellular-automaton rules generate complex patterns. The search for the minimal such set runs through several traditions:
| Tradition | Primitive view | Key figures or fields |
|---|---|---|
| Symbolic | symbols + manipulation rules → intelligence (Physical Symbol System Hypothesis) | Newell & Simon |
| Probabilistic | priors + likelihoods + update rules | Bayesian inference |
| Information-theoretic | pattern detection, compression, prediction, error correction | Shannon |
| Neural | units, weighted connections, learning rules (gradient descent) | deep learning |
| Compositional | objects, relations, causal rules composed language-like | Tenenbaum |
Despite decades of work there is no consensus on the exact primitives, but many researchers suspect the truly fundamental set is surprisingly small — a kind of generative grammar of thought. If so, the implication for mastery is direct: understanding is less about storing knowledge than about mastering the transformations that generate it.
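The cellular-automaton comparison can be made concrete with an elementary automaton. The sketch assumes Wolfram's numbering convention, in which the binary digits of the rule number form the lookup table for each three-cell neighborhood, and uses Rule 30 by default; `step` is a hypothetical helper name.

```python
def step(cells, rule=30):
    """One update of an elementary cellular automaton (Wolfram rule numbering)."""
    padded = [0] + cells + [0]  # cells beyond the edges are treated as 0
    # Neighborhood (left, center, right) read as a 3-bit number selects a bit of `rule`.
    return [(rule >> (padded[i - 1] << 2 | padded[i] << 1 | padded[i + 1])) & 1
            for i in range(1, len(padded) - 1)]


row = [0] * 15 + [1] + [0] * 15
for _ in range(12):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```

An eight-entry lookup table, applied recursively, produces an irregular triangle with no obvious repeating structure: a small-scale version of a few primitives generating rich output.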
The recursive turn
Studied far enough, understanding becomes an object of its own study, and the process turns recursive — one builds models of concepts, then of reasoning, then of model-construction itself:
understanding concepts → understanding systems → understanding how understanding works
Each layer improves the next. Deep conceptual mastery tends to move through three phases, with the second the hardest:
| Phase | Description |
|---|---|
| Accumulation | gathering concepts and tools |
| Structural integration | connecting them into networks |
| Generative insight | producing new models and frameworks |
This connects naturally to the rest of this site — to Sand to Band comprehension across abstraction levels, and to the feedback-driven, Go Slow To Go Fast view of action and learning as a closed-loop system. It also runs up against the genuinely open questions: Can understanding be measured objectively? What does embodied, neuroscience-grounded experience add that text alone cannot? Can artificial systems truly understand, or only simulate the operations of understanding?
"The more I understand about understanding, the more I realize how little I understand about understanding."