Recursive Engines at Scale
Recursive Engines at Scale
When Abstraction Becomes Architecture
There is a persistent misconception in software engineering that recursive systems are inherently fragile. That they are toys for computer science classrooms — elegant in theory, impractical in production. I believed this myself for years. Then I built one that grew to 21 organ handlers, 1,254 tests, 85% coverage, and a ritual syntax DSL that translates symbolic operations into executable Python. The result, recursive-engine--generative-entity (RE:GE), is the flagship of ORGAN-I in an eight-organ creative-institutional system spanning 78 repositories across 8 GitHub organizations. It is not a classroom exercise. It is the theoretical substrate upon which ORGAN-II’s generative art and ORGAN-III’s commercial products depend.
This essay describes how RE:GE’s architecture scales, where it fails, what recursive self-modeling actually means when you implement it, and why the complementary work in narratological-algorithmic-lenses matters for anyone building systems that need to reason about their own structure.
The Problem RE:GE Solves
Most software systems model their domain. A banking application models accounts and transactions. A CMS models content and workflows. RE:GE models something stranger: it models the act of modeling itself. The system’s core premise is that myth, identity, ritual, and recursive symbolic processes can be formalized as computational operations — and that those formalizations can, in turn, model the system that produces them.
This sounds circular, and it is. That is the point. The challenge was not avoiding circularity but making it productive. Self-referential systems in mathematics (Gödel), computation (Turing), and biology (autopoiesis) all demonstrate that self-reference is not a bug — it is the mechanism by which complex systems sustain coherence. The question was whether I could build a software system that exploited this property deliberately, in a way that produced outputs useful to downstream creative and commercial systems.
The answer required three things: an architecture that could handle recursive self-reference without infinite regress, a formal language for expressing symbolic operations, and a test suite rigorous enough to verify that recursive loops actually terminated with meaningful outputs. RE:GE provides all three. The journey from “this is a cute idea” to “this is load-bearing infrastructure” took approximately eighteen months and several complete architectural rewrites.
The 21-Organ Architecture
RE:GE’s internal architecture mirrors the eight-organ model of the broader system, but at finer granularity. Where the meta-system has 8 organs coordinating 78 repositories, RE:GE has 21 organ handlers coordinating symbolic processing pipelines. Each handler is a self-contained module responsible for a specific domain of symbolic operation. They communicate through a shared event bus and a state registry that tracks the current symbolic context.
The 21 handlers fall into five families:
Foundation Handlers (1–4): These manage the basic substrate — state initialization, context propagation, event routing, and error recovery. Handler 1 (genesis_handler) bootstraps the symbolic environment from a configuration manifest. Handler 2 (context_handler) maintains the nested context stack that allows recursive operations to reference their own invocation history. Handler 3 (event_router) dispatches symbolic events to the appropriate handler chain. Handler 4 (recovery_handler) implements graceful degradation when recursive depth exceeds configured limits.
Symbolic Processing Handlers (5–10): These perform the core symbolic operations. Handler 5 (myth_processor) parses mythological narrative structures into symbolic graphs. Handler 6 (identity_engine) manages the creation, transformation, and dissolution of symbolic identity entities. Handler 7 (ritual_executor) interprets the ritual syntax DSL (more on this below). Handler 8 (recursion_manager) coordinates self-referential operations, maintaining cycle detection and depth tracking. Handler 9 (transformation_pipeline) applies ordered sequences of symbolic transformations. Handler 10 (integration_handler) merges outputs from parallel symbolic processing chains.
Bridge Handlers (11–14): These connect RE:GE to external systems. Handler 11 (obsidian_bridge) synchronizes symbolic state with an Obsidian vault, enabling human-readable knowledge graphs. Handler 12 (git_bridge) commits symbolic state snapshots to version control, creating an auditable history of symbolic transformations. Handler 13 (maxmsp_bridge) translates symbolic events into OSC messages for Max/MSP, enabling real-time audio-visual output. Handler 14 (api_bridge) exposes symbolic operations as REST endpoints for downstream consumers like metasystem-master in ORGAN-II.
Analysis Handlers (15–18): These provide introspective capabilities. Handler 15 (pattern_detector) identifies recurring symbolic patterns across processing history. Handler 16 (coherence_analyzer) measures internal consistency of the symbolic state. Handler 17 (emergence_tracker) flags novel symbolic configurations that were not explicitly programmed. Handler 18 (metrics_collector) aggregates performance and structural metrics for monitoring.
Meta Handlers (19–21): These are where the recursion gets serious. Handler 19 (self_model) maintains RE:GE’s model of its own architecture — a symbolic representation of the 21-handler system that the system itself can query and reason about. Handler 20 (adaptation_engine) uses the self-model to propose structural modifications. Handler 21 (validation_oracle) verifies that proposed adaptations maintain system invariants.
Each handler averages 200–400 lines of Python. The total codebase, excluding tests, is approximately 8,000 lines. This is not a massive system by enterprise standards, but the density of self-referential logic makes it feel much larger. Every handler must be aware of the context in which it operates, including — crucially — whether that context was generated by another handler’s output.
class RecursionManager:
"""Handler 8: Coordinates self-referential operations."""
MAX_DEPTH = 12
def __init__(self, state_registry, cycle_detector):
self.state = state_registry
self.cycles = cycle_detector
self._depth = 0
def enter_recursive_context(self, operation_id: str) -> ContextFrame:
if self._depth >= self.MAX_DEPTH:
return self.state.emit_degraded(operation_id, reason="depth_exceeded")
if self.cycles.would_create_cycle(operation_id, self.state.current_chain):
return self.state.emit_degraded(operation_id, reason="cycle_detected")
self._depth += 1
frame = ContextFrame(
depth=self._depth,
parent=self.state.current_context,
operation=operation_id,
timestamp=self.state.clock.now()
)
self.state.push_context(frame)
return frame
def exit_recursive_context(self, frame: ContextFrame) -> SymbolicOutput:
result = self.state.pop_context(frame)
self._depth -= 1
return result
The MAX_DEPTH = 12 is not arbitrary. Through empirical testing — hundreds of runs with varying recursive depths — I found that meaningful symbolic outputs stabilize between depths 6 and 10. Beyond depth 12, the system generates increasingly degenerate outputs: symbolic entities that reference only themselves with no external content. This is the computational equivalent of staring into a mirror that reflects another mirror. Technically infinite, practically meaningless.
The Ritual Syntax DSL
One of the earliest design decisions was that symbolic operations needed their own language. Python is expressive, but it is not designed for writing things like “dissolve the identity of the narrator, ferment its residue through three alchemical stages, and reconstitute it in the voice of the antagonist.” Forcing such operations into standard Python function calls produced code that was technically correct but semantically opaque. The gap between the conceptual operation and its code representation made debugging nearly impossible.
The ritual syntax DSL is a domain-specific language embedded in Python that provides a declarative interface for symbolic operations. It is parsed by Handler 7 (ritual_executor) and compiled into sequences of handler invocations.
ritual dissolve_reconstitute {
phase nigredo {
dissolve identity(narrator)
collect residue -> $fragments
}
phase albedo {
filter $fragments where coherence > 0.3
ferment $fragments through [calcination, dissolution, separation]
}
phase rubedo {
reconstitute $fragments as identity(antagonist)
bind voice(antagonist) <- voice(narrator).timbre
emit transformed_entity
}
}
This is not pseudocode. The ritual syntax parser tokenizes this input, resolves the symbolic references (identity(narrator), voice(narrator).timbre), validates the phase sequencing against the alchemical operation graph, and generates a directed acyclic graph of handler invocations. The $fragments variable is a symbolic container that persists across phases within a single ritual execution.
The DSL supports 22 primitive operations (corresponding to the 22 classical alchemical operations from Nigredo through Rubedo), conditional branching based on symbolic state predicates, loop constructs for iterative refinement, and external bridge invocations for real-time output. The parser itself is approximately 1,200 lines of Python, with another 800 lines for the compiler that translates parsed rituals into handler chains.
I will be honest about what the DSL is not: it is not a general-purpose programming language. It cannot handle arbitrary computation. Its type system is nominal and shallow. Error messages are sometimes cryptic, particularly when a symbolic reference fails to resolve inside a nested phase block. These are deliberate constraints. The DSL is designed for a specific class of operations — symbolic transformations with mythological and alchemical semantics — and it does those operations well. Trying to make it do more would destroy the expressiveness that makes it useful.
What Recursive Self-Modeling Means in Practice
The phrase “recursive self-modeling” appears in RE:GE’s documentation, and I know how it sounds. It sounds like academic vapor. So let me describe what it actually does, concretely, in a running system.
Handler 19 (self_model) maintains a data structure that represents the 21-handler architecture as a symbolic entity within the system’s own ontology. This means RE:GE contains a model of itself that is subject to the same symbolic operations as any other entity in the system. You can dissolve it. You can ferment it. You can reconstitute it.
Why would you do this? Because self-modeling enables self-diagnosis. When Handler 16 (coherence_analyzer) detects inconsistencies in the symbolic state, it can query the self-model to determine which handler chain produced the inconsistency. When Handler 17 (emergence_tracker) identifies a novel symbolic configuration, it can compare that configuration against the self-model’s predicted outputs to determine whether the novelty is generative or degenerative. Without the self-model, diagnosing issues in a recursive system requires manually tracing execution paths — a process that becomes intractable beyond depth 4 or 5.
In practice, the self-model is a JSON-serializable graph with approximately 400 nodes (one for each handler, each handler’s internal state schema, each inter-handler communication channel, and each external bridge endpoint). Handler 20 (adaptation_engine) proposes modifications to this graph — adding monitoring hooks, adjusting handler priorities, rerouting event flows — and Handler 21 (validation_oracle) checks whether the proposed modification maintains the system’s invariants: no orphaned handlers, no unbounded recursive loops, no broken bridge contracts.
The self-model updates approximately once per minute during active symbolic processing. Each update triggers a validation pass. Over a typical 60-minute session, this produces 50–60 self-model snapshots, each of which is committed to the symbolic state history via Handler 12 (git_bridge). The result is a version-controlled record of how the system’s self-understanding evolved during the session — which is, itself, a form of knowledge that can be analyzed by Handler 15 (pattern_detector) in future sessions.
This is what I mean by productive circularity. The system’s self-model is not a static diagram. It is a live artifact that participates in the same symbolic processing it describes. The circularity is not a defect; it is the mechanism by which the system maintains coherence as it grows.
Workflow Orchestration and External Bridges
RE:GE does not exist in isolation. Four bridge handlers connect it to external systems, and these bridges are where the theoretical architecture meets practical infrastructure constraints.
The Obsidian Bridge (Handler 11) is the most heavily used. It synchronizes RE:GE’s symbolic state with an Obsidian vault, translating symbolic entities into Markdown notes with YAML front matter and wiki-style interlinks. This makes the symbolic state human-navigable. I can open Obsidian and browse the current state of a ritual in progress, see which identities have been dissolved, which fragments are in fermentation, and which reconstitution targets are queued. The bridge is bidirectional: edits in Obsidian propagate back to RE:GE’s state registry, enabling human-in-the-loop symbolic processing. The synchronization uses file-watcher events rather than polling, which keeps latency under 500ms for typical vault sizes.
The Git Bridge (Handler 12) commits symbolic state snapshots at configurable intervals. Each commit message includes structured metadata: the active ritual, the current phase, the handler chain that produced the state change, and a coherence score from Handler 16. This creates a git log that reads like a symbolic processing journal. Weeks later, I can git log --oneline and see exactly how a ritual unfolded across sessions.
The Max/MSP Bridge (Handler 13) translates symbolic events into OSC (Open Sound Control) messages. When a ritual enters its rubedo phase, the bridge sends OSC parameters that adjust synthesis parameters in a Max/MSP patch. This is how ORGAN-I’s theoretical work feeds into ORGAN-II’s generative music and performance systems. The bridge is one-directional (RE:GE to Max/MSP) by design — musical output should be influenced by symbolic state, but symbolic processing should not be driven by audio feedback. That constraint is encoded in the architecture as a system invariant: no back-edges from ORGAN-II to ORGAN-I.
The API Bridge (Handler 14) exposes REST endpoints that downstream systems consume. metasystem-master in ORGAN-II queries RE:GE’s symbolic state to drive its Omni-Dromenon performance engine. The API is read-heavy: approximately 95% GET requests for current state, 5% POST requests for triggering specific rituals. Rate limiting and authentication are handled at the bridge level, not in the core engine.
The Test Suite: 1,254 Tests and What They Prove
I am specific about the test count — 1,254 tests, 85% coverage — because test infrastructure is the most reliable indicator of whether a recursive system actually works or merely appears to work. Recursive systems are uniquely susceptible to subtle bugs that manifest only at specific depths or under specific state configurations. A test suite that only exercises shallow recursive paths provides false confidence.
The test suite is organized into four tiers:
Unit tests (612): These test individual handlers in isolation. Each handler has its own test module with tests for normal operation, edge cases, and error conditions. Handler 8 (recursion_manager) alone has 87 unit tests covering depth limits, cycle detection, context frame management, and degradation behaviors.
Integration tests (348): These test handler chains — sequences of handlers that collaborate to execute a symbolic operation. The longest tested chain involves 8 handlers processing a full ritual with three phases. Integration tests verify that handler outputs are correctly consumed by downstream handlers and that the event bus maintains ordering guarantees under concurrent dispatch.
Self-model tests (156): These verify that the self-model (Handler 19) accurately represents the system’s actual architecture. Every time a handler is added or modified, the self-model tests confirm that the self-model updates accordingly. These tests are the most important in the suite: if the self-model diverges from reality, the entire self-diagnostic capability becomes unreliable. I run these tests first in the CI pipeline for exactly this reason.
Regression tests (138): These capture specific bugs that were discovered during development. Each regression test includes a comment explaining the original failure mode and the conditions that triggered it. Many of these involve recursive depth interactions that would be nearly impossible to discover through manual testing — a cycle that only manifests at depth 7 when two specific handlers interact, or a context frame leak that only occurs during error recovery at depth 11.
Coverage is 85%, not 100%. The uncovered 15% is concentrated in the Max/MSP bridge (which requires a running Max instance for full integration testing) and in certain degenerate recursive paths that are deliberately unreachable under normal configuration. I do not pursue 100% coverage as a goal. The marginal value of covering degenerate paths is low, and the maintenance burden of keeping those tests current is high.
Narratological-Algorithmic-Lenses: The Complementary Approach
While RE:GE formalizes symbolic processing as a recursive engine, narratological-algorithmic-lenses takes a different approach to a related problem: how do you formalize narrative structure as executable algorithms?
The repository contains 14 narratological studies spanning narrative theory from Aristotle’s Poetics to Pixar’s story rules. Each study identifies the formal structures underlying a narrative tradition and translates them into 92 executable algorithms. These are not plot generators. They are analytical tools that decompose existing narratives into their structural components and measure properties like dramatic tension curves, character arc completeness, thematic coherence, and pacing regularity.
The 14 studies cover: Aristotelian dramatic structure, Campbell’s monomyth, Propp’s morphology of folktales, Todorov’s narrative equilibrium, Barthes’ narrative codes, Genette’s narrative discourse, Greimas’ actantial model, Bremond’s narrative logic, Chatman’s story and discourse, Bal’s narratology, Prince’s narrative grammar, Herman’s cognitive narratology, Ryan’s possible worlds theory, and Pixar’s empirical story rules.
Each study produces 5–8 algorithms, for a total of 92. These algorithms are implemented as Python functions that accept narrative data structures (scenes, characters, events, transformations) and return analytical metrics. The metrics feed into ORGAN-I’s broader research pipeline: auto-revision-epistemic-engine consumes them as inputs to its 8-phase revision workflow, and my-knowledge-base indexes them alongside the semantic units extracted from Claude conversations.
The complementarity between RE:GE and narratological-algorithmic-lenses is structural: RE:GE asks “how do symbolic transformations compose?” while the lenses ask “how do narrative structures compose?” Both questions concern the formal properties of meaning-making, but they approach the problem from different directions. RE:GE operates bottom-up, building complex symbolic entities from primitive operations. The lenses operate top-down, decomposing complex narratives into structural primitives. Together, they form a bidirectional analytical pipeline that can both construct and deconstruct symbolic meaning.
In practice, the two systems exchange data through shared data structures in the ORGAN-I research pipeline. A narrative decomposed by the lenses can be fed into RE:GE as a ritual specification. A symbolic entity produced by RE:GE can be analyzed by the lenses for narrative coherence. This bidirectional exchange is possible because both systems operate on the same foundational ontology: entities, transformations, phases, and emergent properties.
Where Recursive Systems Break Down
Honesty requires acknowledging the failure modes. RE:GE has three persistent challenges that I have mitigated but not solved.
Performance at depth. Recursive symbolic processing is computationally expensive. Each additional depth level approximately doubles the state that must be tracked, because the context stack grows linearly but the potential interactions between context frames grow quadratically. At depth 8, a typical ritual takes 2–3 seconds to execute. At depth 12, it takes 15–20 seconds. This is acceptable for offline symbolic processing but prohibitive for real-time performance applications. The Max/MSP bridge works around this by pre-computing symbolic states and streaming cached results, but this introduces latency between symbolic changes and audio output.
Emergence detection false positives. Handler 17 (emergence_tracker) flags novel symbolic configurations, but “novel” is difficult to define rigorously. A configuration that appears novel might simply be a rare but predictable combination of existing patterns. Currently, the tracker uses a similarity threshold against historical state snapshots: anything below 0.7 cosine similarity to all known states is flagged as emergent. This produces approximately 30% false positives — configurations that look novel but are structurally mundane. Reducing false positives requires a richer model of “expected” configurations, which is an open research problem.
Self-model staleness. The self-model updates once per minute, but symbolic processing can produce multiple state changes per second. During high-throughput processing, the self-model lags behind actual system state by 10–30 seconds. This means self-diagnostic queries during bursts return stale results. I have experimented with event-driven self-model updates, but the overhead of updating the 400-node self-model graph on every state change is prohibitive — approximately 200ms per update, which would consume the entire processing budget. The current approach — periodic snapshots with a staleness indicator — is a pragmatic compromise that works in practice even if it offends architectural purists.
What I Would Do Differently
If I were starting RE:GE today, I would change three things.
First, I would implement the ritual syntax DSL as a proper language with a formal grammar from the beginning, rather than evolving it incrementally from Python string parsing. The current parser works, but it carries the scars of its incremental development: inconsistent operator precedence, three different comment syntaxes (an artifact of three separate parser rewrites), and a code generation phase that mixes compilation with interpretation. A clean formal grammar — perhaps using a PEG parser generator — would have prevented these accretions and made the language easier for others to learn.
Second, I would invest more heavily in the self-model from the start. In the early development, the self-model was an afterthought — a nice-to-have diagnostic tool. It became load-bearing only when the system grew complex enough to require automated self-diagnosis. By then, retrofitting comprehensive self-model coverage to 15 already-existing handlers was tedious and error-prone. Starting with the self-model as a first-class architectural component would have produced a more consistent system with fewer self-model test failures during the retrofit period.
Third, I would separate the bridge handlers into their own package. The current architecture bundles bridges with core handlers, which means updating the Obsidian bridge requires re-running the entire core test suite. A separate bridge package with its own test suite and versioning would reduce the maintenance burden and make it easier for downstream consumers (like metasystem-master) to pin specific bridge versions without taking on core engine changes.
What Scales Well
Despite its challenges, RE:GE validates a core thesis: recursive self-referential architectures can scale to production complexity if you invest in three things.
Rigorous depth management. The MAX_DEPTH = 12 limit, the cycle detector, and the graceful degradation protocol collectively prevent the system from disappearing into infinite recursion. These are not optional safeguards; they are load-bearing architectural components without which the system would be unusable.
Comprehensive testing at every tier. The 1,254 tests are not a vanity metric. They are the reason I can add a new handler or modify the ritual syntax without fear of breaking self-referential invariants that would be invisible in normal testing. The self-model tests, in particular, provide a kind of architectural insurance that no amount of manual code review can replicate.
Explicit self-modeling. The self-model is the single most valuable component of the architecture. It transforms RE:GE from a system I built into a system I can reason about while it runs. That distinction matters enormously as complexity grows. A system you cannot observe is a system you cannot maintain.
The eight-organ meta-system exists because RE:GE proved that recursive architectures work. The organ handler pattern — self-contained modules communicating through a shared event bus with explicit self-modeling — is the same pattern that organizes 78 repositories across 8 organizations. The scale changed. The principle did not.
ORGAN-I’s theoretical work is not academic exercise. It is the infrastructure upon which everything else is built. RE:GE processes the symbolic operations that ORGAN-II renders as art and ORGAN-III commercializes as products. The recursive engine does not merely model itself. It models the entire system of which it is a part. That is what recursive engines at scale actually means.