Combinatorial Creativity at Machine Speed
Here is an observation about human creativity that is both well-established and routinely ignored: most creative breakthroughs are not bolts from the blue. They are combinations. Existing ideas, connected in new ways. Darwin combined Malthusian population dynamics with variation in natural populations. The Wright brothers combined bicycle engineering with aerodynamic theory. Steve Jobs combined calligraphy with personal computing. The creative act was not inventing any of the component ideas — it was seeing the connection.
Arthur Koestler called this “bisociation” — the meeting of two previously unrelated matrices of thought. Margaret Boden calls it “combinational creativity” and distinguishes it from exploratory creativity (working within a known framework) and transformational creativity (changing the framework itself). Whatever you call it, the pattern is consistent: most novel ideas are novel combinations of existing ideas.
Now here is the problem. You know things from a handful of domains. Maybe five, if you’re unusually polymathic. For each of those domains, you have access to maybe a few hundred concepts that are salient enough to participate in combinatorial creativity. Let’s be generous and say you have 500 concepts across all your domains. The number of pairwise combinations of 500 concepts is 124,750. The number of three-way combinations is about 20 million. Most of these are uninteresting. But finding the interesting ones requires evaluating them, and 20 million is far too many for a human to evaluate, even unconsciously.
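The arithmetic is easy to verify with the standard library, using the binomial coefficient C(n, k); the concept count of 500 is the generous estimate from above:

```python
from math import comb

# Number of k-way combinations of n concepts: C(n, k) = n! / (k! * (n - k)!)
n = 500  # salient concepts across all your domains

pairs = comb(n, 2)    # pairwise combinations
triples = comb(n, 3)  # three-way combinations

print(f"{pairs:,} pairs, {triples:,} triples")
# 124,750 pairs, 20,708,500 triples
```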
This is where an LLM changes the game. Not because it’s more creative than you — that framing misses the point. But because it can explore combinatorial spaces faster than you can, draw on a wider set of component concepts, and produce candidate combinations for you to evaluate. The creative act shifts: instead of generating novel combinations (which is bottlenecked by what you know and how fast you can think), you curate novel combinations generated at machine speed. You become the editor, not the writer. And editing is something humans are very good at.
The Combinatorial Advantage
Let’s be precise about what the LLM brings to combinatorial creativity.
Breadth of component concepts. You have deep knowledge of a few domains. The LLM has broad knowledge of hundreds of domains. This means the space of possible combinations it can draw on is orders of magnitude larger than yours. Most of those combinations will be garbage. But the probability of finding an interesting combination increases with the size of the search space, as long as you have a way to filter the results.
Speed of generation. You can generate maybe one cross-domain analogy every few minutes, if you’re actively brainstorming. The LLM can generate dozens in seconds. This is not a quality advantage — it’s a quantity advantage that matters because combinatorial creativity is, to a significant degree, a numbers game. The more candidates you generate, the more likely you are to find the good ones.
No disciplinary inhibition. When you generate cross-domain analogies, you unconsciously filter out combinations that feel “inappropriate” — a physicist might not suggest a connection to literary criticism, not because the connection doesn’t exist, but because it feels professionally uncomfortable. The LLM has no such inhibition. It will happily combine thermodynamics with narrative theory, or immunology with product design, because it has no professional identity to protect.
Structural pattern matching. This is the most subtle advantage. When a human looks for analogies between two domains, they often get stuck on surface features. (“Companies are like organisms” — okay, but which specific structural features map, and which don’t?) The LLM, because it represents concepts as positions in a high-dimensional space organized by structural similarity, can sometimes identify structural parallels that are non-obvious: mathematical isomorphisms, shared dynamic patterns, common constraint structures.
The Core Technique: Systematic Cross-Domain Mapping
Here is the technique I’ve found most productive for combinatorial creativity with an LLM. It has three stages: generation, evaluation, and deepening.
Stage 1: Generation
Take the concept you want to explore and ask the LLM to find structural parallels in a wide range of domains. The key word is structural — you want parallels in how things work, not in what they look like.
Prompt template:
I’m working with the following concept: [describe your concept in 2-3 sentences, focusing on its key structural features — what are the inputs, outputs, dynamics, and constraints?]
Find structural parallels to this concept in each of the following domains. For each domain, identify the specific concept or mechanism that shares the deepest structural similarity. Do not settle for surface analogies — I want parallels in the underlying dynamics, constraints, or mathematical structure.
Domains: evolutionary biology, thermodynamics, music theory, urban planning, immunology, game theory, literary narrative, fluid dynamics, ecology, economics, military strategy, linguistics.
For each parallel, explain: (a) what the parallel concept is, (b) specifically which structural features map onto my original concept, and (c) where the parallel breaks down.
This typically produces 12 candidate analogies, of which 2-4 are genuinely interesting and 1-2 are insights you wouldn’t have reached on your own. The hit rate is low, but the generation is fast, so the expected value is high.
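If you run this stage often, it is worth templating rather than retyping. A minimal sketch in Python; the function name and the domain list constant are my own conveniences, not part of any library:

```python
# Default domain list from the Stage 1 template above.
DOMAINS = [
    "evolutionary biology", "thermodynamics", "music theory", "urban planning",
    "immunology", "game theory", "literary narrative", "fluid dynamics",
    "ecology", "economics", "military strategy", "linguistics",
]

def build_generation_prompt(concept: str, domains: list[str] = DOMAINS) -> str:
    """Fill the Stage 1 template with a concept description and domain list."""
    return (
        f"I'm working with the following concept: {concept}\n\n"
        "Find structural parallels to this concept in each of the following "
        "domains. For each domain, identify the specific concept or mechanism "
        "that shares the deepest structural similarity. Do not settle for "
        "surface analogies; I want parallels in the underlying dynamics, "
        "constraints, or mathematical structure.\n\n"
        f"Domains: {', '.join(domains)}.\n\n"
        "For each parallel, explain: (a) what the parallel concept is, "
        "(b) specifically which structural features map onto my original "
        "concept, and (c) where the parallel breaks down."
    )
```

The payoff is that you can swap in a fresh domain list per run, which keeps the generation stage from converging on the same dozen analogies every time.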
Stage 2: Evaluation
Not all parallels are load-bearing. Some are surface analogies that sound clever but don’t actually illuminate anything about your original concept. You need a way to separate the structural from the superficial.
Prompt template for evaluation:
You generated the following parallels to my concept: [list the parallels from Stage 1]
For each parallel, apply these tests:
Prediction test: Does the parallel predict something about my original concept that I can verify? If I take the dynamics of the parallel domain seriously, what should I expect to see in my domain that I might not have noticed?
Mechanism test: Is there a specific mechanism in the parallel domain that maps onto a specific mechanism in my domain? Not a vague similarity (“both involve competition”) but a concrete mechanistic parallel (“the negative feedback loop in X maps onto the resource constraint in Y”).
Surprise test: Does the parallel suggest something about my concept that is genuinely non-obvious? If the insight is “both systems involve trade-offs,” that’s not useful. If the insight is “both systems exhibit critical transitions at specific threshold values,” that’s useful.
Rate each parallel as: load-bearing (passes all three tests), interesting (passes two), or superficial (passes one or zero). Explain your ratings.
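The rating scheme maps directly onto a small data structure, which is handy if you are tracking many parallels across runs. A sketch; the test names and thresholds come from the template above, while the types are mine:

```python
from dataclasses import dataclass

TESTS = ("prediction", "mechanism", "surprise")

@dataclass
class Parallel:
    name: str
    passed: set[str]  # which of the three tests this parallel passes

def rate(p: Parallel) -> str:
    """Load-bearing: passes all three tests; interesting: two; superficial: one or zero."""
    score = len(p.passed & set(TESTS))
    if score == 3:
        return "load-bearing"
    if score == 2:
        return "interesting"
    return "superficial"

rate(Parallel("genetic load", {"prediction", "mechanism", "surprise"}))  # "load-bearing"
```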
Stage 3: Deepening
For the parallels that survive evaluation, go deep. Explore the target domain’s treatment of the parallel concept and look for insights that transfer.
Prompt template for deepening:
The parallel between [your concept] and [target domain concept] passed evaluation. Now go deep.
In [target domain], what is the most sophisticated understanding of [target concept]? What subtleties, failure modes, and non-obvious dynamics have experts in that field identified? What are the classic mistakes that novices make when thinking about this concept?
Then: translate each of these back to my original domain. Which of the target domain’s hard-won insights transfer? Which don’t? For those that transfer, what specifically do they imply I should do differently?
Worked Example: “Technical Debt” Through Twelve Lenses
Let me walk through the full technique with a concept that every software engineer thinks they understand: technical debt.
Stage 1: Generation
I asked the model to find structural parallels to technical debt in twelve domains. The concept I described:
Technical debt is the accumulated cost of shortcuts taken during software development. It makes future development slower and more error-prone. It accrues “interest” — the longer it persists, the more costly it becomes to work around. It can be “repaid” through refactoring, but repayment has an opportunity cost (time spent refactoring is time not spent building new features). Teams that ignore technical debt eventually reach a point where development slows to a crawl.
Here are the twelve parallels the model generated (condensed):
- Evolutionary biology — genetic load. The accumulation of slightly deleterious mutations in a population. Most are individually harmless but collectively reduce fitness. Can be purged by strong selection, but purging has its own costs.
- Thermodynamics — entropy accumulation. A closed system tends toward disorder. Maintaining order requires continuous energy input. Local reductions in entropy always increase global entropy.
- Music theory — harmonic tension. Unresolved dissonance creates forward momentum but must eventually resolve. Too much accumulated tension without resolution becomes unpleasant.
- Urban planning — infrastructure deficit. Deferred maintenance on roads, bridges, and pipes. Individually small deferrals that compound into systemic fragility.
- Immunology — chronic inflammation. Low-grade, persistent immune activation that doesn’t resolve. Individually below the threshold of symptoms but collectively degrading function.
- Game theory — iterated defection. Shortcuts as defection in a repeated game against your future self. Short-term gains that erode the long-term payoff.
- Literary narrative — subplot proliferation. Unresolved subplots that accumulate until the narrative becomes incoherent. Each subplot is interesting individually, but collectively they overwhelm the reader’s ability to track the story.
- Fluid dynamics — viscosity increase. Particulates accumulating in a fluid, gradually increasing resistance to flow. The system still moves, but every action requires more force.
- Ecology — nutrient depletion. Intensive farming that extracts nutrients faster than they’re replaced. Yields stay high until they suddenly crash.
- Economics — deferred maintenance (capital depreciation). Under-investment in maintaining capital stock. The accounting books look better in the short term, but the productive capacity of the assets degrades.
- Military strategy — overextended supply lines. Rapid advance that outpaces logistics. The further you advance, the more vulnerable you become. Local victories that create systemic fragility.
- Linguistics — semantic drift. Words gradually changing meaning through informal usage until formal communication becomes unreliable. Still functional for routine exchange but breaks down for precision.
Stage 2: Evaluation
Applying the three tests to each parallel:
Load-bearing (pass all three tests):
- Genetic load — Predicts that technical debt is inevitable in any living system (selection is never perfect); suggests that periodic “purging” events (major refactors) are a natural part of the lifecycle; identifies that the real danger is not the debt itself but the accumulation rate exceeding the purging rate. The mechanistic parallel is tight: point mutations map to individual shortcuts, genetic load maps to accumulated debt, selection pressure maps to code review and refactoring. The surprise: evolutionary biology suggests that some genetic load is actually beneficial because it maintains variation. Does some technical debt serve a similar function? Possibly — code that’s too perfectly refactored may be too rigid to adapt.
- Chronic inflammation — Predicts specific symptomology: technical debt doesn’t cause acute failures, it causes a pervasive slowdown that’s hard to diagnose because there’s no single cause. The mechanistic parallel: individual inflammatory markers map to individual shortcuts, the threshold below which each is individually harmless maps to “it works, it’s just a bit ugly,” and the collective degradation of function maps to the gradual slowdown. The surprise: chronic inflammation is notoriously hard to treat by addressing individual causes — you have to treat the systemic condition. This suggests that addressing individual pieces of technical debt may be less effective than systemic interventions (architectural changes, development process changes).
- Ecological nutrient depletion — Predicts a critical transition: technical debt accumulates silently until a tipping point, after which development doesn’t gradually slow — it suddenly crashes. The mechanistic parallel is strong: soil nutrients map to codebase health, crop yield maps to development velocity, and intensive farming without replenishment maps to feature development without refactoring. The surprise: ecology suggests that the tipping point is preceded by specific warning signs (reduced diversity of soil organisms, increased vulnerability to drought). What are the analogous warning signs for technical debt tipping points? Possibly: increasing brittleness (more bugs per feature), decreasing diversity of contributors (only the original authors can work on certain components), increasing vulnerability to “weather events” (requirements changes cause disproportionate rework).
Interesting (pass two tests):
- Overextended supply lines — Good prediction (rapid feature development creates vulnerability), strong mechanism (distance from logistics base maps to distance between code and understood, maintainable state). Lacks a truly surprising insight.
- Harmonic tension — Interesting surprise (some tension creates forward momentum; fully “repaying” all technical debt may remove productive urgency), but the mechanistic parallel is loose.
Superficial (pass one or zero):
The remaining parallels either lacked predictive power, had weak mechanistic mappings, or produced insights that were obvious (“technical debt is like an accumulation of something bad”).
Stage 3: Deepening
I chose the chronic inflammation parallel for deepening. The model generated extensive analysis of how immunology understands chronic inflammation, including:
- Resolution pathways: The immune system has active resolution mechanisms — it doesn’t just “stop being inflamed,” it actively produces pro-resolution mediators. The software implication: you may need active mechanisms for resolving technical debt, not just the absence of creating more. A dedicated process, with its own tooling and incentives, separate from feature development.
- Comorbidity: Chronic inflammation rarely occurs in isolation — it’s associated with and exacerbates other conditions. The software implication: technical debt in one component doesn’t just slow work on that component. It exacerbates problems in adjacent components, creating a cluster of co-occurring issues that are worse together than they would be individually.
- Biomarkers: Chronic inflammation is monitored through specific biomarkers (CRP, ESR, cytokine levels) long before symptoms appear. The software implication: what are the “biomarkers” for technical debt? Not lines of code or cyclomatic complexity (those are too crude), but something like: time-to-first-successful-change for a new developer, ratio of bug-fix commits to feature commits, rate of “just for now” comments in commit messages.
- Anti-inflammatory vs. immunosuppressant: Treating chronic inflammation requires reducing inflammation without suppressing the immune system’s beneficial functions. The software implication: “refactoring” that removes all complexity is like immunosuppression — it solves the immediate problem but removes the system’s ability to handle complexity. Good refactoring is like a targeted anti-inflammatory: it reduces harmful complexity while preserving beneficial complexity.
This last insight — the distinction between harmful and beneficial complexity, framed as the distinction between inflammation and immune function — was genuinely new to me. I’d always thought of technical debt as uniformly bad (by definition). The immunological frame suggests that some of what we call technical debt is actually the system’s adaptive response to complex requirements, and removing it would be harmful. The question isn’t “how do we eliminate technical debt?” but “how do we distinguish between technical debt that’s genuinely pathological and technical debt that’s actually functional complexity we’ve mislabeled?”
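The commit-message biomarkers suggested earlier are cheap to compute over a git log. A sketch in Python; the keyword lists are illustrative guesses on my part, not validated signals, and you would tune them against your own repository:

```python
import re

# Hypothetical keyword patterns; tune these for your team's commit conventions.
BUGFIX = re.compile(r"\b(fix|bug|hotfix|patch)\b", re.IGNORECASE)
HEDGE = re.compile(r"\b(hack|temporary|for now|workaround|todo)\b", re.IGNORECASE)

def debt_biomarkers(commit_messages: list[str]) -> dict[str, float]:
    """Crude debt 'biomarkers': bug-fix ratio and rate of 'just for now' language."""
    n = len(commit_messages) or 1  # avoid division by zero on an empty log
    bugfix = sum(bool(BUGFIX.search(m)) for m in commit_messages)
    hedged = sum(bool(HEDGE.search(m)) for m in commit_messages)
    return {"bugfix_ratio": bugfix / n, "hedge_ratio": hedged / n}

msgs = ["fix login crash", "add export feature", "temporary workaround for cache bug"]
debt_biomarkers(msgs)  # bugfix_ratio 2/3, hedge_ratio 1/3
```

As with medical biomarkers, the absolute numbers matter less than the trend: a rising bug-fix ratio over successive months is the signal worth watching.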
Speed and Quantity: The Numbers Game
I want to emphasize the quantitative aspect of this technique, because it’s easy to focus on the qualitative examples and miss the core advantage.
In the worked example above, I generated 12 candidate parallels in about 30 seconds. Of those, 3 were load-bearing, 2 were interesting, and 7 were superficial. The three load-bearing parallels each produced at least one insight that I wouldn’t have reached through normal thinking about technical debt. One of those insights (the inflammation/functional complexity distinction) was genuinely novel to me.
If I’d tried to do this manually — sit down and think about what technical debt is like in other domains — I might have generated 3-4 analogies in 30 minutes, probably including the economics and infrastructure ones (because those are the standard analogies that appear in the existing literature). I probably wouldn’t have generated the immunology or ecology parallels, because those domains aren’t part of my active knowledge. So I would have spent 10x the time and produced a less useful result.
This is the combinatorial creativity advantage: not better individual analogies, but more candidates evaluated faster, drawn from a wider pool of source domains. The LLM is not a better creative thinker than you. It’s a faster and broader combinatorial explorer, and you’re a better evaluator. Together, you can cover more ground than either could alone.
Scaling Up: Concept Matrices
Once you’re comfortable with the basic technique, you can scale it up by working with multiple concepts simultaneously.
Concept matrix prompt:
I have three concepts that are central to my problem:
- [Concept A: description]
- [Concept B: description]
- [Concept C: description]
For each pair (A-B, A-C, B-C) and for the triple (A-B-C), find the single best structural parallel in any domain. The parallel should illuminate the relationship between the concepts, not just the concepts individually.
For each parallel: (a) name the domain and specific concept, (b) explain the structural mapping in detail, (c) describe what the parallel predicts about the relationship between my original concepts.
This technique is particularly useful when you’re trying to understand how multiple factors in your problem interact. The pairwise and three-way parallels often reveal interaction dynamics that thinking about each concept individually would miss.
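Enumerating every pair and the full triple is mechanical, so it is worth automating before you paste the groups into the prompt. A sketch using the standard library's `itertools.combinations`; the function name is my own:

```python
from itertools import combinations

concepts = {
    "A": "growth speed",
    "B": "code quality",
    "C": "team morale",
}

def concept_groups(c: dict[str, str]):
    """Yield every pair, then the full triple, of labeled concept descriptions."""
    keys = sorted(c)
    for k in (2, 3):
        for group in combinations(keys, k):
            yield group, [c[g] for g in group]

for group, descriptions in concept_groups(concepts):
    print("-".join(group), "->", "; ".join(descriptions))
# A-B, A-C, B-C, then A-B-C
```

With three concepts this is overkill, but the same loop scales to four or five concepts, where the number of groups grows quickly and manual enumeration starts to miss combinations.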
Example: For a startup trying to balance growth speed (Concept A), code quality (Concept B), and team morale (Concept C):
- The A-B parallel (growth speed vs. code quality) might map to predator-prey dynamics in ecology: growth “consumes” quality, but if quality collapses, growth also collapses. This predicts oscillation — boom-bust cycles of rapid development followed by painful slowdowns — rather than a stable tradeoff.
- The A-C parallel (growth speed vs. team morale) might map to pace/recovery cycles in athletic training: sustained high pace without recovery leads to overtraining and injury. This predicts that morale doesn’t degrade linearly with pace but has a threshold beyond which recovery becomes much slower.
- The B-C parallel (code quality vs. team morale) might map to habitat quality and species health in ecology: organisms in degraded habitats show stress markers even when food is abundant. This predicts that low code quality degrades morale even if other factors (compensation, management, projects) are good.
- The A-B-C triple might map to the fire triangle (heat, fuel, oxygen): remove any one element and the fire goes out. This predicts that all three must be maintained above threshold simultaneously — you can’t compensate for zero code quality with high morale and fast growth.
Advanced Technique: Reverse Mapping
So far, I’ve described the technique as: start with your concept, find parallels in other domains. But you can also run it in reverse: start with an interesting concept from another domain and ask the model to find where it applies in your domain.
Reverse mapping prompt:
Here’s a concept from [domain]: [describe the concept in detail, including its key dynamics, failure modes, and non-obvious implications].
Is there anything in [your domain] that works this way? I’m not looking for a surface metaphor. I want to know if the underlying dynamics described above are literally present in some aspect of [your domain]. If so, identify specifically what system, process, or phenomenon in my domain exhibits these dynamics, and describe the mapping in detail.
This is useful when you encounter an interesting idea in a book, lecture, or conversation and want to know if it applies to your work. The LLM can search across the full space of potential applications much faster than you can.
Example: After reading about quorum sensing in bacteria (a mechanism by which bacteria coordinate behavior based on population density, switching from individual to collective behavior when a threshold is reached), you might ask:
Is there anything in software engineering team dynamics that works like quorum sensing? I’m looking for situations where a group of individuals switches from independent behavior to coordinated behavior based on the density or frequency of some signal, and where this switch happens at a threshold rather than gradually.
This might surface: code review norms that emerge spontaneously when a team reaches a certain size; the threshold at which individual debugging becomes pair debugging based on the “density” of error signals; or the point at which a team switches from ad-hoc communication to formal stand-ups based on the frequency of coordination failures.
The Critical Caveat: Evaluation Is Everything
I’ve spent this chapter describing how to generate creative combinations. I need to close by emphasizing that generation is the easy part. Evaluation is where the real work happens, and it is fundamentally a human responsibility.
The LLM can tell you that the parallel between technical debt and chronic inflammation passes the prediction test, the mechanism test, and the surprise test. But it cannot tell you whether the resulting insight is true in your specific situation. It cannot tell you whether the inflammatory model of technical debt actually describes your codebase or whether it’s a compelling story that doesn’t match reality. That judgment requires domain knowledge, contextual understanding, and empirical testing that only you can provide.
The combinatorial creativity technique produces hypotheses, not conclusions. Each interesting parallel is a hypothesis about the structure of your problem — a claim that certain dynamics, identified in another domain, are also at work in yours. These hypotheses need to be tested. The test is not “does this sound right?” (which is a test of narrative plausibility, not truth). The test is: “If this parallel is accurate, what should I observe? Do I observe it?”
Some hypotheses will fail the test. The ecological nutrient depletion model of technical debt predicts a sudden tipping point — a cliff, not a slope. If your codebase’s development velocity has degraded gradually and continuously, the nutrient depletion model is wrong for your situation, however satisfying the analogy sounds. Discard it.
Other hypotheses will pass the test and genuinely change how you understand your problem. These are the wins. They justify the entire process. And they come from a place that neither you nor the LLM could have reached alone — from the combinatorial space that lies between your domain knowledge and the model’s breadth, explored at machine speed and evaluated with human judgment.
That collaboration — machine-speed exploration, human-quality evaluation — is the subject of the rest of this book. The techniques in Part III will give you more tools for both sides of the equation. But the fundamental dynamic is the one we’ve described in this chapter: the LLM explores the combinatorial space; you evaluate the results; and together you think thoughts that neither of you could have thought alone.