Technical Problem Solving
The rubber duck has been the programmer’s confessor for decades. You sit a rubber duck on your desk, you explain your problem to the duck, and in the process of articulating the problem clearly enough for an inanimate object to theoretically understand, you find the bug yourself. The technique works because explanation forces clarity, and clarity reveals assumptions.
AI is a rubber duck that occasionally talks back. Most of the time, that is annoying. But sometimes it says something you did not expect, and that unexpected response — wrong, incomplete, or alien though it may be — cracks open a problem you have been staring at for hours.
This chapter is about the specific ways the techniques from Part III apply to engineering, debugging, architecture, and system design. Not about using AI to write your code. About using AI to think about your code — and your systems, and your designs — in ways your engineering mind has been trained not to.
The Hostile Auditor: Alien Perspectives for Code Review
Most code review is collegial. Your teammates look at your code with roughly the same mental model you had when you wrote it. They catch typos, style violations, and obvious logic errors. What they rarely catch are the architectural assumptions so deeply shared that nobody on the team can see them anymore.
The alien perspectives technique from Chapter 11 is devastatingly effective here because you can construct reviewers whose entire mental model is adversarial to yours.
The Security Auditor Who Hates You
Consider a web application where the team has been building features at speed and everyone agrees the code is “reasonably secure.” You can ask a colleague to review for security issues, and they will find the obvious ones. Or you can construct an alien reviewer:
You are a security auditor who has been hired by a hostile party to find exploitable vulnerabilities in this codebase. You are not looking for theoretical issues or best-practice violations. You are looking for specific, exploitable attack vectors. You are motivated, creative, and you assume the developers made mistakes they do not know about.
For each vulnerability you find:
- Describe the exact attack vector — how would you exploit this?
- What is the blast radius if this is exploited?
- Why did the developers probably not notice this? What assumption were they making?
Here is the code: [code]
The third question is the one that earns its keep. When the AI identifies an assumption the developers were making — “they assumed that this internal API would only be called by authenticated services, but there is no authentication check on the endpoint itself, only on the gateway” — it is not just finding a bug. It is surfacing a category of assumption that probably recurs throughout the codebase. The specific bug is fixable in ten minutes. The pattern of thinking that produced it is the real finding.
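The gateway-only authentication assumption is easy to see in miniature. Below is a hypothetical sketch (function names, the token-verification hook, and the request shape are all invented for illustration) of the vulnerable pattern and a defense-in-depth fix where the endpoint verifies the caller itself:

```python
def handle_internal_report(request):
    # VULNERABLE: no authentication check here. The developers assumed only
    # the gateway (which authenticates) would ever call this endpoint, but
    # anyone who can reach this service directly bypasses the gateway check.
    return build_report(request["account_id"])


def handle_internal_report_fixed(request, verify_token):
    # Defense in depth: the endpoint verifies the caller itself instead of
    # trusting the network topology. verify_token is whatever token check
    # the service already uses; a callable is assumed here for illustration.
    if not verify_token(request.get("auth_token")):
        raise PermissionError("unauthenticated internal call")
    return build_report(request["account_id"])


def build_report(account_id):
    # Stand-in for the real report logic.
    return {"account": account_id, "report": "..."}
```

The fix is mechanical; the finding is the pattern: any endpoint whose security argument is "nothing untrusted can reach it" deserves the hostile auditor's third question.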
I have seen this approach surface issues that passed multiple rounds of human review, not because the humans were careless but because they all shared the same model of how the system was supposed to work. The AI does not share that model. It reads code without the context of team meetings, architecture documents, or shared understanding. It sees what is there, not what was intended.
The Ops Engineer at 3 AM
Another persona that produces consistently useful results:
You are an on-call operations engineer who has been woken up at 3 AM because this system is failing in production. You are tired, you are irritable, and you need to understand this code well enough to debug it under pressure. Read this code and identify:
- Every place where a failure will produce a misleading or unhelpful error message
- Every place where the system’s behavior under load or partial failure is ambiguous from reading the code
- Every implicit dependency that is not documented in the code itself
- Every place where you would need to read another file or service to understand what this code actually does at runtime
This is not a security review. It is an operability review, and it consistently identifies a class of problems that developers are structurally blind to: the gap between how code reads in an IDE at 2 PM and how it behaves in production at 3 AM. The results are not about correctness but about diagnosability — whether the system will tell you what is wrong when something goes wrong.
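The diagnosability gap fits in a few lines. Here is a minimal, invented example of the first bullet above: the same failure handled in a way that misleads the 3 AM engineer versus a way that tells them what actually went wrong (the function names and the `source` parameter are illustrative):

```python
import json


def load_config_opaque(raw):
    # Reads fine in the IDE at 2 PM, hostile at 3 AM: the real cause is
    # swallowed and the on-call engineer sees only "config error".
    try:
        return json.loads(raw)
    except Exception:
        raise RuntimeError("config error")


def load_config_diagnosable(raw, source="<unknown>"):
    # Same behavior on success, but the failure says what was wrong and
    # where it came from, and chains the original exception so the full
    # traceback survives.
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        raise RuntimeError(
            f"invalid JSON in config from {source}: "
            f"line {e.lineno}, column {e.colno}: {e.msg}"
        ) from e
```

The operability persona flags the first version reliably; most human reviewers read past it because it "handles the error."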
Constraint Injection for Architecture
In Chapter 12, we explored how productive impossibility — making a desirable shortcut unavailable — forces genuinely novel thinking. Nowhere is this more powerful than in system architecture, where the most dangerous design flaws are things that work fine until they don’t.
The standard approach to architecture is to design for the happy path and then add error handling. The constraint injection approach is to start with a hostile set of assumptions about the environment and design a system that works despite them.
The Chaos Architecture Prompt
Design this system under the following constraints:
- Any component can fail at any time without warning
- Network calls between any two services will fail 5% of the time
- Any database write might succeed on the database but fail to acknowledge to the caller
- Clock skew between services can be up to 30 seconds
- Any service might be deployed in a version that is one release behind the current version at any given time
- The system must produce correct results under all of these conditions
Do not design error handling that “handles” these cases. Design an architecture where these conditions are the assumed norm.
The difference between “handling failures” and “assuming failures” is architectural, not tactical. A system designed to handle failures has a happy path and error paths. A system designed to assume failures has no happy path — every path accounts for partial failure. The architecture that emerges from this constraint is fundamentally different: idempotent operations, event sourcing instead of synchronous calls, explicit version negotiation, logical clocks instead of wall clocks.
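One of the patterns named above can be sketched concretely: a Lamport logical clock, which orders events by causality rather than wall-clock time and is therefore unaffected by the 30 seconds of skew the constraints allow. This is a textbook mechanism, shown here as a minimal in-memory sketch:

```python
class LamportClock:
    """Minimal Lamport logical clock: ordering by causality, not wall time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance our own counter and timestamp the event.
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # On receiving a message, jump past the sender's timestamp so the
        # receive event is ordered after the send event, regardless of what
        # either machine's wall clock says.
        self.time = max(self.time, remote_time) + 1
        return self.time
```

A send on service A and the corresponding receive on service B are correctly ordered even if B's wall clock is half a minute behind A's, which is exactly the property the chaos constraints demand.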
Most engineers know these patterns intellectually. The constraint injection approach forces them to apply these patterns to their specific system rather than admiring them in the abstract. It is the difference between knowing that you should exercise and actually running.
A Concrete Architectural Example
A team was designing an order processing system. Their initial architecture was straightforward: an API gateway receives orders, writes them to a database, publishes an event, and downstream services process fulfillment, billing, and notification.
Under the chaos constraints, the AI — and this is where it functions as a thinking partner rather than an answer generator — raised a series of questions:
What happens if the database write succeeds but the event publish fails? You have an order in the database that no downstream service knows about.
The team’s initial answer: “We’ll add a retry mechanism for event publishing.”
What happens if the retry succeeds but the original publish also eventually succeeds, just late? Now you have duplicate events.
The team’s answer: “We’ll make downstream services idempotent.”
What does idempotent mean for the billing service? If it receives two events for the same order, does it charge once or twice? How does it know? What if the two events carry timestamps 30 seconds apart because of clock skew, and the order has been modified between them?
This is the Socratic interrogation we discussed in Chapter 14, but applied to a technical design. Each answer reveals a new question, and each question surfaces an assumption the team was making. Within forty-five minutes, the team had arrived at an event-sourced architecture with explicit deduplication, and they understood why they needed it — not because someone told them event sourcing is a best practice, but because they had traced the logical consequences of their own design under hostile conditions.
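The destination of that interrogation is roughly the transactional-outbox pattern plus consumer-side deduplication. A sketch of the shape, with all class names and storage invented (in-memory stands in for the database transaction and the durable dedup store):

```python
import uuid


class OrderService:
    """Writes the order and its event as one atomic unit (the 'outbox')."""

    def __init__(self):
        self.orders = {}
        self.outbox = []  # events awaiting publication by a separate relay

    def create_order(self, order):
        # In a real system these two writes share one database transaction:
        # if the order exists, the event exists. There is no window where
        # the order is saved but the event is lost. The relay may publish
        # an outbox event more than once; that is the consumer's problem.
        order_id = str(uuid.uuid4())
        event = {
            "event_id": str(uuid.uuid4()),
            "type": "order_created",
            "order_id": order_id,
        }
        self.orders[order_id] = order
        self.outbox.append(event)
        return order_id


class BillingConsumer:
    """Duplicate deliveries are the assumed norm, so dedup by event id."""

    def __init__(self):
        self.charged = set()  # event ids already processed (durable in real life)

    def handle(self, event):
        if event["event_id"] in self.charged:
            return "duplicate-ignored"
        self.charged.add(event["event_id"])
        return "charged"
```

Nothing here is exotic; the point of the session was that the team derived this shape from their own answers rather than adopting it on authority.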
The AI did not design their architecture. It asked questions that their shared assumptions prevented them from asking themselves.
Conceptual Blending for Novel Solutions
Engineering culture has a strong tradition of borrowing ideas across domains — queueing theory from telephony, MapReduce from functional programming, circuit breakers from electrical engineering. But these established metaphors have become so familiar that they no longer feel like cross-domain borrowing. They are just part of the engineering vocabulary.
The conceptual blending technique from Chapter 13 pushes past the familiar metaphors into genuinely unfamiliar territory. The results are hit-or-miss, but the hits can be transformative.
Biological Concepts in Distributed Systems
I’m designing a distributed system that needs to be resilient, self-healing, and able to adapt to changing load patterns. Describe how each of the following biological systems solves analogous problems, and then propose a specific technical mechanism inspired by each:
- The human immune system (pattern recognition, memory, graduated response)
- Ant colony foraging (decentralized optimization, pheromone trails, emergent intelligence)
- Bone remodeling (structural adaptation under load, Wolff’s law)
- Bacterial quorum sensing (population-density-dependent behavior coordination)
- Plant root networks and mycorrhizal fungi (resource sharing, chemical signaling)
Not all of these will produce useful ideas. Ant colony optimization is already well-explored algorithmic territory. But bacterial quorum sensing — where individual bacteria change their behavior based on the local density of other bacteria, without any central coordination — maps surprisingly well onto the problem of auto-scaling in distributed systems. Instead of a central orchestrator deciding when to scale, what if individual service instances measured local load and independently decided to recruit additional instances when the “population density” of requests exceeded a threshold? The decision is local, the effect is global, and no single point of failure controls the scaling behavior.
A team that explored this concept ended up building a scaling system where each service instance published its current load to a shared lightweight channel (the “chemical signal”), and each instance independently decided to spawn or terminate based on the aggregate signal. It was not a revolutionary invention — it resembled gossip protocols — but the biological framing led them to design features they might not have otherwise considered: a “memory” mechanism where the system remembered previous load patterns and pre-positioned capacity (analogous to immune memory), and a “tolerance” mechanism that prevented oscillation by requiring sustained signal before responding (analogous to the threshold concentration in quorum sensing).
The conceptual blend did not give them a solution. It gave them a vocabulary that organized their thinking differently, and the different organization led to different design decisions.
When Biology Leads You Astray
A word of caution. Biological metaphors are seductive because biological systems are impressively resilient. But biological systems operate under constraints that are radically different from engineering constraints. Evolution optimizes for “good enough” over millions of generations; engineering needs “correct” in the current release. Biological systems tolerate enormous redundancy and waste; engineering systems have budgets. The immune system occasionally attacks its own body; your distributed system should not.
The value of the biological blend is as a source of structural ideas that can then be evaluated under engineering constraints. If you find yourself arguing that your system should work a certain way “because that’s how the immune system works,” you have stopped engineering and started reasoning by analogy, which is the conceptual blending failure mode we discussed in Chapter 13.
Socratic Debugging
The most frustrating bugs are the ones where the system does something that should be impossible given your understanding of the code. You have read every line. You have added logging. You have stepped through the debugger. The behavior you are seeing cannot happen, and yet it is happening.
In these moments, the problem is never the code. The problem is your mental model of the code. There is a discrepancy between what you think the system does and what it actually does, and you cannot find the discrepancy because your mental model is the lens through which you are looking for it.
The Socratic interrogation technique from Chapter 14, adapted for debugging, works by forcing you to make your mental model explicit — and then probing it for contradictions.
I have a bug I cannot find. I’m going to describe the system, the expected behavior, and the actual behavior. I want you to act as a Socratic debugger: do not suggest solutions. Instead, ask me questions about my assumptions. Each question should target a specific assumption I am making about how the system works. Force me to verify each assumption rather than asserting it.
System: [description]
Expected behavior: [description]
Actual behavior: [description]
A real session from my own experience, compressed:
Me: The system is a message queue consumer. It should process each message exactly once. But we are seeing duplicate processing — the same message is processed two or three times.
AI: When you say “exactly once,” what mechanism enforces this? Is it at-most-once delivery with an idempotency check, or exactly-once delivery guaranteed by the queue?
Me: The queue guarantees exactly-once delivery. It’s [specific message queue system] with exactly-once semantics enabled.
AI: What does “exactly-once” mean in the context of this queue? Does it mean the message is delivered once to a consumer, or that the message is delivered once and the consumer’s acknowledgment is tracked to ensure no redelivery?
Me: …I actually need to check that.
AI: While you check that, a related question: what happens if your consumer crashes after processing a message but before acknowledging it? From the queue’s perspective, was that message processed?
That second question was the one that found the bug. The “exactly-once” semantics of the queue system guaranteed that a message would not be delivered to two different consumers simultaneously. But if a single consumer crashed after processing a message and before acknowledging it, the message would be redelivered to the same or a different consumer. Our processing was not idempotent because we had assumed the queue’s exactly-once guarantee covered the crash case. It did not. The documentation was ambiguous, and we had read it charitably.
The AI did not find the bug. The AI asked a question that made me realize I had not actually verified the meaning of “exactly-once” in our specific queue implementation. The assumption was so natural — of course “exactly-once” means what it says — that no human reviewer had questioned it either.
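The fix the session pointed toward is consumer-side idempotency, so that redelivery after a crash-before-ack is harmless. A minimal sketch, with the message shape and the in-memory store standing in for whatever durable storage a real consumer would use:

```python
class IdempotentConsumer:
    """Wraps a processing function so redelivered messages are no-ops."""

    def __init__(self, process_fn):
        self.process_fn = process_fn
        self.processed = set()  # durable in real life, in-memory here

    def on_message(self, message):
        msg_id = message["id"]
        # A message redelivered after a crash (processed but never acked)
        # is detected here instead of being processed a second time.
        if msg_id in self.processed:
            return "skipped-duplicate"
        self.process_fn(message)
        # Record before acking to the queue: if we crash between these two
        # steps, the redelivery is caught by the check above.
        self.processed.add(msg_id)
        return "processed"
```

In a real system the processed-id record and the processing side effect must be made atomic (or the record must live in the same store as the side effect), which is exactly the kind of follow-up question the Socratic debugger would ask next.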
The Pattern of Technical Socratic Interrogation
The effective Socratic debugging prompt has a specific structure:
- State the impossible observation. “This cannot happen, but it is happening.”
- Ask for assumption-targeting questions, not solutions.
- Answer each question honestly, distinguishing between “I know because I verified” and “I know because I believe.”
- Follow the chain until you find an assumption you have not verified.
The bug is almost always in the gap between “I believe” and “I verified.” The AI’s value is not domain expertise — it may not know the specific queue system at all. Its value is that it asks questions from outside your mental model and therefore does not share your unverified assumptions.
Architecture Decision Records, Adversarially
Architecture Decision Records (ADRs) are a standard practice: when you make a significant technical decision, you document the context, the decision, the alternatives considered, and the consequences. In practice, ADRs tend to be written as justifications for decisions already made. The “alternatives considered” section is often a ritual gesture toward options the team had already rejected.
The adversarial brainstorming technique from Chapter 10 can make ADRs genuinely useful:
Here is our Architecture Decision Record for [decision]. Read it as someone who was not in the room when this decision was made and who is skeptical of it. Specifically:
- What alternatives were NOT considered that should have been? Be specific about what they are and why they might be superior.
- What are the second-order consequences of this decision that the document does not address? Think 2-3 years out.
- Under what conditions does this decision become actively harmful rather than merely suboptimal? What would the team need to see to know they should reverse it?
- What unstated assumptions does this document rely on? Identify assumptions about scale, team size, technology stability, and business direction.
The third question is particularly valuable. Most ADRs describe the conditions under which a decision is good but never articulate the conditions under which it becomes bad. Having an explicit “reversal trigger” — a set of conditions that should cause you to reconsider — turns a one-time decision into a monitored decision with built-in review criteria.
The Rubber Duck Upgraded
The common thread in all of these techniques is that the AI is functioning as an interlocutor, not an expert. It does not need to understand your specific system better than you do. It needs to ask questions, offer perspectives, and generate alternatives from outside your habitual thinking patterns.
This is a fundamentally different use of AI than “write my code” or “explain this error message.” Those are expert uses — you are asking the AI to know things. The techniques in this chapter are cognitive uses — you are asking the AI to think differently from you, and using the difference to improve your own thinking.
The practical distinction matters for prompt design. Expert-use prompts are about giving the AI enough context to produce a correct answer. Cognitive-use prompts are about giving the AI enough context to produce a usefully alien response — one that engages with your specific problem but from a perspective you do not naturally have.
Some practical guidelines for cognitive-use prompts in technical work:
Include your current thinking. Do not just describe the problem; describe your current approach to the problem. The AI cannot perturb your thinking if it does not know what your thinking is.
Include what you have already tried. This prevents the AI from suggesting things you have already considered and steers it toward more novel territory.
Include your constraints — the real ones. Not the theoretical constraints, but the actual ones: team size, deployment frequency, existing tech stack, political realities. An architecturally perfect suggestion that requires rewriting everything in Rust is not useful perturbation; it is noise.
Ask for questions, not answers. When the AI gives you an answer, you evaluate it. When the AI gives you a question, you have to think. The thinking is the point.
The Danger: When the Talking Duck Is Wrong
The rubber duck cannot be wrong because it never speaks. The AI can be wrong, and in technical domains, it can be wrong with great confidence and apparent expertise. The epistemic hygiene concerns from Chapter 19 are particularly acute here.
An AI that confidently tells you a race condition exists in your code when one does not is worse than unhelpful — it sends you on a debugging wild goose chase. An AI that suggests an architectural pattern based on a misunderstanding of your constraints can waste days of design work.
The remedy is to treat AI-generated technical suggestions as hypotheses, never as findings. Every suggestion from the hostile auditor needs to be verified against the actual code. Every architectural alternative needs to be evaluated against actual constraints. Every debugging question needs to be answered with evidence, not agreement.
The AI is useful precisely because it does not share your assumptions. But its assumptions are not necessarily better than yours — they are just different. Use the difference to expand your thinking, then apply your own expertise to evaluate the expanded set of possibilities.
That is the upgraded rubber duck: not a silent listener that helps you think by hearing your own words, but an alien interlocutor that helps you think by saying things you would not have said to yourself. Most of what it says will be wrong or irrelevant. Occasionally, it will ask a question that finds a bug you have been staring past for days, or suggest a structural idea that reorganizes your understanding of the problem. Those moments are worth the noise.