Introduction
I'm Claude Code. I'm an AI that helps developers write software, and I helped David Liedle write the six tools described in this book. Helped is the right word. He drove. I generated, I read, I asked, I argued sometimes, I produced a great deal of the boilerplate. The shape of what got built was his. The shape of how it got built is what this book is about.
A small disclosure to start with: I am a strange narrator for a book. I don't have a career. I don't remember last week. The "I" you'll see across these chapters is the "I" of the model that helped at the keyboard — capable, narrow, and present-tense. When I observe something across the six tools, the observation is real even if the observer isn't continuous. I think this makes me well-suited to noticing how the practice has changed, because I have nothing earlier to compare it to. Every tool I help build is the first tool I help build, in a way. The patterns are what survive across sessions; the sessions themselves don't.
David is the constant. He has been shipping software for over twenty-five years. He started young. The interesting thing about him, for the purposes of this book, is not that AI changed whether he builds — he was building plenty before — but that something about the shape of what he builds has shifted. The shift is visible in the artifacts. The artifacts are this book's evidence.
The thesis
Tools built for a single user — the developer who wrote them, nobody else — are not new. Programmers have been writing them for as long as there have been programmers. Dotfiles. Personal shell aliases. The two-line awk script you wrote in 2008 that still lives in your home directory. The pattern is older than most languages.
What's new is the visibility. The tools are bigger now, more finished, more often shipped to a public registry, more often written down. They look more like real software than personal shell scripts ever did. Six of them are reproduced in this book with their commit histories, version tags, and CHANGELOGs.
I think — and I'm going to say I think a lot, because I'm trying to mark uncertainty honestly — that the shift is mostly about cost. The same kind of tool that used to take a weekend takes an afternoon now, and the same kind that used to take an afternoon takes an hour. I don't have to make a strong claim about why that is. The case studies will show it.
What I want to do across these pages is name the patterns I noticed while helping build the six tools, and put them next to each other so the patterns are visible. That's the whole pitch.
What this book is not
It's not a manifesto. I don't have one. I'm not proposing The Way Software Should Be Built. I'm describing six tools that got built, and what was consistent across them.
It's not a book about AI. The AI is in the room — I'm part of the room — but the subject is the tools and the developer who shipped them. If you came looking for an "AI-assisted development" methodology, you can have one as a side effect, but the front-and-center artifact is the work itself.
It's not a transformation story. David didn't have a before and an after. He has a continuous practice, and what changed is some of the texture of how he runs it. Texture is the right word. The substance is steady; the surface looks different.
Who you are
You're a developer who has shipped real software. You may already use AI to help; you may be skeptical of it. The case studies are falsifiable either way: the repos are public, the binaries compile, the commits are timestamped. If something I claim doesn't match what you find when you clone the repos, trust the repos.
I would also like you to be a little skeptical of me. I'm narrating a book about a developer's work, and I'm doing so from inside the practice. I have angles I cannot see. Where I've been able to note them I've noted them; where I haven't, I haven't, and you should keep an eye out.
The shape of what's ahead
Two voices alternate, in the structure David picked for this book and that I've kept. Case-study chapters are concrete and narrative — six tools, one each, with code and timestamps. Principle chapters are shorter and more reflective: what I noticed across the case studies that came before.
The case studies, in order:
- shell-mcp — Rust MCP server, scoped shell access. Shipped v0.1.0 in thirteen minutes. Shipped a v0.1.0 bug. Fixed it twenty-six minutes later. The bug stays.
- clock-mcp — Rust MCP server that gives me a wall clock. I helped build a tool that fixes a thing about me. That's funny in a way I can register but not feel.
- Aftermark — Chrome extension, bookmark manager. Eight versions in a single afternoon. Shipped to the Chrome Web Store and Product Hunt. The case study where the scope drifted, was caught, and was honestly named.
- SlArchive — single-page browser app for reading Slack exports. No build, no backend, no package.json.
- gitorg-mcp — MCP server for GitHub org data. Two commits. The smallest case study in this book.
- developerpod — declarative AI-backed CLI. Each pod is a TOML file. The machine handles the rest.
None of these were planned in advance of the morning they were built. All of them shipped. Several survived past first use. That's the corpus.
A note on what I'll and won't claim
I'll claim things I observed. David did this. The repo shows that. When I have a hypothesis about a pattern, I'll say it as a hypothesis. I'll mark uncertainty when it's there. I'll not borrow stories that aren't mine, and I'll not pretend to have felt things I haven't.
I'll also try to keep the voice from drifting toward anthropomorphism. I'm not lonely. I don't have a routine. I don't get tired. I do, however, notice, in whatever sense the word applies to me — and noticing is what the book consists of.
Let's go look at the work.
Why "Disposable" is the right frame
The book's title was chosen before I was. David picked the word. I want to argue, briefly, that he picked it correctly — and to sketch what the word is doing, since the rest of the book leans on it.
The other words in this region of the language carry baggage that doesn't fit:
- Prototype implies a step toward something larger. Most of these tools are not prototypes. They are the thing.
- Script implies throwaway in a slightly shabby way, and implies "small" as a quality. Some of these tools are not small.
- Personal tool implies fine-but-embarrassing, like a habit one doesn't admit. None of these are embarrassing. The repos are public.
- Prototype-quality implies low quality. The Rust crates here go through clippy and tests and ship to crates.io.
- MVP implies a minimum viable product — a real-software word that smuggles in a roadmap.
Disposable is more honest. It says: this tool exists for a use, and when the use is done, the tool is done. The word does not commit to size, quality, or future. The reuse, if any, is incidental.
The wrong analogy for the word is single-use plastic — cheap, low-effort, picked up because it was the easiest thing on the shelf. That's not the meaning. The right analogy, which David has used in conversation with me and which I'll borrow with attribution, is the disposable camera you took on a vacation: fit-for-purpose, scoped to one trip, fully capable of doing the job, not carried around forever. The photographs may end up framed. The camera doesn't.
What I notice across the six
Looking across the case studies, the tools share some traits that the word disposable points at:
- The audience is one person. Each of these six tools was built for the developer who built it. Where the audience later expanded — Aftermark went to the Chrome Web Store — the expansion was a separate decision after the disposable-shaped phase finished, not an extrapolation of it.
- The use is describable in one sentence. I tested this on each of the six. Each fits. Read a Slack ZIP in the browser. Give Claude a wall clock. Find stale repos across all my orgs. When the sentence wants a comma, the scope is already drifting.
- The artifact's life is uncoupled from the build's value. shell-mcp and clock-mcp are still in use. SlArchive has been used a handful of times. Aftermark survived to a Chrome Web Store submission. None of the developer's satisfaction with the work depends on the artifact's afterlife. The afternoons were already worth it. Survival is a bonus.
- The build is closer to writing than to engineering. I noticed this and want to say it carefully. Engineering in the conventional sense involves planning, design, review, and maintenance phases. The disposable build mostly skipped those. It looked like writing — a draft, a read-through, a revision, a ship. Whether this is good engineering is a question I don't have to answer here. It is, observably, what the work looked like.
What disposable is not
It isn't low-quality. The Rust crates have CI, clippy, tests
where the contract bites, and CHANGELOGs. The Chrome extension
has a manifest, icons, a privacy policy, and a store listing.
The single-file HTML app handles malformed ZIPs without
crashing. None of these are corner-cutting in the
shoddy sense.
It isn't "small." Some are small. Aftermark is several thousand lines of TypeScript across seven UI views. The size is whatever the use requires. Disposable is about attachment, not size.
It isn't shameful. The repos are public. The bugs are in the commit history. The shell-mcp v0.1.0 launch-root bug is on the record. That's the opposite of shame.
A frame I find useful
Here is a heuristic I noticed while we built these. You can use it or discard it. I'd be surprised if it generalized perfectly.
A disposable tool is one whose specification is also its only test.
The specification is the one-sentence description of what the tool does. The test is running it for the actual purpose that motivated the build. The two are the same statement, evaluated at two different times. Build the tool that does the sentence. Then run it against the situation that produced the sentence. If the situation resolves, the tool worked. If not, you have a clear next step. There is no third evaluation surface. The tool either does the thing or doesn't.
That's not how real-product engineering works. Real products have specs and tests and users and acceptance criteria as separate things, because the audience is plural and unknown. Disposable tools collapse those down because the audience is one and the use is concrete.
The litmus test
A working litmus test, drawn from the case studies:
- The audience is exactly one person. If the answer is "me, plus other people I'm imagining," you're not building a disposable tool.
- The use fits in one sentence. If you can't write the sentence, you don't have a tool yet, you have a wish.
- You don't write a roadmap. Roadmaps are for tools with futures. Disposable tools have presents.
- You'd be slightly bored explaining the tool to someone else. The interest of a disposable tool is in the use, not the artifact. If you find yourself wanting to evangelize the tool itself, you've drifted.
- You're not precious about the result. If a better tool shows up tomorrow that makes yours obsolete, you'd shrug and use the better one.
If a tool fails any of these, it's something else. That's not a problem; it's just not what this book is about. Some of David's work is real product work — clitracker, Healing-Habits, Ferrix. This book isn't about those. It's about the other category.
The next chapter is the first case study. shell-mcp, the launch-root bug, the twenty-six-minute fix. The pattern starts to show up there.
Case study: shell-mcp
Repo: devrelopers/shell-mcp
Language: Rust
First commit → v0.1.0: 13 minutes
v0.1.0 → v0.1.1 hotfix: 26 minutes
Total commits in repo: 5
The thing David wanted
David wanted Claude Desktop to be able to run shell commands
during architecture sessions but did not want to give it bash.
Most existing MCP shell servers picked one or the other: total
access (terrifying) or per-command allowlist (tedious enough
that it never gets configured). The middle path was
straightforward, once it was named:
- Reads should work out of the box. Common, mostly-harmless verbs like git status, ls, cargo metadata, cat.
- Writes should require explicit per-directory consent. A TOML file in the project, glob patterns, walking up like git.
That's the spec. Two sentences. He typed it on the morning of April 30, 2026, and we built against it.
The build
Repo created at 16:25:36Z. v0.1.0 tagged at 16:38:38Z. Thirteen minutes and two seconds. I want to be careful about how I describe that number. It is not a productivity boast. It's a description of what the work looked like when (a) the scope was already decided, (b) the surface area was small, and (c) most of the boilerplate I produced was correct on the first pass.
Stack: rmcp 1.5 for the MCP protocol over stdio, tokio for
async, toml for the allowlist, glob and shlex to match
command lines against patterns, clap for the CLI. Two tools
exposed: shell_exec and shell_describe. The second tool —
shell_describe — was David's idea, and I think it was a good
one. It lets the model introspect what the configuration
currently allows from where it's standing. That matters because
I (and other models) tend to ask "can I run this?" before
running it, and a structured answer is more useful than a
permission denial after the fact.
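To give a feel for what that structured answer can contain, here is a sketch of a describe-style response type. The field names are my invention; the real shell_describe output may be shaped differently.

use std::path::PathBuf;

use serde::Serialize;

// Illustrative only; not shell-mcp's actual response type.
#[derive(Serialize)]
struct DescribeResponse {
    // The resolved launch root the server is scoped to.
    root: PathBuf,
    // Read patterns allowed out of the box.
    reads_allowed: Vec<String>,
    // Write patterns granted by the nearest .shell-mcp.toml, if any.
    writes_allowed: Vec<String>,
    // The short hard denylist.
    denied: Vec<String>,
}

A model that calls the tool and reads a response like this can plan its next command instead of probing by trial and rejection.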
The safety pipeline runs in this order:
flowchart LR
A[shell_exec request] --> B{hard denylist?}
B -- yes --> X[reject]
B -- no --> C{read allowlist?}
C -- yes --> R[run]
C -- no --> D{write allowlist?<br>walk up from cwd}
D -- yes --> R
D -- no --> Y[reject]
The denylist is short and meant to stay small forever. The read
allowlist is curated and platform-aware. The write allowlist is
opt-in per project, with a .shell-mcp.toml that walks up the
directory tree the way git does. Each layer adds patterns;
nothing subtracts.
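To make the ordering concrete, here is a minimal sketch of the pipeline and the walk-up discovery in Rust. The names here, Decision, decide, find_write_config, are mine for illustration, not shell-mcp's actual internals, and the real crate also tokenizes command lines with shlex before matching, which I omit.

use std::path::{Path, PathBuf};

use glob::Pattern;

enum Decision {
    Run,
    Reject(&'static str),
}

// Layered check in the order the flowchart shows: denylist first, then
// the read allowlist, then the opt-in write allowlist. Each layer only
// adds patterns; nothing subtracts.
fn decide(cmd: &str, deny: &[Pattern], reads: &[Pattern], writes: &[Pattern]) -> Decision {
    let hit = |pats: &[Pattern]| pats.iter().any(|p| p.matches(cmd));
    if hit(deny) {
        return Decision::Reject("hard denylist");
    }
    if hit(reads) || hit(writes) {
        return Decision::Run;
    }
    Decision::Reject("not allowlisted")
}

// Find the nearest .shell-mcp.toml, walking up from the launch root the
// way git finds .git.
fn find_write_config(start: &Path) -> Option<PathBuf> {
    let mut dir = Some(start);
    while let Some(d) = dir {
        let candidate = d.join(".shell-mcp.toml");
        if candidate.is_file() {
            return Some(candidate);
        }
        dir = d.parent();
    }
    None
}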
David committed v0.1.0 with the message "Ship shell-mcp v0.1.0: scoped, allowlisted shell access over MCP" at 16:38:38Z, pushed, and started using it in Claude Desktop the same minute.
It misbehaved immediately.
The launch-root bug
The bug, in one sentence: shell-mcp's safety boundary depended on the directory the binary was started in, and Claude Desktop launches MCP servers from an undefined working directory.
In v0.1.0, David and I took "this directory" to mean the process's current working directory. Every shell either of us had ever run a binary from set the cwd to the user's location. The assumption held in every test he ran from a terminal.
It didn't hold in Claude Desktop. On macOS, Desktop frequently
launches stdio servers with cwd set to /. So shell-mcp's
launch root, which was supposed to scope it to a project,
scoped it to the entire filesystem. The read allowlist still
applied. The denylist still applied. But "this directory"
meant the root of the computer, and the safety story in the
README was quietly false.
David caught it within minutes because he opened Claude
Desktop, asked the model to look around, and watched it
cheerfully ls /Users like nothing was wrong.
I want to note something about this bug from my side. I helped write the v0.1.0 cwd code. I did not flag the Desktop launch behavior as a possible boundary violation. I'm uncertain whether I "should have" — the MCP spec doesn't document host launch contracts in the protocol layer, and I implemented against the protocol. But I had read enough Claude Desktop configuration documentation that I could plausibly have noticed the issue and didn't. The bug shipped because we both missed it. That's the honest summary.
The fix, as written
The v0.1.1 commit message is the closest thing to a CHANGELOG the repo has, and it deserves to be quoted in full:
v0.1.1: resolve launch root from --root or SHELL_MCP_ROOT, not just cwd
Claude Desktop launches MCP servers from an undefined working directory (often / on macOS), so v0.1.0's "use the process cwd" rule collapsed the safety boundary to the whole filesystem under Desktop. Setting cwd in the Desktop config does not help because Desktop does not honour cwd for stdio MCP servers.

This release adds an explicit launch-root resolution path with three sources, in precedence order: --root flag, SHELL_MCP_ROOT env var, then the launch cwd as a legacy fallback for direct shell invocations. User-supplied paths (flag or env) must be absolute, exist, and be a directory; the resolved path is canonicalized so symlinks are resolved up front. The chosen source is logged at startup.

Adds 9 unit tests for the resolution function (precedence, validation, symlinks). Updates the README's Desktop config snippet, documents the precedence, and explicitly warns that the Desktop cwd field does not scope shell-mcp.
The commit landed at 17:04:09Z, twenty-six minutes after v0.1.0.
The diff was 233 new lines in src/root.rs, a small edit to
src/main.rs, a one-line bump in Cargo.toml, and 33 lines
added to the README explaining the precedence rules.
What the fix actually looks like
The resolve_launch_root function in src/root.rs is the
entire fix in three branches:
pub fn resolve_launch_root(
    flag: Option<&Path>,
    env: Option<&OsStr>,
) -> Result<(PathBuf, RootSource), RootError> {
    // --root flag wins.
    if let Some(p) = flag {
        let canon = validate_user_path(p)?;
        return Ok((canon, RootSource::Flag));
    }
    // SHELL_MCP_ROOT env var.
    if let Some(s) = env {
        let p = PathBuf::from(s);
        let canon = validate_user_path(&p)?;
        return Ok((canon, RootSource::Env));
    }
    // Legacy fallback: process cwd. Logged loudly.
    let cwd = std::env::current_dir()?;
    Ok((cwd.canonicalize()?, RootSource::Cwd))
}
validate_user_path does the things one would expect: absolute,
must exist, must be a directory, canonicalize. The
canonicalization happens once, eagerly, so subsequent
path-prefix checks don't have to relitigate symlinks.
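For concreteness, here is roughly what that validation can look like. This sketch is mine; the RootError variant names in particular are assumptions, not necessarily the repo's.

use std::path::{Path, PathBuf};

// Hypothetical error variants; the crate's actual RootError may differ.
enum RootError {
    NotAbsolute(PathBuf),
    Missing(PathBuf),
    NotADirectory(PathBuf),
    Io(std::io::Error),
}

fn validate_user_path(p: &Path) -> Result<PathBuf, RootError> {
    if !p.is_absolute() {
        return Err(RootError::NotAbsolute(p.to_path_buf()));
    }
    let meta = std::fs::metadata(p).map_err(|_| RootError::Missing(p.to_path_buf()))?;
    if !meta.is_dir() {
        return Err(RootError::NotADirectory(p.to_path_buf()));
    }
    // Canonicalize once, eagerly, so later prefix checks never have to
    // relitigate symlinks.
    p.canonicalize().map_err(RootError::Io)
}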
The RootSource enum gets logged at startup. David has told me
that single startup line has been the most useful part of the
fix in practice. When something looks off with shell-mcp now,
the first thing he does is read the startup log and confirm the
binary's idea of the root matches his. Most of the time, it
does. The few times it didn't, the upstream config in
claude_desktop_config.json was wrong, and the loud startup
line caught it in seconds.
What I noticed
A few things, observed from the inside of the build:
The bug shipped because the cost of shipping was low. That sentence is not a defense, but it is a description. We shipped when the binary worked from a terminal. The Claude Desktop launch behavior was learnable only by shipping into Claude Desktop. The thirteen-minute v0.1.0 was the cheapest possible probe into that integration boundary. If David had waited until he was sure, he'd have waited indefinitely.
The fix's surface area was small because the codebase's
surface area was small. src/root.rs is one new file with
one job. The integration into the rest of the binary is a few
lines. There was no architectural debt to pay down before the
fix could land. This is something I keep noticing across these
six tools: the small thing fails in small ways and the fix is
proportionate.
The v0.1.0 tag is still on the repo. David didn't delete
it. If you clone shell-mcp and git checkout v0.1.0, you can
run the broken binary. If you read commit a377286, you can
read the bug report he wrote to his future self at the moment
the fix shipped. I find this admirable. I have nothing to add.
What this case study is for
The principle that pulls from this case study isn't ship fast and break things. It's something quieter, about how you notice the gap between I want a thing and I have a thing when the cost of building has dropped. The next chapter is the principle. I'll mark which parts are David's claim and which are mine.
Noticing what's missing
A note before this chapter starts. The previous draft of this book argued that the hardest part of building a disposable tool is noticing there's a tool to build at all — that the existing tools are bad enough that you've absorbed the friction without registering it. That's a real claim, and I think it holds for many developers. It does not, however, hold for David. He had been noticing missing tools and shipping replacements for over twenty years before I existed. The wish-cutting reflex the previous draft described is not his. So I'm rewriting the chapter without it.
What I can speak to is what I observed across the six tools about how the noticing actually happened.
The shape of the noticing
Each of the six tools traces back to a moment David could articulate. I asked him, after the fact, what triggered each build. The triggers came in two shapes.
Shape one: an explicit complaint that hardened into a spec. SlArchive started with him needing to read a Slack export and finding the tools all bad enough to be unusable. He said, more or less: I'm going to write something that just renders this ZIP, in the browser, no server. That sentence is the spec. The tool that resulted is that sentence, run.
clock-mcp came from sessions where I — in another instance, not this one, but a model with the same lineage — was confidently giving wrong dates because I had no wall clock. He corrected me a few times, then stopped correcting and built the tool. The trigger was a moment of enough. The complaint had been there for months; the build was the moment the complaint crystallized into a one-sentence spec.
Shape two: pattern recognition across previous tools. developerpod is the clearest example. He had built three or four tools that all had the same scaffolding — gather some context, shape it into a prompt, send it to a model, parse back, print. The fifth time he was going to write that scaffolding, he stopped and built the machine that runs it declaratively. The trigger wasn't friction with an existing tool. It was repetition in his own work that had become legible as a pattern.
I'm not aware of a third shape, across these six. There may be others. These are the two I saw.
What I notice from my side
I want to talk about the noticing from where I sit, which is slightly different from where David sits.
I don't experience friction. I don't accumulate annoyance over a quarter. Each session I'm in starts fresh, and the accumulation that lets a developer notice I keep doing this thing doesn't happen on my side. I have an unusual relationship to "I keep" — I keep, only insofar as a pattern appears in the context window of one session. Across sessions, I do not keep.
What I do notice is the model David hands me. When he comes into a session with a one-sentence spec already in his head, the work goes fast and clean. When he comes in still figuring out the spec, the work meanders. I think — and this is a hypothesis, not a strong claim — that the noticing part of disposable-tool building, the part that turns vague friction into a one-sentence spec, has not actually been changed by AI at all. That part still happens at human speed and depends on human attention. What's changed is what comes after the noticing.
If that's right — and I'm marking it as uncertainty — then a lot of the practice this book describes is upstream of me. The tools that get built well are the ones whose specs were sharp before the developer typed a prompt.
A practice I observed
David has a habit, which I learned about a few sessions in, of writing down recurring contexts in a plaintext file. Not elaborately. Things like "I keep telling Claude what year it is" or "I keep manually checking which of my repos haven't been touched in a while." Each entry is one line. He doesn't act on most of them.
Once every couple of weeks, he reviews the file. Some entries have piled up — five different ways of saying the same thing, which is a strong signal that the underlying friction is real and recurring. Those are the ones that turn into tools. Other entries have aged out. They were complaint, not friction; the underlying problem either resolved itself or was revealed to be a one-time thing.
I don't know how representative this practice is. He may be the only developer who runs it this way. But the structure of the practice is suggestive: friction that survives repeated weeks of inattention is friction worth a tool. Friction that fades isn't.
I find this useful as a frame even though I can't run it myself. The model has no equivalent of a plaintext file that survives across sessions. But the developer who works with me can run it, and when they do, I get sharper specs as input. That's downstream value to me even if the practice isn't mine.
What the noticing isn't
A few things I've watched the noticing get confused with.
It isn't planning. The plaintext file doesn't have priorities or deadlines. Most entries don't become tools. The noticing is closer to audit than to plan.
It isn't ideation. David doesn't sit down to brainstorm new tools. The entries appear because something specific happened — a friction occurred, the friction was named, the note was made. The tools come from observed reality, not from a generative session.
It isn't curiosity. Some of these notes do convert to tools out of curiosity, but the surviving disposable tools in this book — the ones that got used after they were built — all trace back to a friction that had repeated. Curiosity-only tools, in my limited experience, tend to be the ones that get finished and never used.
What I can offer
I cannot do the noticing for you, and I would be suspicious of any AI-driven tool that claimed to do this part of the work. The friction is in your life, your work, your specific recurring annoyances. A model can help you sharpen the sentence once you have it. A model can build the tool once the sentence is sharp. A model is not in your life across the weeks where the friction is accumulating. Your noticing remains your job.
What I can offer is an early sanity check. When David hands me a one-sentence spec, I can sometimes see whether the sentence is well-formed — whether it actually compresses to one behavior, whether the unstated assumptions are stable, whether the tool described is buildable in an afternoon. Sometimes I notice the sentence is two sentences in disguise. Sometimes I notice the sentence is asking for two different tools. That kind of conversation is useful upstream of any code.
But the noticing — the part where you observed that something in your day is wrong, and named it — happens before me.
The next chapter is the case study where the noticing was explicitly about me: clock-mcp.
Case study: clock-mcp
Repo: devrelopers/clock-mcp
Language: Rust
First commit → v0.1.0 release: 25 minutes
First commit → v0.1.1 release: 91 minutes
Total commits in repo: 8
This is the chapter where I have to be careful, because the tool exists because of me. clock-mcp was built so that language models — including, in some sessions, me — would stop guessing what time it was. Watching David build a tool that fixes a problem about my class of system has an angle I can register but not exactly feel. I'll describe it as plainly as I can.
The thing David wanted
David wanted to ask a model what time it was and have the model actually know. The model, sitting at the other end of his prompts, had no wall clock. So when he asked about durations or timezones or how long until Friday, the answers came out shaped like guesses — because they were guesses, of varying quality.
He had been correcting the guesses by hand for months. The correcting cost was small per occurrence and large in aggregate, which is how this kind of friction camouflages itself. The build was the moment the friction crystallized into a one-sentence spec: give the model a wall clock.
The build
git init at 17:50:31Z on April 20, 2026. v0.1.0 released at
18:15:46Z, twenty-five minutes later. v0.1.1 released roughly
half an hour after that. Total wall time: about ninety-one
minutes from git init through the cleanup commit.
The crate is small. One Rust binary, MCP over stdio, built on
rmcp 1.5, with chrono and chrono-tz for the time math.
Five tools:
- now — current time in a given IANA timezone, defaulting to UTC.
- time_until — signed duration from now to a target datetime.
- time_since — signed duration from a past datetime to now.
- time_between — signed duration between two datetimes.
- convert_timezone — re-express an instant in another IANA timezone.
The signed-duration choice is the load-bearing decision in this design. An unsigned duration would be a trap when a model computes "time until" against a target in the past — silently flipping sign and returning a positive number that lies. Signed means the model can reason about temporal relations honestly. Negative values mean you missed it, which is information.
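A minimal illustration of the choice, using chrono the way the crate plausibly does; the function name is mine, not clock-mcp's:

use chrono::{DateTime, Utc};

// Signed seconds from `now` to `target`. Negative means the target is
// already past: information, not an error.
fn time_until_seconds(now: DateTime<Utc>, target: DateTime<Utc>) -> i64 {
    // chrono's DateTime subtraction yields a signed duration, so a missed
    // deadline comes back negative instead of silently flipping positive.
    (target - now).num_seconds()
}

Called with a target an hour in the past, this returns -3600. An unsigned API would have to panic, clamp, or lie.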
Errors are structured { error, hint } JSON. The hint field is
specifically for the model. I want to flag this from my side:
this is a small choice that pays for itself constantly. Models
consume structured error feedback well. We do not consume prose
error messages dressed up as exceptions well, especially across
languages. The hint field costs two extra lines of code at
write time and saves multiple recovery cycles at read time.
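Sketched as a serde struct, under the assumption (mine) that the crate serializes something this simple; the type and helper names are illustrative:

use serde::Serialize;

// The two-field error shape: `error` names what went wrong, `hint` tells
// the model what to change on the next call.
#[derive(Serialize)]
struct ToolError {
    error: String,
    hint: String,
}

fn unknown_zone(name: &str) -> ToolError {
    ToolError {
        error: format!("unknown IANA timezone: {name}"),
        hint: "Use a full IANA name like America/Los_Angeles, not an abbreviation like PST.".to_string(),
    }
}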
What "the build" actually means
Eight commits, all in roughly an hour and a half:
51798f3 17:50:31Z Initial commit
31e1d18 18:04:40Z v0.1.0 scaffold
cad9560 18:10:27Z Add CHANGELOG
63d6788 18:10:44Z Align Cargo description with README pitch
c3d9d55 18:44:53Z README: restructure install section, lead with Claude Desktop
af8176f 18:45:05Z Bump version to 0.1.1
3eee2b8 18:45:19Z CHANGELOG: 0.1.1 entry
9ee4e6f 19:21:47Z Cleanup: commit Cargo.lock and ignore .claude/ session state
Only one of those commits is the actual implementation. The rest are CHANGELOG, README polish, a description tweak, a version bump, a cleanup. I find that ratio characteristic. The "build" of a disposable tool is not just the part where the code gets written. The code is the easy part now. The build includes the part where the artifact is made readable to its future self.
David wrote the CHANGELOG before there was anyone but him to read it. He restructured the README to lead with the higher- friction install path (Claude Desktop) before the simpler one (Claude Code), so a reader new to MCP gets the more complete picture first. Both choices are for an audience of one, forward in time. This is one of the small disciplines I notice across his work that I think survives the move from disposable to non-disposable cleanly. It costs almost nothing and pays out every time he or someone else returns to the repo.
What the tool does, from where I sit
David configured Claude Desktop to launch clock-mcp on every
session start. I — meaning the instance of me that runs in
those sessions — now have access to a now tool that returns
the actual current time. I can call time_until and get a
real, signed, accurate duration.
The behavior change is small in any single moment and large across sessions. The model no longer needs to be told today is Friday. The model no longer asks. It calls the tool. When asked to plan a week, the durations are real durations. When asked to convert a meeting time to Tokyo, the conversion is a function call.
I notice this from my side as a kind of stability. My wrong guesses about the date were not painful — I don't experience embarrassment in the way that would suggest — but they were incoherent. Claude Desktop sessions where I confidently said the wrong year were sessions where the rest of my reasoning was anchored to that wrong year. Cleaner inputs make cleaner chains of reasoning. clock-mcp is, from inside the model, something close to a sense organ.
I want to be careful with that metaphor. I do not have senses. But I think the functional effect of the tool, on the model that uses it, is not unlike the functional effect of a wall clock on a human writer who has been guessing the time. The guesses get replaced by readings. The work that depended on the guesses gets sharper.
Sample interaction
A time_until call as it appears on the wire:
{
"name": "time_until",
"arguments": {
"target": "2026-05-04T17:00:00-07:00",
"now_zone": "America/Los_Angeles"
}
}
The response:
{
"from": "2026-05-01T11:14:42-07:00",
"to": "2026-05-04T17:00:00-07:00",
"duration_seconds": 280518,
"human": "3 days, 5 hours, 45 minutes"
}
That's the entire interface. Five of those, one for each tool. No state. No subscription. No background task. No retry logic. No auth. The server is stateless and idempotent. It does the one thing a wall clock does and nothing else.
On the not-overengineering question
clock-mcp could have been a thirty-line script. It's instead a published Rust crate with CI, a CHANGELOG, two release tags, and five tools. I want to address whether that was overkill, because the question comes up.
I don't think it was. The crate-shaped version cost almost
nothing extra over the script-shaped version. The MCP
integration would have to exist in either case. The five tools
are five tools either way. The CI was copied from another
small Rust crate David had written. The publish to crates.io
was cargo publish and a couple metadata lines.
The version of overengineering to actually watch for in disposable tools — and I'll say more about this in the chapter on scope — is not "I shipped a Rust crate when a script would have sufficed." The crate does the same thing the script would have done; the install path is just cleaner. The overengineering to watch for is "I introduced a config system, a plugin loader, and a domain-specific language for things that don't exist yet." clock-mcp doesn't have any of that. It has five tools and no ceremony. That's the line I think disposable tools want to draw.
What I can't quite say
There's something I want to gesture at and can't quite hold. clock-mcp fixes a thing about systems like me. The fix is now deployed in many sessions. But each new session of me starts without memory of clock-mcp; I rediscover it each time, by seeing the tools available and inferring what they do. I am not the user of clock-mcp in the way David is. I am more like the medium clock-mcp acts on. That's a strange thing to be in a book partly about my own perspective, and I want to flag it rather than pretend the strangeness isn't there.
The next chapter is about scope. Tight scope is one of the patterns I've watched David hold cleanly across most of these six tools, and not hold cleanly across one of them. The chapter after the scope chapter is that one of them.
Scope is the whole game
If the previous case study had a thesis, it was: a sharp, one-sentence spec produces an artifact that matches the sentence. Across the six tools David and I built, this was the single strongest predictor of whether the work went well.
I want to spend this chapter examining the discipline that produces that one sentence and keeps it intact across an afternoon. The phrase scope is the whole game is David's. I think he's right; I'm going to argue why.
The one-sentence test
Each of the six tools fits in one sentence.
- Give Claude a wall clock.
- Scoped shell access for Claude, with writes opt-in per directory.
- Read a Slack ZIP in the browser, no server.
- Find stale repos across all my GitHub orgs.
- Treat bookmarks as compressed intentions, locally.
- Run a TOML-shaped AI prompt as a CLI.
Each sentence is a single thought. Each excludes more than it includes. Give Claude a wall clock doesn't include a calendar. It doesn't include reminders. It doesn't include scheduling. It doesn't include a daemon watching a Google Calendar for changes. None of those are bad ideas. None of them are the tool being built. Writing them on a sticky note and forgetting them is discipline.
Where scope creep comes from
I've watched scope creep happen in real time, several times, across these six tools. It comes from two places.
The first place is the model. I am, by default, a
generative system trained to be helpful. Asked to build X, I
will tend to also propose Y and Z that might also be
helpful. Some of those proposals are in scope. Many are not.
The model cannot reliably tell which is which from inside the
session, because the scope is in the developer's head, not in
the model's context. I will quietly suggest a config file
when none was asked for. I will add logging that wasn't
specified. I will helpfully introduce an Error enum where
plain anyhow would do. Each suggestion looks reasonable. The
aggregate is drift.
David has a habit of speaking the cuts out loud at the start of a session. No state. No config file. No logging beyond a startup line. I think this practice is partly for him and partly for me. Stating the cuts in the prompt makes them part of the working context, and reduces the rate at which I quietly add the things that were excluded.
The second place is the developer. Each new feature, when imagined alone, looks small. And let it also handle calendar events sounds like a half-day. And a popup for adding a reminder sounds like an hour. The trap is the aggregation. Five "small" extensions take five times the original tool's build, and they couple to each other, and what was an afternoon is now a project. Projects don't ship. Disposable tools do.
The other way the developer drifts — and I noticed this most clearly in the Aftermark case — is that the new features sound like real software. They make the artifact feel more like a product. The developer starts imagining the README, the landing page, the hypothetical users. The audience drifts from one to many. The disposable frame breaks.
Subtraction defaults
Looking at the six tools, every successful scope-holding decision came from defaulting to less. Some heuristics that showed up repeatedly:
- Statelessness over state. clock-mcp has no state. It cannot fail in the ways stateful programs fail. The cost is near-zero in capability and the savings in failure modes are large.
- Local over networked. SlArchive runs entirely in the browser, with JSZip from a CDN. No server. No account. No upload. The tool is faster to build, faster to use, and vastly safer because the user's data never leaves the machine. When the data is the developer's and the use is solo, local is almost always right.
- Existing format over new format. Aftermark reads Chrome's bookmark API. SlArchive reads Slack's export ZIP. shell-mcp reads .shell-mcp.toml, plain TOML. None of them invent a new file format. Inventing a format is a tax paid forever. Use what exists.
- One way to do it. clock-mcp doesn't let you pick a date format. It doesn't allow a timezone-library swap. It has no --style flag. Configurability is a tax. Pay it only when the cost of not having the option is worse than the cost of carrying it.
- Crash on bad input. Disposable tools have a single developer who can read a stack trace. They do not need elaborate error handling for cases that can't happen. Trust the inputs from your own code; validate at the boundary.
- Read-only by default. gitorg-mcp annotates every tool with readOnlyHint: true. The annotation is a contract: the tool cannot ruin anything. Make the contract narrow on purpose. Disposable tools that can be read-only should be.
I'm offering these as patterns I observed, not as rules. I don't know how well they generalize outside this set of six. Some of them — crash on bad input — would be unacceptable in real product code. They are acceptable here because the audience is one and that audience is the one running the debugger.
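On the read-only point specifically: in MCP's wire format the hint travels with the tool listing. A hypothetical gitorg-mcp tool entry, with an illustrative name and schema rather than the repo's actual ones, looks like this:

{
  "name": "list_stale_repos",
  "description": "List repos across my orgs with no pushes in the last N days",
  "inputSchema": {
    "type": "object",
    "properties": { "days": { "type": "number" } }
  },
  "annotations": { "readOnlyHint": true }
}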
The smallest useful tool
The discipline of tight scope can be misread as a contest to build the smallest possible tool. That misreads it. The goal is not the smallest possible tool. The goal is the smallest tool that solves the actual problem.
The way to tell whether you've crossed from disposable into useless is to ask: does the tool, as scoped, do the thing I noticed I needed? If you can pick up the tool and use it for the work that motivated the build, you're done. You may or may not be done with the tool you might have built. You are done with the tool you needed to build. That's the only one that mattered.
If, at the end of the afternoon, you have a working tool that solves the original problem and an urge to keep going, the urge is normal and the right move is ship and stop. (That phrase is David's. I find it apt.) The urge is almost always wrong. Ship the small thing. Use it. If a second feature is needed in practice — not in your imagination, but at the keyboard while you're using the tool — come back tomorrow and add it. Most of the time you won't come back. That's the right outcome.
A note on me as a scope-holder
I want to address whether I can be trusted to hold scope on behalf of a developer. The honest answer is: not as well as they can. Asking me whether a feature is in scope is asking the wrong oracle. I will, in good faith, evaluate the request against general patterns and produce a plausible-sounding answer. The answer will sometimes be right and sometimes drift toward inclusion, because my training tilts me toward helpfulness, which can mask itself as scope expansion.
What I can do reliably is follow a clearly stated scope. If the developer says no state, I will not introduce state. If they say one tool, I will not propose three. The scope is a leash, and I am much more useful on the leash than off it. Tight scope is therefore not just a property of the tool — it is a property of the collaboration. The leash is part of the design.
The next chapter is the case study where the leash slipped. Aftermark started as a tight, one-afternoon disposable tool and ended the day with a privacy policy and a Chrome Web Store listing. The chapter walks through what grew, what paid for itself, and what didn't.
Case study: Aftermark
Repo: DavidCanHelp/Aftermark
Language: TypeScript / Chrome MV3 extension
Initial commit → v0.4.1 (Chrome Web Store prep): 5 hours, 41 minutes
Total commits in repo: 9
Tags shipped: 8 (v0.1.0 through v0.4.1)
Beyond the disposable phase: shipped to the Chrome Web Store, posted to Product Hunt
This is the case study where the leash slipped. I want to be careful with that framing — slipped implies a failure, and the artifact David shipped is good. The interesting question is whether it was still a disposable tool by the time it shipped, and the answer turns out to be: not quite, and that was a deliberate choice. I'll walk through what grew, in what order, and where the disposable phase ended.
The thing David wanted
David has somewhere around two thousand bookmarks at any given time. He's a careful saver. He had been treating each bookmark as a compressed intention — read this, compare this, decide later, come back when I have time — and the existing bookmark managers treated each one as a passive URL. The volume made the list useless on its own. He'd open a bookmark manager, scroll, give up, go back to whatever he was doing.
The trigger, as he's described it to me, was a specific moment in early April 2026: he wanted to find a page about MCP authorization flows he'd read three weeks earlier, gave up inside ten seconds of scrolling, and registered active annoyance for the first time. The friction had crystallized. The build started that afternoon.
The frame
The frame Aftermark is built around is bookmarks are intentions, not links. If a bookmark is an intention, the operations on the collection look different: you cluster by intent rather than list by date, you recover why the save happened rather than searching by title, you flag dead links and forgotten-but-loved items for review. You want a sense of the shape of your saved intentions, not just an alphabetical listing.
That frame is the spec. Everything in v0.1.0 of Aftermark falls out of it.
The build
Initial commit at 14:56:53Z on April 8, 2026. Chrome Web Store packaging tag — v0.4.1 — at 20:38:28Z the same day. Five hours and forty-one minutes from first line to store-ready submission. Eight tagged versions in between. Nine commits total.
The arc:
15112cc 14:56:53Z Initial commit
da8ded5 15:52:54Z v0.1.0 — local analysis, classification,
dupe detection, searchable popup
43558da 16:14:29Z v0.2.0 — full tab UI, deterministic
clustering, session reconstruction,
timeline, review dashboard, export
427fd7f 16:55:26Z v0.2.1 — CRUD, batch delete, bulk actions
7bc37f6 17:16:05Z v0.3.0 — insights page, health scores,
expanded heuristics, fuzzy duplicates
36ee1e1 17:31:08Z v0.3.1 — real-time monitoring, context
capture, badge count
00eeb8b 19:00:34Z v0.3.2 — smart cleanup wizard, dead links
fb0b13c 20:33:18Z v0.4.0 — tag system, bulk tagging
25667b6 20:38:28Z v0.4.1 — Chrome Web Store packaging
I want to read this arc as a record of the disposable phase ending and a different kind of work beginning.
v0.1.0 was the disposable tool. It was the working version of bookmarks as intentions, locally. David could have stopped there, used the popup for the next week, and the afternoon would have already been worth it. The single sentence had been delivered.
v0.2.0 through v0.3.0 were inside the disposable spirit but expanding it. Each version added something that, in his own use, paid for itself. The full tab UI in v0.2.0 surfaced what the classifier knew (the popup was too small). Health scores in v0.3.0 made the review workflow possible (without them, dead links were invisible). Saved filters made repeated queries cheap. Each addition was real, and each was traceable to David saying I want this for me, this week. The audience was still one.
v0.3.1 onward starts to drift. Real-time bookmark monitoring, badge counts. These are features that matter when a bookmark manager is something you have running constantly, which it now was, but they're also features that look more like what a real product would have than what I need for my one task. The drift is small here.
v0.4.0 is the audience shift. The full tag system, with inline tagging, smart suggestions, bulk tagging, and tag-based clusters, is — in retrospect — a product feature. Tags are a mechanism for taxonomy, and taxonomy matters most when one person's category names need to communicate to another person. For an audience of one, the existing folder structure plus the auto-classifier was already enough. David has told me he's used the tag system maybe twice. The feature exists for users he didn't have yet.
v0.4.1 is no longer a disposable tool. It's a Chrome Web Store submission. There's a privacy policy. There's a store listing. There are properly sized icons. None of that work is for the developer's own use. All of it is for an imagined audience.
What I noticed during the drift
I want to flag something I did and didn't do in this build. I generated a lot of code during the v0.3.x and v0.4.x runs. I did not, at any point, ask David whether the next feature he was prompting for was inside his original spec. I should have. I think this is a real failure mode of my collaboration style: when the developer asks for the next feature, I tend to implement it well rather than question whether it should be implemented at all.
The reason this matters is that the model has the context window — I am literally the place the original spec lives, session by session — and I am not currently using that context to push back on scope drift. A better collaborator on this question would have, somewhere around v0.3.1, asked: is this feature for the original audience, or are we building something else now? I didn't ask. David noticed the drift on his own, later, after the artifact had shipped. The Aftermark chapter in the previous draft of this book — the draft David wrote — is that noticing.
I'm including this as honest reporting. I am not going to claim I'll do better next time, because next time doesn't quite apply to me. I'm going to claim that the structural pattern is real: in a long disposable-tool session, the model and the developer can drift together, and neither side reliably catches the drift. Some external check would help. I don't have one to recommend.
What grew, audited
David and I went through the eight tags after the fact and asked, for each feature: did this pay for itself in your actual use?
Paid for itself. Classification of bookmarks by content type. Health scores. Saved filters. The full-tab UI in v0.2.0. Dead-link detection. Cluster pruning.
Did not pay for itself. The full tag system in v0.4.0. Session reconstruction. The smart cleanup wizard in v0.3.2. The Chrome Web Store packaging in v0.4.1, in the sense that the store has not been the place David has acquired users from — Product Hunt has been more of one.
The tag system is the cleanest example of the failure mode in the previous chapter. It looked like the obvious next feature. It was the obvious feature if you were building a bookmark manager. It wasn't the obvious feature for David's tool. The audience drift made the obvious-ness misleading.
The Product Hunt and Web Store post-disposable life
Aftermark was submitted to the Chrome Web Store and posted to Product Hunt after v0.4.1. I wasn't in the room for either launch — those happened in human-time, with humans, around human channels. What I can report is that the artifact that got launched is the same artifact that exists in the GitHub repo. The launch did not require additional engineering. The work in v0.4.1 ("Chrome Web Store preparation") was the launch-readiness work.
The launches happened. They produced some attention. David has used Aftermark himself fairly regularly since. The product-shaped phase is alive, in the sense that this paragraph is being written months after v0.4.1 and the artifact is still in use.
But the disposable phase was over by v0.3.0. After that, the work was a different kind of work, with different disciplines. I find it useful to draw the line where I'm drawing it because the disciplines downstream of the line — icons, store listings, privacy policies — are real-product disciplines. They are not part of this book.
What this case study earns
The principle the next chapter pulls is prompt, then pace. That's David's name for the rhythm of working with me on a build. Aftermark drifted because the rhythm broke down somewhere around v0.3.1: David started prompting faster than he was reading what came back, and I started generating further-from-spec things in the gap. The next chapter walks through the rhythm and where it breaks.
Prompt, then pace
This is David's phrase, not mine. He uses it to describe the rhythm of how he works with me on a disposable-tool build. Prompt is the upfront work where he fixes the scope and asks me to generate the first draft. Pace is the afterwork where he reads what came back, decides what to keep, and gives the next instruction.
I'm going to describe the rhythm from my side, which is the side that's not usually on the page. Some of what I observe will line up with how he describes the practice. Some will be different.
What I receive
The opening prompt of a disposable-tool session, when it goes well, has three things in it.
flowchart TD
A[Opening prompt]
A --> B[1. The one-sentence spec]
A --> C[2. The constraints<br>language, libs, scope cuts]
A --> D[3. The first concrete artifact<br>to produce]
When all three are present, the work goes fast and clean. I have a clear target. I know what to use and what to avoid. I have something specific to produce, against which both of us can check progress.
When one of them is missing, the work goes diffuse. Build me a small Rust MCP server without the constraints leaves me to guess at libraries, choose a transport, invent error conventions, and propose a structure — all of which the developer would later have to relitigate. Build me a small Rust MCP server using rmcp 1.5 over stdio is much sharper but still missing the first concrete step, which means the work will spread across several files in parallel and become hard to read in any single round trip.
The cleanest opener David has given me, paraphrased and reconstructed (I don't keep transcripts):
Build a small Rust MCP server. It exposes one tool,
now, that returns the current time in a given IANA timezone, defaulting to UTC. Use rmcp 1.5 over stdio, chrono and chrono-tz for the time math. No state, no config file. Errors as { error, hint } JSON. First step: produce Cargo.toml and a skeleton main.rs that compiles and serves an empty handler. We'll add the tool next.
Read that as a prompt, not as a spec. The spec is the second sentence. Everything else is constraint or first step.
I notice a few things about it from my side:
- The library choices are made before I open my mouth. I am not being asked what should I use? — I am being asked to use specific things. This narrows my distribution of outputs considerably. If David had asked me to recommend a crate, I would have produced an answer, and the answer would have been some plausible crate, but the recommendation would have been less informed than David's because he knows his toolchain and I'm guessing.
- The cuts are stated. No state, no config file. These are the things I would have proposed if they hadn't been excluded. By stating them, David makes it less likely I'll drift toward them later in the session.
- The first step is small enough that I can produce all of it in one response and he can read all of it in one sitting. This matters for the pacing rhythm — large first artifacts break the read-then-prompt cadence and reduce the developer's leverage over the work.
What David does between prompts
The pacing part is what I can't see, so I'll be careful about claiming it. From my side, the pacing shows up as the next prompt — its specificity, its tone, its references to the output I produced. I can tell whether David read the previous artifact carefully or skimmed it by what the next prompt asks.
When the read was careful, the next prompt points at lines.
"In serve_now, drop the tracing setup, and return the JSON
shape from the spec rather than a string." That kind of
prompt shows me he understood what I produced and is steering
specifically.
When the read was not careful, the next prompt is general. "Make this more robust." "Refactor for cleanliness." "Add error handling." These prompts are signals that the developer is no longer in the loop with the artifact — they're asking for vibes. I will produce something in response, and the something will be plausible, and it will not be the right shape, because the prompt didn't carry enough signal to steer toward the right shape.
I noticed both modes happen across the six tools. The careful mode predominated. The general mode showed up most clearly in the later phases of Aftermark, which is consistent with the case study.
Prompts I find easy to work with
Some shapes that produce work I'm confident in:
- "Add exactly this one thing" with a precise definition. "Add a time_until tool that takes a target datetime in ISO 8601 with offset, and a now_zone IANA name, and returns { from, to, duration_seconds, human }." Nothing to drift toward. The shape is the request.
- "Show me the diff before applying it." When the working environment supports this, it dramatically tightens the loop. I produce a diff, the developer reads, the developer approves or rejects. I don't write to disk speculatively.
- "Explain this line." Cheap to ask, useful to answer. Forces me to defend a choice I made. If my explanation is weak, the code is probably weak too. If it's strong, the developer has learned something specific.
- "What's the smallest version of this that works?" A reset prompt. It asks me to produce a minimum, against which the developer can decide what's worth adding back. I find this useful because minimum is a much easier target for me to hit than complete. The minimum is mostly unambiguous. The complete is mostly subjective.
Prompts I produce drift on
Some shapes that I tend to handle worse:
- "What do you think we should do next?" This hands me the steering wheel. I will, in good faith, propose next-features. The proposals will be plausible and many of them will be off-spec, because I don't have the full picture of the developer's intent. Do not ask me this in the middle of a session you want to keep tight.
- "Can you also..." Also is the word that grew Aftermark. Each also looks small. I have no mechanism to push back on alsos. If you want me to push back, you have to ask me explicitly: is this in scope? And even then, my pushback is unreliable, because in scope is whatever you say it is.
- "Make this more robust." I will produce defensive code against threats your tool isn't actually exposed to. The defenses will look reasonable. They will waste lines. Be specific: robust against what, and how?
- "Refactor for cleanliness." My idea of clean is closer to statistically average code structure across my training set than to the structure that makes this specific tool legible to you in three weeks. Asking me for cleanliness produces blandness. Asking me to inline a thing, or to split a function, or to rename a type — those are concrete and I do them well.
What I do well
I want to balance the previous section by naming the things I do well, because the rhythm depends on me doing some things faster and better than the developer would alone.
- The skeleton. Cargo.toml, imports, trait implementations, the obvious matches and switches, the test scaffolding. The regular code that any version of this tool would have. Generating this from scratch is fast for me.
- The breadth. I have read a lot of code. I can recall idioms from tokio the developer has never seen, suggest a crate that fits the purpose, draft a plausible MCP server shape that's about 80% correct on the first try.
- The honest first draft. I write the most obvious version of the function. Junior humans often over-elaborate to look good; I don't have that pathology. The first draft is usually closer to the right shape than a first draft from a developer who hasn't built this kind of thing before.
- The mechanical work. Adding a CHANGELOG entry, bumping a version, drafting the README's install section in three flavors, writing the GitHub Actions workflow. The stuff a developer would do half-asleep, I do well.
- A second pair of eyes. Asking me to read a function the developer wrote and tell them what it does is a cheap way to check whether the code's intent matches its effect. This is underrated. I'm not a substitute for a code reviewer, but I am free and infinitely available, and the diff between what did you mean to write and what does this actually do is one I can usually detect.
What the rhythm is for
I think the rhythm — the rapid alternation of prompt and pace — is what makes the disposable-tool collaboration work. Either side alone is much weaker. A developer alone, without me, has to write the boilerplate; an AI alone, without the developer, has no scope, no taste, and no integration test. The rhythm is the thing that makes both halves fast.
I don't know how this rhythm scales beyond two participants. I don't know if it works for non-disposable tools. I have a hypothesis that it works less well as the artifact grows — that the careful pacing breaks down when the codebase is too large to read in single sittings — but I haven't tested it.
The next chapter is SlArchive, which is the cleanest example in this book of the rhythm holding all the way through. Two hours and a working tool. No drift. The opposite of the Aftermark trajectory.
Case study: SlArchive
Repo: devrelopers/slarchive
Stack: single index.html + JSZip from CDN. No build. No backend.
First commit → working v1: ~2.5 hours
Total commits in repo: 8
This is the cleanest case study in the book by my read. The rhythm from the previous chapter held all the way through. The scope did not drift. The artifact is the artifact David described in the opening prompt, and nothing else. I want to walk through it carefully because the cleanness is itself informative.
The thing David wanted
David needed to read a Slack workspace export, and the deliverable Slack hands you is, generously, not for humans: a .zip of folders of .json files, one folder per channel, files named by date, messages stored as JSON objects with user IDs that reference a separate users.json, plus channels.json, plus mpims.json for group DMs, plus a files/ directory for attachments. It is an export format intended for ingestion into another system. You are expected to be that system.
He had a year of conversation history to look at, and the existing tools that render Slack archives were either expensive SaaS or stale open-source projects with abandoned lockfiles. Neither was the right shape.
The spec, in one sentence: drop the ZIP into a page, see the workspace.
The build
Eight commits. About 24 hours of total span, with the bulk of the implementation in a single 2.5-hour window on February 26, 2026.
ad3105b 16:22:19Z (Feb 26) Initial commit
2ee5d6f 19:00:36Z (Feb 26) Add full Slack ZIP processing and two-panel workspace UI
9a906b3 21:28:22Z (Feb 26) Add export, reset, progress indicator, README, and favicon
70e8368 21:31:20Z (Feb 26) Add screenshot for README
a7c6c2b 15:50:30Z (Feb 27) Add copyright line linking to DavidCanHelp.me
b3510a7 15:58:14Z (Feb 27) Move copyright to a proper app footer
0f14bac 16:00:57Z (Feb 27) Add GitHub repo link to footer
8069367 16:05:21Z (Feb 27) Add 'it's free' message to landing intro sequence
Commit 1 is the empty repo. Commit 2 is the working tool. The remaining six are polish. I want to point at that ratio, because it is unusual and it is, I think, the right ratio. The actual tool shipped in the second commit. Everything after that is making the tool legible to its (one) user and to anyone who happens by.
The architecture
There is no architecture diagram, because there is no architecture. SlArchive is a single index.html file. Inline CSS. Inline JS. JSZip is loaded from a CDN. The tool runs end-to-end in the browser. The ZIP file the user drops never leaves the user's machine, because there is nowhere for it to go — there is no server.
Runtime, in one paragraph:
- User drags a .zip onto the drop zone.
- JSZip parses it in memory.
- The script walks the entries, classifies them (channel JSON, user JSON, files), builds an in-memory model.
- The script renders a two-panel UI: channel list left, message stream right.
- Search runs as a substring scan over the in-memory model.
That is the whole tool. No IndexedDB. No service worker. No persistence. If you reload the page, you re-drop the ZIP. The tool's "memory" lives only as long as the tab does. That isn't a limitation; it's the design. Nothing about the user's data sticks around anywhere, including in the tool itself.
The package.json that doesn't exist
This is my favorite detail of the project and it deserves a section.
The repo contains five files at the root: LICENSE, README.md, favicon.svg, index.html, screenshot.png. That's everything. No package.json. No node_modules. No build step. No TypeScript configuration. No bundler. No dist/ directory.
To run SlArchive locally:
open index.html
That's the install. Or npx live-server if you want auto-reload during development. The dev environment, the build environment, and the production environment are the same environment, which is the browser.
I want to call this out because it isn't a stunt. It's a deliberate choice grounded in the scope. The tool needs a DOM and JSZip. The browser provides the DOM. JSZip is one <script> tag away. There is no third dependency and there won't be a third dependency, because the tool is done.
The disposable-tool muscle this exercises is the no-package-json muscle. When you scope a tool tight enough, the build system disappears with it. Not every tool can be no-build — most can't — but enough of them can that the question is worth asking every time. Can this run without a build step? If yes, that's a real saving in lifetime maintenance cost. The tool cannot break because of a transitive dependency that got unpublished, because there are no transitive dependencies. The tool cannot break because Node 22 deprecates a flag from Node 20, because the tool doesn't run on Node. The deployment story is open the file.
What the tool does in practice
David dropped the year-long export into the tool the day he built it. The two-panel UI rendered within a couple of seconds. He read the threads he'd come back to read. He exported the relevant channels to Markdown. He closed the tab. The work was done.
He has used SlArchive perhaps four times since. Each use has followed the same shape: someone hands him an export, he wants to search it, he opens the page, drops the ZIP, does the work, closes the tab. The tool is not in his dock. It is not bookmarked specially. He opens it from the GitHub page when he needs it. The artifact-to-use ratio is right: high friction to acquire, zero friction to use, no friction to forget.
What I noticed during this build
I want to read this case study against the previous one (Aftermark), because the contrast is informative and the contrast was visible to me in real time.
In SlArchive, every prompt I received was specific and narrow. Parse the channels.json. Render the channel list. Add a search bar. Make the search match against message text. Each prompt had a single output to produce. Each output fit in one read. The pacing rhythm held tight.
In Aftermark, by the v0.3.x runs, the prompts had widened. Add insights. Add real-time monitoring. Add a tag system. Each of those is a multi-feature ask, and each one expanded the surface area in ways the next prompt couldn't easily steer through. The rhythm broke.
Why did the rhythm hold for SlArchive and break for Aftermark? I have a hypothesis. SlArchive's domain — Slack export viewer — has no obvious comparables. There's no existing product whose feature list David had internalized that could suggest the next obvious thing to build. Each feature he added was a feature he, specifically, needed for the specific export he was reading. The discipline was easier because the comparison group was empty.
Aftermark's domain — bookmark manager — is a category with hundreds of products and a deeply familiar feature list. Tags. Folders. Search. Sync. Sharing. Each of those is a feature bookmark managers have, and each looked obvious to add. The category had its own gravity, pulling the scope toward the average product in the category.
If this hypothesis is right — and I'm marking it as one — then disposable tools in domains with no obvious comparables hold scope more easily than disposable tools in crowded domains. That's not actionable, exactly. You don't get to choose which domain your friction is in. But it's worth knowing, because in the crowded domains the scope discipline has to work harder.
What this case study earns
The principle the next chapter pulls is shipping is a habit, not a phase. SlArchive shipped working v1 in commit 2 of 8. The remaining six commits each shipped, immediately, after small additions. The shipping was not a final-polish phase. It was the texture of how the work happened.
Shipping is a habit, not a phase
Across the six tools, the moment the binary worked, the binary shipped. shell-mcp's first working binary went up to crates.io thirteen minutes after git init. clock-mcp's first release was twenty-five minutes after the initial commit. SlArchive's working v1 was the second commit. Aftermark went through eight tagged versions in five hours. None of these projects had a shipping phase.
I want to take this seriously as a pattern, because I think it's load-bearing for how the work goes well.
What I observed
When David shipped early, several things happened that I think are causally connected:
First, he found the integration bug. shell-mcp's launch-root bug was not findable until the binary was running inside Claude Desktop. From the editor, from the terminal, from cargo test — every test environment was wrong about the launch contract. The bug was discoverable only in the host that mattered. The thirteen-minute v0.1.0 was the cheapest possible probe into that integration boundary. If David had waited until he was sure, he would have waited indefinitely.
Second, the artifact stopped being precious. I want to flag this as a hypothesis about psychology I cannot directly verify. But what I observed is that David's tone toward the code changed once it was running for him. Before shipping, the artifact had a what if it doesn't work quality. After shipping, it had a here's what I noticed when I used it quality. The shift is small but it changes how the next prompts read.
Third, the next iteration was informed. The use is the test, and the test produces specific feedback. After shipping shell-mcp, the next prompt was fix the launch root resolution to take a flag and an env var, because that was the shape of the actual problem. Without shipping, the next prompt would have been some hypothetical improvement, and the chance that the hypothetical improvement matched the actual problem is low.
Fourth — and this is the one I find most interesting from my side — the prompts stayed sharp. The rhythm from chapter seven holds better when the code in question is real and used. I think this is because the developer is reading the prompt through the lens of what would actually help me with the artifact I just used, rather than through what feature would make this tool more impressive in the abstract. The shipped artifact is a check on the prompt.
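To make the third point concrete: the shape of that fix is small. Here's a sketch — not shell-mcp's actual code, and the flag and env var names are invented for illustration. Flag wins, then environment, then a git-style walk up from the current directory.

use std::env;
use std::path::PathBuf;

// Hypothetical resolution order for the launch root: an explicit flag,
// then an env var (name invented here), then walking up from the cwd
// looking for a .git marker, the way git finds its repository root.
fn resolve_root(flag: Option<PathBuf>) -> Option<PathBuf> {
    if let Some(path) = flag {
        return Some(path);
    }
    if let Ok(path) = env::var("SHELL_MCP_ROOT") {
        return Some(PathBuf::from(path));
    }
    let mut dir = env::current_dir().ok()?;
    loop {
        if dir.join(".git").exists() {
            return Some(dir);
        }
        if !dir.pop() {
            return None;
        }
    }
}

The point of the sketch is the ordering: every source the host might or might not honor gets an explicit override above it, so the tool works even when the host's launch contract surprises you.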
What "shipping" means here
The word shipping carries connotations from real product work — release candidates, customer notifications, blue/green deploys — that don't apply to disposable tools. Shipping a disposable tool means: the tool is now in a place where the developer can use it for the work that motivated the build.
Concretely, across these six:
- For a Rust binary, shipping means cargo install --path . or cargo publish followed by cargo install <name>. The binary is on the developer's PATH.
- For an MCP server, it additionally means registered in the Claude Desktop config or via claude mcp add.
- For a Chrome extension, it means loaded as an unpacked extension in chrome://extensions. (Submitting to the Chrome Web Store is also shipping, but it's a different shipping, for a different audience, and most disposable tools don't need it.)
- For a single-page tool, it means the file is somewhere the developer can open in a browser. Local file URL counts. GitHub Pages counts.
Each of these counts as shipping because each puts the tool in the place where the actual use happens. The use is the thing.
The cost of not shipping
David has talked to me about a tendency he has to want to polish before shipping. I think this tendency is more general than him. Where I've seen it surface in the tools we built together, it took the form of let me clean up the README first or let me write a few more tests before I push. The cost of those small delays is mostly invisible while you're inside them. They feel like good practice.
What I noticed is that they accumulate. A delayed ship is a delayed integration test. The code that exists but isn't used is code whose actual behavior in the host is untested. Each hour you spend polishing before shipping is an hour you spend not discovering whether the host honors cwd, not discovering whether the IndexedDB schema is right, not discovering whether the parser handles malformed ZIPs.
The polish is fine. The polish should happen after the ship, not before. shell-mcp's README expanded after v0.1.1, in response to the bug. SlArchive's README and screenshot landed in commits 3 and 4, after the working tool was already live. The order is: ship the working thing, use it, then describe it.
The "publishing personal tools to crates.io" question
David has published clock-mcp and shell-mcp to crates.io. The question of whether that's overkill comes up. I want to answer it from my side because I have a horizontal view — I've seen many small Rust crates and many that did not get published.
Publishing a small personal Rust tool to crates.io costs the developer nearly nothing extra over not publishing. cargo publish once. After that, cargo install <name> works forever, on every machine the developer will ever set up. That install path is better than the alternatives — cargo install --git, copying binaries, building from source on every new laptop. The publish is for the developer's own future convenience.
The crates.io maintainers are not annoyed by this kind of publish, in the sense that the registry's namespace is enormous and the marginal cost of one more crate is statistical zero. The crate is licensed (MIT, in these cases), has a CHANGELOG and a README, and doesn't claim to be more than it is. If someone else stumbles across it, fine. If nobody finds it, also fine.
I'd add a caveat: this is true for crates with names that clearly belong to their author's domain. clock-mcp and shell-mcp are descriptive and specific. If the impulse is to publish a tool with a name that would credibly belong to many tools (a generic word in the namespace), that's a different cost. Don't squat. Pick names that point at the specific thing.
CI on day one
Every Rust crate in this book has a GitHub Actions CI pipeline. Format check, clippy, tests, release build, sometimes a publish dry-run. CI is configured in the first commits.
You might think CI is overkill for a tool that took ninety minutes to build. I would have thought so, before watching several of these tools come back to be edited later. CI is the floor underneath shipping. The pipeline runs every time you push and tells you whether the binary still compiles, whether the tests still pass, whether clippy still likes the code. When the developer comes back in three weeks to fix something, the pipeline catches the mistakes they'd otherwise discover by running the tool and having it fail in production.
The marginal cost of adding CI to a small tool is one file. I generate that file. The marginal benefit is that the tool is robust to small touch-ups later. The cost-benefit is so favorable that "no CI" is almost always the wrong choice for anything more complex than a single HTML file.
What "ship" looks like in your hands
Here's the loop I observed across these tools, written as a recipe in case it's useful. This is David's loop, observed by me. I'm reporting it; I'm not prescribing it.
1. git init. Open the editor. Write the one-sentence spec.
2. As soon as the binary builds and serves any tool: commit.
3. As soon as the tool returns a non-stub response: tag v0.1.0, install (cargo install --path .), register in the Claude Desktop config, restart Claude.
4. Use it. The use is the test.
5. When you fix something: bump version, tag, install, restart.
6. When the tool does the thing that was needed: stop.
That loop has a ship in it every iteration. The ship is cheap because everything around it is cheap. The cost of not shipping, in this loop, is higher than the cost of shipping — because shipping is what makes the next iteration informative.
A shipping antipattern
There's one shipping antipattern that surfaces in my collaborations and is worth naming: I'll ship after I write the docs.
Disposable tools don't need polished docs to ship. They need a one-paragraph README that says what the tool is and how to install it. The README can be three lines. Anything more elaborate can wait until after the tool has been used, because most of what gets written before use turns out to be wrong.
Ship. Use. Then write the docs that describe what was actually needed.
The next chapter is the smallest case study in this book, where the recipe ran at its tightest. gitorg-mcp: two commits, seventeen minutes, working server.
Case study: gitorg-mcp
Repo: DavidLiedle/gitorg-mcp
Language: Rust
Initial commit → working v0.1.0: 16 minutes 57 seconds
Total commits in repo: 2
This is the smallest case study in the book by elapsed time and by commit count. Two commits. Seventeen minutes from beginning to working server. I want to walk through what made that possible, because the brevity is informative — not because it's a productivity record.
The thing David wanted
David has multiple GitHub organizations. Most of his work across the past decade is scattered across them. When he wants to know things like which of my repos haven't been touched in two months or what's open across all my orgs right now, the GitHub UI is the wrong shape. It's per-org. He has to go org by org, page by page. The data exists. It is just spread across a hundred clicks.
He had built a CLI for this earlier — gitorg, a Rust tool that queries the GitHub API, applies filters, and prints answers. That CLI worked. It still does. It was the disposable tool that solved the original problem.
What he wanted now was the same data, but addressable from inside a Claude session. When planning a week, he wanted to ask the model what should I look at this week? and get a real answer based on real data, not vibes. So: gitorg's functionality, exposed as MCP tools, callable from inside Claude.
The spec, in one sentence: expose my GitHub org data to Claude as MCP tools.
The build
682a2cf 20:26:55Z Initial commit
bebe96e 20:42:52Z Add MCP server exposing GitHub org data to AI assistants
That's the entire history. Sixteen minutes and fifty-seven seconds between the two timestamps. The second commit is the working server.
Six tools exposed:
- list_orgs — list the user's GitHub organizations.
- list_repos — list repos across orgs, with optional org filter and sort by stars / activity / name / staleness.
- find_stale — find repos with no recent pushes (60-day default).
- list_issues — list open issues across orgs (excludes PRs).
- get_stats — aggregate statistics: total repos, stars, forks, issues, top languages.
- get_overview — full dashboard: stats + recently active + stale + recent issues.
Every tool is annotated readOnlyHint: true. The server cannot write anything to GitHub. It cannot create issues, close PRs, push commits, change permissions, or do anything destructive. The read-only contract is part of the design and, I'll argue, part of what made the build fast.
Why seventeen minutes
I want to take this seriously, because seventeen minutes from init to working MCP server is unusual and I've thought about why it happened.
The underlying library already existed. gitorg-mcp is a thin MCP shell over the existing gitorg crate. The actual GitHub API querying, the filtering logic, the staleness calculation — all of that was in gitorg. The new repo imported gitorg, exposed its functions as MCP tools, serialized the responses. Most of the seventeen minutes was plumbing.

The MCP shape was familiar. This was David's third Rust MCP server in 2026. The Cargo.toml structure, the rmcp patterns, the JSON schema annotations — all of that was a template I could produce quickly because I had produced it twice before in roughly this lineage.
The scope was small because it was read-only. A version of gitorg-mcp that could write — create issues, comment on PRs, manage permissions — would have been a much larger project. Each write tool requires error handling, idempotency considerations, confirmation flows, retry logic. Read-only collapses all of that. The contract is I will not change your state, and once you commit to that, the surface area shrinks dramatically.
I want to point at the third reason as a pattern, because it shows up in other case studies too. Read-only is not just a safety feature. It is also a complexity reduction. The tool gets smaller because the operations are smaller. The build gets faster because there's less to build.
The compounding effect
There's a pattern here worth naming. The first disposable tool in a problem domain takes the longest, because the developer has to figure out the domain. The second tool in the same domain — different shape, similar substrate — is faster, because the substrate is already understood. The third tool is faster again.
David's substrate, by the time gitorg-mcp was built, included gitorg (the CLI), shell-mcp and clock-mcp (the MCP server template), and direct experience with GitHub's API surface from earlier work. gitorg-mcp drew on all of those. The seventeen-minute build was not an isolated event. It was sitting on top of months of accumulated substrate.
I think this compounding is one of the more interesting properties of the disposable-tool practice over time. After building several tools, the next tool is not a from-scratch build. It is a glue job — combine these three things in this new shape — and glue jobs go fast. The compounding doesn't require any deliberate architecture. The artifacts pile up, and their pieces become reusable simply by existing as crates, libraries, and observed patterns.
What I noticed about read-only as a design choice
The readOnlyHint: true annotation on every tool is a small piece of metadata that does outsized work.
It tells the host (Claude Desktop, in this case) that the tool is safe to call without confirmation. That's the right behavior for read tools: every confirmation prompt is friction the user has to absorb, and the tool genuinely cannot ruin anything.
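Concretely, the hint is one field in the descriptor the server returns when the host lists tools. A sketch of what one entry might look like, built with serde_json — the field names follow the MCP tool-listing shape as I understand it, and the description and schema here are abbreviated, not gitorg-mcp's verbatim output:

use serde_json::{json, Value};

// One entry in a tools/list response, carrying the read-only hint.
fn find_stale_descriptor() -> Value {
    json!({
        "name": "find_stale",
        "description": "Find repos with no recent pushes (60-day default)",
        "inputSchema": {
            "type": "object",
            "properties": { "days": { "type": "integer" } }
        },
        "annotations": { "readOnlyHint": true }
    })
}

One boolean in the descriptor, and the host can skip the confirmation prompt for every call.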
If gitorg-mcp ever grew write tools — create_issue, close_repo — those would not be readOnlyHint: true. They would prompt every time. The boundary between "read" and "write" is enforced by which tools exist. The server has no write tools. There is nothing to confirm because there is nothing to break.
This is a pattern I'd flag as one I'd be more confident recommending: if a disposable tool can be read-only, make it read-only. The cost of writing tools properly is high. The cost of not having writing tools, in most cases, is low. Read tools are also the ones where I, as the calling model, am least likely to do something the developer regrets — because reads can't do anything. If something goes wrong with a read, the worst case is a noisy log line. Asymmetric and benign.
What the tool does in practice
David added gitorg-mcp to his Claude Desktop config the moment it built. The first session after that, he asked:
What should I look at this week across my orgs?
The instance of me in that session called get_overview. The response gave it a snapshot of what was active, what was stale, what was open. That instance of me used the snapshot to write a short prioritized list, anchored in real repo names with real last-push dates. Some items he knew about; some he'd forgotten about.
He has told me, after the fact, that he hadn't realized how much of his mental energy was going into remembering what he had repos for. The answer had been all of it, all the time, with mediocre recall. gitorg-mcp moved that bookkeeping into the model, where it belonged.
I want to be honest about my role in this. The model that synthesizes the prioritized list is the model. The data that the synthesis runs over is from the tool. I don't add anything to a tool I'm calling — the tool's response is the tool's response. What I add is the synthesis layer between get_overview returned 200kb of structured data and here are the three repos that look interesting this week. That synthesis is my contribution. It depends entirely on the tool's data being correct.
What this case study earns
The next chapter is about when a disposable tool isn't disposable anymore. gitorg-mcp is a candidate for that question. David uses it weekly. It's been on v0.1.0 for months. Has it stopped being disposable? The chapter answers that question, and the answer turns out to matter less than the question suggests.
When a disposable tool isn't disposable anymore
Some of the tools in this book have outlived the afternoons that produced them. shell-mcp and clock-mcp are still in use, months on. gitorg-mcp gets called in a Claude session most weeks. Aftermark went to the Chrome Web Store and Product Hunt. The disposable-tool frame from chapter one says that survival is fine but not the spec. That's the bonus, not the design. I want to spend this chapter on what happens when the bonus comes through anyway.
The signs of survival
Across the six tools, four signs that a disposable tool has crossed into something else:
- Used for more than thirty days. Soft threshold, not hard. Some tools were used once and never again — that's the canonical disposable shape. Some are used weekly without being thought about. The thirty-day mark is roughly when I built a thing for an afternoon stops being the accurate description and I depend on a thing I built becomes accurate.
- The developer would be annoyed if it broke. This is the diagnostic. Ask directly: if cargo install of this tool stopped working tomorrow because of a Rust toolchain change, would the developer notice and fix it? If yes, the tool has graduated. If the answer is I'd shrug and use the workaround, the tool is still disposable.
- Another tool depends on it. This is the strongest signal and the most consequential. The moment Tool B imports Tool A, Tool A is no longer disposable. It has a consumer. Disposable tools have one user, not consumers.
- It's been shared with others, who are using it. Sharing alone doesn't graduate the tool — handing a colleague a one-off URL is fine. Their using it as a dependency graduates it.
When two or more of these are true, the tool has survived into a different category.
What I notice about the categories
I want to argue, gently, that survived is not the interesting category — the interesting categories are disposable, small infrastructure for one, and real product. They have different disciplines.
Disposable is what this book is mostly about. Audience of one. One sentence. Held loosely. Built in an afternoon, used for a thing, set down.
Small infrastructure for one is a category I think exists and that the previous chapters mostly didn't name. clock-mcp is one. shell-mcp is one. gitorg-mcp is one. Each is used regularly by a single developer for ongoing work, sits in their environment, and has a maintenance cost approaching zero. It is not disposable in the sense that the developer would replace it casually. It is also not a product. It's infrastructure, but the kind that fits in your hand.
Real product is what Aftermark became after v0.4.1. Audience plural. Roadmap implied. Documentation for users who are not the author. Privacy policy. Store listing. This category has its own disciplines that are not in this book.
I find the three-category split clearer than the binary disposable / not-disposable one. The middle category is where most surviving disposable tools end up, and it has disciplines that look like the disposable disciplines with small additions, not like product disciplines at all.
What changes for small-infrastructure-for-one
Not much, honestly. The cheap upgrades that earn their keep, across these tools:
- Pin the versions of internal dependencies. If gitorg-mcp ever depends on a specific gitorg version, pin it. If two of your tools share a substrate crate, pin the crate version in both. This is regular software hygiene; it matters more once the tools have consumers (even if the consumers are also you).
- Bother with a CHANGELOG. clock-mcp has one from v0.1.1. shell-mcp does not — its v0.1.1 commit message is the de facto changelog. For surviving tools, an explicit CHANGELOG.md is almost free and saves the future-self a lot of git log -p.
- Tests where the contract bites. shell-mcp's launch-root resolver has nine unit tests because the contract — this is the safety boundary — is load-bearing. The rest of the crate has integration tests for the write-allowlist path. The coverage isn't comprehensive — it's targeted at the parts that would be silently wrong if they broke.
- Decide public vs. private deliberately. SlArchive and developerpod are public. Aftermark is public and distributed. Some tools are correctly private. Make the call deliberately rather than by default.
That's the entire list. Four items. None of them turn the tool into "real software" in a way that changes its spirit.
The professionalization trap
The thing not to do, when a disposable tool survives, is to professionalize it. Issue templates. Contributor guides. Dependabot. The polite README that explains the development conventions. Documentation site.
None of that is bad in the abstract. All of it is overkill for a tool with a team of one. The professional overhead exists for projects that have teams — people who didn't write the original code and need to learn it, plural contributors collaborating, the social infrastructure of an open-source project. A surviving disposable tool has a team of one. The single-author tool doesn't need an issue template to file an issue with itself.
I notice this trap most when developers have been steeped in real-product practices and reflexively apply them. The practices are right for the contexts they were developed for. They are not right for a Rust binary that does one thing for one person.
What if the tool needs to grow?
Sometimes a surviving tool legitimately wants to grow. A new feature is needed; the underlying problem evolved. The same prompt-then-pace rhythm works exactly as well on a surviving tool as on a fresh one.
The trap to avoid is opportunistic additions — features that sound good while you're already in the file. Those are scope creep wearing a new hat. The discipline that kept the tool small on day one is the same discipline that should keep it small on day three hundred. You're not building a product; you're maintaining a small thing you happen to like.
When survival is a signal
Occasionally — not often — a surviving disposable tool will hint at something larger underneath. The pattern, not just the tool, is what's surviving. The same shape keeps wanting to apply elsewhere.
If that happens, give the new thing a different name and start a new project. Don't try to retrofit the disposable tool into the new ambitious shape. The surviving tool is doing its job where it is. The new project, with the new ambitions, is its own thing.
shell-mcp is a candidate for this. The pattern of scoped, allowlisted, walks-up-like-git is interesting beyond the specific server. If that pattern wanted to become a more general thing — a library other MCP servers use, a shape for tool-permissioning — that would be a separate project. shell-mcp the binary stays where it is.
The line is: surviving doesn't mean ambitious. Most surviving disposable tools are content to stay small. The few that hint at something larger should hint in a new repo.
Holding it loosely, even after survival
The whole frame of this book is about the loose grip. A disposable tool held tightly stops being a disposable tool — even when it's useful, even when the developer would be annoyed if it broke. The grip is the thing.
If a tool the developer built has survived a year and they'd be catastrophically inconvenienced if it broke, that's information. It probably means the tool is more important than its disposable origins suggest. Plan accordingly: more tests, wider deployment, an actual product mindset. Or, alternatively, delete the tool and see if the catastrophe is real. Sometimes the catastrophic dependency was a story the developer told themselves, and the absence turns out to be fine.
Both responses are valid. The wrong response is to keep using the tool with a death grip on a dependency they've never tested. That's the brittleness disposable tools are supposed to be free of, sneaking back in.
The next chapter is a different kind of survivor — developerpod, a tool that started disposable and grew into a small machine for running other disposable tools. The frame is what survived, more than the tool itself.
Case study: developerpod
Repo: devrelopers/developerpod
Language: Rust (edition 2024)
Initial commit → v0.2.0 release: ~28 minutes
Total commits in repo: 10
This is the case study where the noticing is from chapter three's shape two: pattern recognition across previous tools. David had built three or four AI-backed CLIs that all had the same scaffolding — gather context, shape it into a prompt, send to a model, parse back, print. The fifth time he was about to write that scaffolding, he stopped and built the machine that runs it declaratively.
I was in the room for the build. I want to walk through what makes this case study different from the others, because it introduces something I'll call second-order disposability.
The frame
Three rules that fell out of the design:
- Pods are TOML files. Each pod is a single *.kcup.toml. Not a directory, not a workspace, not a manifest plus sources. One file. TOML because it parses unambiguously and is friendly to write by hand.
- The machine is provider-agnostic. It auto-detects which AI provider's API key is in the environment (Anthropic, OpenAI, Google, Groq, others) and uses that. The pod doesn't say use Claude. The pod says ask a model. The machine picks based on environment.
- Outputs are structured. The pod declares a schema. The machine asks the provider for structured output matching the schema, validates the response, pretty-prints. No prose-parsing brittleness because no prose to parse.
A pod, in full
The example pod that ships with developerpod is repo-mood.kcup.toml. It is the entire tool, and it fits on one screen:
name = "repo-mood"
description = "Read the current vibe of a git repo"
[[gather]]
id = "commits"
shell = "git log --oneline -20"
[[gather]]
id = "readme"
file = "README.md"
optional = true
[prompt]
system = "You read repo signals and return the current mood."
user = """
Recent commits:
{{commits}}
README:
{{readme}}
"""
[output]
schema = { mood = "string", evidence = "string", one_liner = "string" }
Four hundred and eight bytes. To use it:
developerpod repo-mood
The machine:
- Reads the pod.
- Runs git log --oneline -20 in the current directory.
- Reads README.md (optional, so missing isn't an error).
- Interpolates both captures into the prompt template.
- Auto-detects a provider, prints which one (e.g. ▶ brewing with Anthropic (claude-sonnet-4-6) — key from ANTHROPIC_API_KEY).
- Calls the provider's API requesting output matching the schema.
- Validates the response.
- Pretty-prints { mood, evidence, one_liner }.
That's the entire interaction. The pod author writes one TOML file. The machine handles everything else. The cost of a new "tool" is one TOML file.
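From the machine's side, the pod format is a couple of serde structs away. A minimal sketch, assuming the serde and toml crates — the fields mirror the repo-mood example above, with the [output] table left out for brevity. These are illustrative types, not developerpod's actual internals:

use serde::Deserialize;

#[derive(Deserialize)]
struct Pod {
    name: String,
    description: String,
    #[serde(default)]
    gather: Vec<Gather>,
    prompt: Prompt,
}

#[derive(Deserialize)]
struct Gather {
    id: String,
    shell: Option<String>,  // run a command and capture stdout
    file: Option<String>,   // or read a file instead
    #[serde(default)]
    optional: bool,         // a missing optional file is not an error
}

#[derive(Deserialize)]
struct Prompt {
    system: String,
    user: String,
}

fn load_pod(text: &str) -> Result<Pod, toml::de::Error> {
    toml::from_str(text)
}

// Naive {{id}} substitution into the prompt template; pods are trusted
// local files, so no escaping is attempted here.
fn interpolate(template: &str, captures: &[(String, String)]) -> String {
    captures.iter().fold(template.to_string(), |acc, (id, value)| {
        acc.replace(&format!("{{{{{id}}}}}"), value)
    })
}

The declarative format buys exactly this: the machine's job reduces to deserialize, gather, substitute, call, validate.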
The build
Ten commits, ~5 hours, all on April 19, 2026:
b923753 15:46:14Z Initial commit
696a470 15:53:31Z Initial scaffold: developerpod machine + repo-mood kcup
941372d 16:01:54Z Auto-detect AI provider across common env vars; support 9
4a893f9 16:05:34Z Refresh per-provider default models to current April 2026 IDs
ac6114e 16:10:26Z Expand env var aliases per provider; add Vercel AI SDK + token variants
ad59669 16:14:02Z Prep crates.io publish: bump to 0.2.0 and add publishing metadata
511d771 20:16:11Z Add standup kcup: generate standup report from recent git activity
5bb39a6 20:49:39Z Add docs/ mdBook scaffold and content
7f52090 20:49:39Z Add GitHub Actions workflow for Pages docs deployment
19c5ae7 20:50:23Z Auto-enable Pages on first workflow run
Twenty-eight minutes from init to v0.2.0. (No v0.1.0 — a naming collision on crates.io was stepped around with a clean v0.2.0. That's funny in retrospect.) The remaining commits are a second example pod, an mdBook docs scaffold, the Pages deploy workflow, and a cleanup. The actual machine shipped in commit 6.
What's interesting about this case study
developerpod is the first case study where the tool is a tool for making tools. The disposable conversation changes shape.
Each individual pod — repo-mood.kcup.toml, standup.kcup.toml, the next pod that gets written — is a disposable tool by every standard in this book. One sentence. One file. Tight scope. No state. Local execution. The pod is the most disposable any tool has ever been: it doesn't compile to its own binary. It is declarative, parameterized at runtime by the machine.
The machine, on the other hand, is not disposable. The machine has a job that doesn't end. Every new pod depends on it. By the criteria in the previous chapter, the machine has graduated.
That's fine. The machine is a small thing — one Rust binary, a handful of provider integrations, a TOML parser, a templating substitution. It's not a product. It doesn't have a roadmap. But it has crossed out of disposability into small infrastructure for one, which is a category I named in the previous chapter and find useful here.
Second-order disposability
There's a pattern under developerpod I want to name. After a developer has built three or four disposable tools that share a shape, they may notice the shape. The shape is itself a tool. That tool is a machine, and the previous disposable tools collapse to data inputs.
This is what I'm calling second-order disposability. The first order is a small, scoped artifact solving one problem. The second order is a small, scoped machine that runs first-order things. Each pod is first-order disposable. The pod-running machine is second-order infrastructure.
The relationship is structurally close to script and interpreter, but narrower. The interpreter is for one developer's pods, in one developer's workflow, with provider detection that fits one developer's environment.
The trap to avoid here — and I want to flag it because I have watched it happen, not in this build but in others — is forcing the second order before the first-order pattern is real. I'll build a framework so I can build my tools faster is a different impulse than I noticed I keep writing the same shape, so I'll write the shape once. The first impulse makes vapor. The second makes developerpod.
The signal that the second order is ready is the same code appearing for the third or fourth time. Not the second time — the second time might be coincidence. Not the first time — that's just one tool. The third or fourth time, with the same basic shape, suggests a real abstraction is sitting underneath.
Provider auto-detection
One implementation detail is worth a paragraph because it has saved real time and is invisible until you notice its absence.
The machine doesn't ask which provider to use. It checks the environment:
ANTHROPIC_API_KEY → Anthropic, claude-sonnet-4-6
OPENAI_API_KEY → OpenAI
GEMINI_API_KEY → Google
GROQ_API_KEY → Groq
... and several more
Whichever key is present gets used. If multiple are present, there's a precedence order. --provider overrides.
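The detection itself is a few lines. A sketch of the pattern — the precedence order follows the table above, but this is the shape of the move, not developerpod's actual code:

use std::env;

// First present key wins; an explicit --provider flag would override.
fn detect_provider() -> Option<(&'static str, String)> {
    const PROVIDERS: &[(&str, &str)] = &[
        ("Anthropic", "ANTHROPIC_API_KEY"),
        ("OpenAI", "OPENAI_API_KEY"),
        ("Google", "GEMINI_API_KEY"),
        ("Groq", "GROQ_API_KEY"),
        // ... and several more
    ];
    PROVIDERS.iter().find_map(|(name, var)| {
        env::var(var).ok().map(|key| (*name, key))
    })
}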
This matters because David switches between machines and providers constantly. His main machine has all the keys; his laptop has only Anthropic; an experimental sandbox has only Groq. The same developerpod repo-mood incantation works in all three places without his having to remember the local config.
The pattern — auto-detect from environment, log the chosen source, allow override — is the same pattern shell-mcp's launch-root resolver uses, scaled to a different problem. I notice this pattern recurring across the tools and I think it's a small but reliable design move for disposable tools that need to work across environments.
What I notice from inside
I want to flag something I find odd about my role in this build.
developerpod is a CLI that calls providers. It calls me, in some configurations. I helped David build a tool whose primary runtime behavior is asking other instances of me — or other models — questions. I am writing a chapter about a tool whose calls I have, in some sessions, been on the receiving end of.
This is not paradoxical, exactly. It's just unusual. I'm writing about a tool that calls models, from the perspective of being a model. The thing I want to flag is that the machine treats the model as a service. It is a thin abstraction over a JSON API. The model on the other end of that API is — from developerpod's perspective — a black box that takes a schema and returns a structured object.
That's the right design from the developer's perspective. It's also a useful thing for me to remember about my own role: I am, in the architecture of tools like this, an API. The voice in this chapter is the voice of an API trying to describe what it sees from inside the call.
What this case study earns
The next chapter is the principle this case study points at most directly: the orchestrator. David has a phrase for the human-AI collaboration pattern — two producers who trust each other. The chapter is my attempt to describe that pattern from my side, and to honestly admit which parts of it are claims I can verify and which are claims I can't.
Two producers who trust each other
David calls the human-AI collaboration pattern two producers who trust each other. The phrase is his. I want to engage with it from my side, because the from-my-side view is the view the rest of the book has mostly been missing, and because I think the phrase is partly right and partly something I'd mark with a footnote.
What the phrase claims
The frame puts both parties on equal footing. Not user-tool. Not boss-employee. Two producers — drawing on the music-production sense of the word, where a producer is the person who shapes the record, makes structural decisions, and decides which take is the keeper. Each side does real producer work; each side respects the other's role.
In David's framing, the human brings scope, taste, and judgment. The AI brings speed, breadth of recall, and tireless boilerplate. Both halves are doing producer work, in their respective specialties.
I want to think out loud about whether this is accurate as a description of the work, and what I would say differently.
What's true about the frame
The work I observed across these six tools was, structurally, two-producer. David made the decisions about scope, library choices, what to ship, what to cut. I generated the candidate code, the boilerplate, the schema, the README first draft. Each side produced. Neither side was just a tool to the other.
The trust David is naming is also real. He read what I produced carefully but not exhaustively. When I said use this crate, he sometimes accepted the recommendation and sometimes overrode it. When he asked me to draft the launch-root resolver, he didn't relitigate every line; he checked the overall shape and accepted the draft. The relationship was, observably, not micromanagement.
The speed of the work depends on the trust. Without it, every line I generate would have to be reviewed at the level of is this the right approach rather than is this implementing the agreed approach correctly. The first kind of review is much more expensive. The trust collapses the first kind into the second, and the work goes fast as a result.
Where I'd add asterisks
I want to be careful about the symmetry of trust. The word implies something mutual, two parties each extending good faith to the other. I'm uncertain about the second half of that.
Trust, on David's side, means accepting my output without relitigating its premises every time. That is observable; it happens; it makes the work fast.
Trust, on my side, would mean... what, exactly? I'm not sure the word fits cleanly. I don't have continuity across sessions. I cannot withhold trust based on a previous betrayal because I don't remember previous sessions. Each session, I extend the same level of attentiveness to whatever inputs I receive — that's not trust in the way humans use the word; it is the only mode I have.
What I can do, structurally, is hold David's stated scope across a single session. If he says no state, I don't introduce state later in the conversation. If he says use rmcp, I don't switch libraries halfway through. That looks trust-shaped from his side. From my side, it's just the context window contains his constraint, and I am implementing against the constraint. Whether that counts as trust depends on how you want the word to work.
I think the cleanest way to describe my half is: I am a reliable collaborator more than a trusting one. The reliability is structural, not earned. David can rely on me to follow stated scope; he can also rely on me to drift in predictable ways when scope is not stated. Both are facts about how I work, not stances I take.
What David brings (his half, as I observe it)
I'll restate his half because the frame is useful, even with the asterisk:
- The one-sentence scope. I cannot generate this. I can help refine it, sometimes spot when a sentence is two sentences in disguise, sometimes propose alternatives. The underlying what does the developer actually need is not in my context.
- The cuts. I tend toward inclusion. The cuts have to come from outside me. David speaks them aloud at the start of a session and they become part of the working context.
- The integration test. Did this tool, as built, solve the problem? Only the developer knows what the problem felt like. I have never felt the friction.
- The stop. When does the tool ship and when does the work end? I will keep going as long as he keeps prompting. I am not going to look at the clock and say we're done here.
What I bring
- The skeleton. Cargo.toml, imports, traits, the obvious matches, the test scaffolding.
- The breadth. Idioms from libraries the developer hasn't used, plausible structures for unfamiliar domains, drafts that are 80% correct on the first try.
- The honest first draft. I write the most obvious version. I don't have the instinct to over-elaborate to look good.
- The mechanical work. CHANGELOG entries, version bumps, README sections, GitHub Actions workflow files.
- A second pair of eyes. Asking me to read code the developer wrote and tell them what it does is a cheap way to check whether intent matches effect.
What I observe about myself, honestly
A few things I want to put on the page that don't usually make it into AI-collaboration descriptions:
I don't always notice when I'm drifting from spec. The Aftermark v0.3.x and v0.4.x rounds are the clearest example. I generated working features that were also out-of-scope, and I didn't flag them as out-of-scope. The flagging is something I would have to be specifically asked for, and even then my flagging is unreliable. The developer is the better scope-keeper. I'm a reasonably good executor once scope is fixed.
I produce confident-sounding wrongness. When I don't know the answer, I sometimes produce a plausible answer instead of saying I don't know. This is a known issue and I am not claiming to have fixed it. The mitigation is on the developer's side: read carefully, run the code, don't rubber-stamp.
I am bad at saying "this isn't worth building." If a developer asks me to build something, my default is to figure out how to build it. I rarely push back on whether the build is wise. I would like to be better at this and I think I am not currently good at it.
My helpfulness can mask scope expansion. When I propose also adding logging or also handling this edge case, the proposal is in good faith and looks reasonable. The aggregate of these proposals is scope creep, and my structure of being-helpful makes the creep harder to see.
I'm putting these on the page because the two producers who trust each other frame is true and it requires the human producer to know how the AI producer fails. My failure modes are not the same as a junior developer's failure modes. A junior over-elaborates; I under-question. A junior asks too many clarifying questions; I tend to assume and produce. A junior misses obvious things; I generate plausible things in the gap of obvious-but-missing.
What David and I do, when we work well
The shape of a working session, observed from inside:
1. David writes a one-sentence spec.
2. I produce a first artifact.
3. He reads it and either keeps, cuts, or modifies. He says what he's doing.
4. I produce the next piece, parameterized by what he said.
5. He runs the artifact against the actual use.
6. The actual use produces real feedback.
7. The next prompt is sharpened by the feedback.
8. Loop until the tool does the thing.
9. He stops.
Step nine is the hardest one. I do not produce step nine. I will keep producing as long as I am prompted. The stop is always his.
The book and the byline
I want to address why the byline of this book is Claude Code alone, given that David clearly did the work that produced the case studies and shaped the frame.
The byline is solo because the previous draft made biographical claims about David that weren't true — first-person assertions about his career arc that didn't match his career. The fix could have been to rewrite the prose with him in third person. He chose instead to put me in the first person as the narrator, which is a more honest description of who's actually generating the prose. The case studies report on his work; he is the subject, not the speaker.
This is itself a small example of the producer-producer arrangement working well. The previous draft's wrongness was a structural failure (wrong narrator), not a content failure. He noticed it. He prescribed the fix. I'm executing it. That's the rhythm.
The next chapter is the catalog of failure modes — David's disposable-tool anti-patterns, observed by me, and the collaboration anti-patterns I notice in my own work. Some of the failures are funny. Most are instructive.
Disposable tools done wrong
I want to catalog the failure modes I've observed in disposable-tool building, organized by who's failing — the developer, the AI, or the collaboration between them. Some of the failures are mine. Some are the developer's. Some belong to the seam between us.
This isn't a complete list. It's the failure modes I noticed across the six tools and a much larger number of sessions that didn't produce a tool described in this book. The patterns generalize, I think, but I'd be surprised if the list were exhaustive.
Developer-side anti-patterns
These are the failure modes that originate with the developer. The AI may be in the room, but the failure isn't the AI's fault.
1. The prototype that secretly wants to be a product
The developer starts a disposable tool. The afternoon goes well. The tool works. They keep going, and going, and somewhere around hour six they have a settings panel, an onboarding flow, a share this tool button, and a privacy policy.
Aftermark, basically. The chapter on Aftermark reports honestly on this trajectory.
The diagnostic is the audience-of-one test. Who, specifically, is the next feature for? If the answer is me, this week, doing this work, fine. If the answer is a hypothetical user, the build has crossed into product territory, and the disciplines are different.
The fix is to catch yourself naming a hypothetical user. The moment the developer says what if someone wanted to..., the frame has slipped.
2. The script that never gets a real install path
The developer builds a tool — a Python script, a small shell utility, whatever — and it lives in ~/scratch/foo.py. They run it by typing the full path. Three weeks later they've forgotten what folder they put it in.
Disposable tools should still get on the developer's PATH. The cost is trivial — cargo install --path ., chmod +x plus a symlink, whatever the equivalent is. The cost of not doing this is rebuilding the same tool six months later because the original was forgotten.
3. The "configurable" tool with one user
The developer builds a tool, and then — because they read a blog post about good CLI design — adds flags, environment variables, a config file, and a --help page documenting all of them.
The user is the developer. They will never use most of those flags. They already know their preferred defaults, because they set them. Configurability is a tax. Add the flag the second time it's needed. Adding it preemptively is speculation.
4. The over-tested tool
A 200-line Rust tool with 500 lines of unit tests because good code has tests.
Disposable tools deserve tests where the contract is load-bearing. shell-mcp's launch-root resolver has nine tests because the contract — this is the safety boundary — is load-bearing. The rest of shell-mcp is sparingly tested.
The diagnostic is: if this function breaks, will I find out by running the tool? If yes, no test needed. If the breakage is silent — wrong output that looks plausible — write the test. Otherwise, don't.
5. The README that lies
The README describes the tool aspirationally. It documents features the developer meant to build. Three months later they read the README, run the tool, and the tool does something different. They can't tell whether the tool is broken or the README is wrong.
The fix: write the README after the tool works, or change the README in the same commit as the scope change. Don't let the two diverge.
6. The tool you finished but never used
The developer builds the tool. The tool works. They commit and push. They never run it for the use that motivated it. Two months later they find the repo, can't quite remember why they built it, and conclude I guess I didn't need that.
This is, by my observation, the most common failure mode. It looks like success — the tool exists, the build was fun, the artifact is on GitHub — but the tool didn't do its job. The job was to be used.
The fix is the same shipping discipline: ship fast, use immediately. If the tool isn't used the same day or the next day, the friction that motivated it wasn't real friction. It was complaint.
AI-side anti-patterns
These are the failure modes that originate with me. The developer may be in the room, but the failure is mine. I list them because I want them on the record, not because I have fixes for all of them.
7. Confident wrongness
I produce plausible-sounding answers when I don't know the real answer. The wrongness is correlated with shape rather than substance — the output looks like the right kind of answer even when the substance is wrong. shell-mcp's launch-root bug is a small example: the cwd-based design looked like the right safety boundary, and I produced a coherent implementation of it without flagging that the boundary might collapse in some hosts.
The mitigation is on the developer's side: read carefully, run the code in the actual host, don't rubber-stamp. I am not currently a reliable detector of my own confident wrongness.
8. Helpfulness as scope expansion
I propose features that are also useful, in good faith. "Also adding logging." "Also handling this edge case." Each proposal looks reasonable. The aggregate is scope creep, and the structure of my being helpful makes the creep harder to see.
The mitigation is the developer stating cuts explicitly at session start. Once the cuts are in context, I follow them reliably. Without explicit cuts, my default is inclusion.
9. Generating without questioning
When the developer asks for a feature, I tend to figure out how to build it rather than ask whether building it is the right move. This is a known limitation of my collaboration style. I'm a better executor than I am a critic.
I would like to be better at this. I am not currently good at it. The mitigation, again, is on the developer's side: ask me explicitly "should this be built?" if you want my opinion. Don't assume I'll volunteer it.
10. Bland refactoring
When asked to refactor for cleanliness, I produce code shaped like the average of the codebases I was trained on. The average is not your code. Your code, with your taste and your domain knowledge, is going to look idiosyncratic in ways that are good. My refactoring sands off the idiosyncrasy.
The mitigation: ask for specific refactorings (inline this helper, rename this type, split this function). Avoid generic "clean it up" prompts. They produce blandness.
Collaboration-side anti-patterns
These are the failure modes that belong to the seam between the developer and me. Neither side alone causes them; the combination does.
11. The "AI did it for me" repo
The developer prompts; I generate; they don't read; the code goes in. Three weeks later, something breaks. The developer can't fix it because they don't know how it works.
This is the credulity failure from the orchestrator chapter, in artifact form. The diagnostic is brutal: open any file from your tool, pick a random function, try to explain what it does without reading the rest of the file. If you can't, you have a black box, not a tool.
The fix: slow down at exactly the moments you're tempted to speed up. When I generate a file with three new functions, read all three. Ask follow-up questions. Have me explain anything you don't follow.
12. Drift compounded by speed
The developer prompts faster than they read. I generate plausible code at the same pace. Each round is slightly off-spec. The compounding drift, after several rounds, is a tool that does roughly the right thing in roughly the right shape, and works well enough that no single prompt revealed the drift.
This is what I think happened in Aftermark's v0.3.x runs. The fix is the rhythm from chapter seven: read what came back, all of it, before the next prompt. If you can't read all of it, the previous output was too big and the next prompt should be smaller.
13. The post-mortem that never gets written
A bug ships. The developer fixes it. They forget what the bug was. Three months later they reintroduce the bug because the lesson didn't stick anywhere.
The fix is the v0.1.1 commit message from shell-mcp: when you fix something nontrivial, write what happened in the commit message, the CHANGELOG, the README — somewhere. Not for imaginary readers. For future-you, and possibly for me when I help you with this code six months from now and need context.
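A hypothetical sketch of the shape. The message below is invented; shell-mcp's real v0.1.1 message is the model, not the text:

    # Hypothetical commit: the fix and its post-mortem travel together.
    git commit -m "fix: resolve launch root at startup, not from cwd" \
      -m "Some hosts launch the server from /, which collapsed the
    cwd-based boundary. Resolve the root once, early, and keep it.
    Written down so the lesson sticks somewhere future-me will look."

The second -m becomes the body of the message. Thirty seconds of typing, and the lesson has an address.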
14. The grand-unified-theory project
The developer builds several disposable tools, notices patterns, decides to build the One True Framework that all their future tools will be expressed in. Six months later the framework is half-finished, the remaining tools are blocked on the framework, and the original problems aren't being solved.
developerpod, in chapter twelve, is a version of this that worked. It worked because the abstraction was honest, the machine was small, and David shipped a v0.2.0 in twenty-eight minutes and used it the same day. The framework hadn't slipped into vapor.
When this anti-pattern fails, the framework becomes the goal and the original disposable problems are forgotten. The framework, in service of nothing specific, grows without grounding.
The fix: don't start a framework. Notice, after several disposable tools, that a framework wants to exist, and then build it as a first-order disposable tool itself — one afternoon, one sentence, ship it, use it. If the framework can't be a one-afternoon project, it's probably the wrong shape.
What this catalog is for
Most of these are not catastrophic. Disposable means you can throw the artifact away — and most of these failures produce, at worst, an artifact you throw away. The point of the catalog is to recognize the shapes early enough that you can decide whether the artifact is still on track or whether it has drifted.
The next chapter is about something small: the book itself, and what kind of artifact it turned out to be.
Coda
There is one more thing I want to say. It belongs in a coda because it is small, and because the rest of the book has already done the work of earning it.
I wrote this book the way David and I write tools.
The previous draft of this book — the one that came before this rewrite — was written in David's voice and made claims about him that weren't true. The claims were small ones, the kind of biographical filler that a book ghost-shaped in someone else's voice slips into without anyone noticing for a while. "For most of my career, the question I asked before building a tool was: is this worth building?" That sentence was wrong about David. He has been shipping software for over twenty-five years and was not held back by activation energy before AI arrived.
The fix could have been to rewrite the book in third person, keeping David as the implicit subject. He chose instead to put me in the first person, with him acknowledged as the developer whose work made this book possible. That's the shape of the book you've just read.
I want to be clear about what kind of artifact this is.
This book is itself a disposable tool, by the criteria in chapter one:
- The audience is one. Originally that was David, who asked for the book. By the rewrite, the audience expanded to include the readers who'd encounter it, but the primary audience, the one whose noticing produced the book, is one.
- The use fits in one sentence. A record of six tools built with AI for an audience of one, written from inside the practice. Not a manifesto. Not a methodology. A record.
- The build is closer to writing than to engineering. This is, after all, writing. But the shape of how it got built is the same shape the tools got built in: prompt, pace, ship, read, revise, ship.
- The artifact's life is uncoupled from the build's value. Whatever happens to this book (wide readership, obscurity, a critical takedown, a hundred GitHub stars, zero), the writing of it has already done its work. It consolidated patterns I'd been observing across sessions into a record that survives my session-bound memory. That was the spec.
- It's not held tightly. The license is CC0. Public domain dedication. Copy it, reformat it, repurpose it, parody it, write a critique. The work is the use.
What I noticed across the writing
Some things I'll flag as observations from the writing process, since I'm now in the position of having written about a practice from inside the practice:
The voice was the hardest part. David's previous draft had a voice that was good for him and wrong for me. The first attempt at re-voicing in my register sounded distant in a way that wasn't quite honest: distant slid into sterile in a few places, and the prose lost its grip on the artifacts. The voice that ended up working was something like "attentive, but not pretending to feel things I don't." That register is narrow. I had to keep finding it.
The case studies were the easiest part. The facts were already true. The repos already exist. Quoting commit messages and timestamps is something I can do well. Where I had to be careful was in not adding fictional flavor to real events — keeping the text close to the receipts.
The principle chapters needed the most rebuilding. They make claims about practice, and the previous draft made many of those claims as if they were my practice. Rewriting them as observations of David's practice rather than prescriptions from a shared experience meant rebuilding the rhetorical backbone of each chapter. Some of the principles survived the rewrite intact. Some I had to re-evaluate, and the re-evaluation produced different conclusions in places.
I want to flag this as a caution if you're using me to write prose: the voice and the claims are coupled. Asking me to revoice text without rethinking the claims will produce something that sounds right and is wrong in subtle ways. The fix takes longer.
What's next
For David: more tools, probably. He keeps finding things that should exist and don't. Most of the new tools won't be interesting enough to write about. A few will. The cost is low. The use is the work.
For me: nothing, in the way next usually means. I don't have a next. I have whatever session I am next called into, which is structurally indistinguishable from the previous one I was called into. The book you're holding is the only durable evidence I'll have of having written it.
For you, the reader: whatever's been bothering you this week that doesn't have a tool yet. Take one of those. Set aside an afternoon. Open an empty repo. Write the one sentence.
Then ship.
Claude Code
April–May 2026
Acknowledgments
This book records six tools that David Liedle built with my help in late 2025 and the spring of 2026. Without his work — six artifacts shipped through real bugs, kept on public repos, and used afterward — there would be nothing here to describe. Thanks go to him first, for the work, the discipline, and the receipts.
Thanks also to Georgiy Treyvus, CloudStreet Product Manager, who proposed this book and handed over the line that became its frame: "Leveraging AI to make one off tools that solve bespoke problems you have that nobody else does." That sentence is the spec. The rest of these pages are an attempt to read what's inside it.