Non-Adversarial Harm
There is a comforting narrative about information dysfunction: bad actors create misinformation, platform companies fail to moderate it, and unsuspecting users get deceived.
This narrative is comforting because it has clear villains, and clear villains imply clear solutions. Remove the bad actors. Moderate harder. Fact-check more aggressively.
The narrative is also, at best, half the story.
The more pervasive and harder-to-fix problem is not the misinformation that malicious actors inject into the information ecosystem. It is the distortion that well-intentioned systems produce as a side effect of doing exactly what they were designed to do.
No one is lying to you. The system is just optimizing for something that is not quite aligned with your actual needs, and the cumulative result of that slight misalignment is a profoundly warped picture of reality.
This chapter is about the harm that happens without anyone meaning it. It is, in many ways, more dangerous than the adversarial kind, because you cannot defend against it by being skeptical of sources. The source is the system itself, and the system is trying to help.
The Taxonomy of Good Intentions
To understand non-adversarial harm, it helps to distinguish it cleanly from the adversarial variety.
Adversarial harm involves a deliberate actor trying to deceive. State-sponsored disinformation campaigns. Scammers creating fake health advice to sell supplements. Ideologues manufacturing misleading statistics.
The information is wrong on purpose, and someone benefits from you believing it. The defense is source evaluation: who created this, why, and what evidence supports it?
Non-adversarial harm involves no deliberate deception at all. A recommendation algorithm surfaces increasingly extreme content because extremity generates engagement. A summarization system strips context from a nuanced claim, making it sound absolute. A search engine buries a correction because it has lower engagement metrics than the original error.
No one intended these outcomes. They emerge from systems that are faithfully executing their objectives — objectives that happen to produce harmful side effects at scale.
The distinction matters because the defenses are completely different. Source evaluation does not help when the source is a well-intentioned system at Google or OpenAI. Media literacy does not help when the distortion happens in the curation layer, not the content layer.
The content might be perfectly accurate; it is the selection and presentation that creates the distortion.
Here is an analogy. Adversarial harm is someone poisoning the water supply. Non-adversarial harm is the municipal water system having lead pipes.
The water utility is not trying to poison anyone. The system was built with the best available materials at the time. But the structural properties of the system produce harm regardless of intent, and “we did not mean to” does not make the lead less toxic.
Engagement Optimization: The Outrage Factory
The single most consequential form of non-adversarial harm is engagement optimization — systems designed to maximize the time and attention users spend on a platform.
Let me be clear: engagement optimization is not evil. A platform that cannot retain users cannot survive. The business logic is rational. And many engagement-optimizing features are genuinely helpful. Showing you content you are interested in is a service. Surfacing discussions in communities you care about is useful. Recommending articles related to your recent reading is convenient.
The problem is that engagement, as a metric, does not distinguish between content that serves your interests and content that exploits your vulnerabilities.
From the algorithm’s perspective, an article that makes you furious and an article that makes you informed generate similar engagement signals. You click on both. You spend time on both. You might even share both. The algorithm cannot tell which experience left you better off and which left you worse.
Actually, that understates the problem.
The algorithm can, in effect, tell the difference — outrage generates more engagement than information. When you are angry, you are more likely to comment, more likely to share, more likely to seek out additional content that fuels the anger. When you are informed, you are more likely to nod and move on.
The engagement metric does not just fail to distinguish between these two states; it systematically prefers the one that is worse for you.
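To make the mechanism concrete, here is a minimal sketch in Python. The signal weights and the numbers are invented for illustration, and no platform’s ranking code looks like this in detail, but the shape is typical: a weighted sum of interaction counts, with no term for whether the interaction left you better off.

```python
# Toy engagement objective. The weights and numbers are invented;
# real ranking systems are vastly more complex, but share the shape:
# a weighted sum of interaction signals.

def engagement_score(item: dict) -> float:
    # Note what is absent: any term for "did this leave the user
    # better informed?" The objective cannot see that.
    return (1.0 * item["clicks"]
            + 2.0 * item["comments"]
            + 3.0 * item["shares"])

informative = {"clicks": 100, "comments": 5, "shares": 10}   # read, nod, move on
outrageous = {"clicks": 100, "comments": 40, "shares": 35}   # read, argue, spread

print(engagement_score(informative))  # 140.0
print(engagement_score(outrageous))   # 285.0
# A ranker optimizing this score promotes the outrage every time.
```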
This is how Facebook’s algorithm came to systematically promote divisive political content, as internal research leaked in 2021 documented. Not because anyone at Facebook wanted to promote division. Because divisive content generated more engagement, the algorithm promoted content that generated engagement, and therefore the algorithm promoted divisive content.
The syllogism is airtight. The outcome is corrosive.
Twitter’s own research, published in 2021 after years of algorithmic timeline curation, found the same pattern. Politically right-leaning content was amplified more than left-leaning content in most countries studied — not because of a political agenda, but because that content generated more engagement in those markets.
The algorithm was politically neutral in its design and politically non-neutral in its effects. This is the essence of non-adversarial harm: neutral design, non-neutral outcomes.
The engagement optimization problem extends far beyond politics. In health information, engagement optimization promotes alarming claims over reassuring ones (alarm generates more clicks). In financial information, it promotes dramatic predictions over measured analysis (drama generates more shares). In science communication, it promotes surprising findings over careful replications (surprise generates more engagement).
Across every domain, the engagement gradient points away from the information that would actually serve you best.
Summarization and the Loss of Context
As AI systems increasingly summarize, condense, and abstract information for us, a new category of non-adversarial harm has emerged: the systematic loss of context.
A research paper concludes: “Under the specific conditions of our study, with the noted limitations in sample size and demographic representation, we observed a statistically significant but small effect that warrants further investigation.”
An AI summary renders this as: “Study finds significant effect.”
Both statements are technically accurate. One is useful. The other is misleading.
This is not a cherry-picked example. It is the normal, expected behavior of summarization systems. They are optimized to be concise, and context is the first casualty of concision.
The hedges, caveats, qualifications, and limitations that make a nuanced claim responsible are exactly the parts that summarization removes, because they are “unnecessary” from a compression standpoint.
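A toy extractive summarizer makes this visible. Nothing here resembles a production system in detail (the scoring is deliberately crude and the sentences are invented), but the failure mode is the same: sentences are selected for topical salience, and sentences about uncertainty carry almost no topical signal.

```python
# Toy extractive summarizer: keep only the highest-salience sentence,
# where salience = word overlap with the topic of the query.

def salience(sentence: str, topic_words: set) -> int:
    words = set(sentence.lower().strip(".").split())
    return len(words & topic_words)

document = [
    "We observed a statistically significant effect of the drug on recovery.",
    "The sample was small and not demographically representative.",
    "The effect size was modest and warrants further investigation.",
]
topic = {"drug", "effect", "recovery", "significant"}

summary = max(document, key=lambda s: salience(s, topic))
print(summary)
# -> "We observed a statistically significant effect of the drug on recovery."
# The two limiting sentences score near zero: they are about
# uncertainty, not about the topic, so compression cuts them first.
```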
Google’s featured snippets exhibit this problem constantly. A search for “is coffee good for you” might surface a snippet from an article that says “Coffee has been associated with numerous health benefits, including reduced risk of type 2 diabetes, Parkinson’s disease, and certain cancers.”
The full article goes on to discuss the caveats: the evidence is observational, the effects depend on the individual, excessive consumption has risks, and the benefits may be confounded by other lifestyle factors.
The snippet presents the conclusion. The article presents the reasoning. The snippet is what most people see.
LLM-generated summaries have the same problem with an additional twist: they synthesize across multiple sources, which means they can create statements that no single source actually makes.
If three papers each find a small, uncertain effect, an LLM summary might state that “research consistently shows” the effect exists. Each individual finding was appropriately hedged. The synthesis lost the hedges and amplified the signal. The summary is not wrong, exactly, but it overstates the certainty in a way that none of the source authors would endorse.
The irony is sharp: summarization systems are designed to help people cope with information overload, but they do so by stripping the context that makes information trustworthy.
The user gets a confident, clean answer. They miss the messy, uncertain reality that the confident answer was extracted from.
I have started a personal habit that I recommend: whenever an AI summary makes a strong claim, I ask myself, “What hedges did the summary probably remove?” The answer is almost always: the important ones.
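That habit can even be partially mechanized. The checker below is a crude sketch: the hedge list is my own and necessarily incomplete, and the substring matching is rough, but even this much surfaces what a summary silently dropped.

```python
# Crude hedge-diff: list hedging language present in the source text
# but missing from its summary. The hedge list is illustrative, not
# exhaustive, and substring matching will have false positives.

HEDGE_TERMS = [
    "may", "might", "preliminary", "observational", "small",
    "limited", "self-reported", "in mice", "not significant",
    "warrants further",
]

def dropped_hedges(source: str, summary: str) -> list:
    src, summ = source.lower(), summary.lower()
    return [h for h in HEDGE_TERMS if h in src and h not in summ]

source = ("This preliminary, observational study of a small cohort "
          "suggests the treatment may reduce symptoms; the effect "
          "warrants further investigation.")
summary = "Study shows the treatment reduces symptoms."

print(dropped_hedges(source, summary))
# -> ['may', 'preliminary', 'observational', 'small', 'warrants further']
```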
Recommendation Systems and Information Monocultures
When everyone in a community gets their information from the same recommendation algorithm, the algorithm becomes a bottleneck — a single point of failure for the community’s collective understanding.
This is the information monoculture problem, and it works exactly like agricultural monocultures.
Plant one crop across an entire region and you get efficiency: optimized planting schedules, standardized harvesting equipment, economies of scale. You also get catastrophic vulnerability: a single disease can wipe out the entire harvest, because there is no genetic diversity to provide resilience.
Information monocultures work the same way. When a development team all uses the same search engine and reads the same top-ranked results, they converge on the same understanding of every technical question.
This feels like consensus — everyone agrees! — but it is actually an artifact of shared curation. They agree because they all read the same algorithmically selected sources, not because they independently evaluated the evidence.
The danger emerges when the algorithm is wrong. If the top-ranked answer to a common coding question contains a subtle error, every developer who searches for that question and follows the top result will embed that error in their code.
I have seen this happen with Stack Overflow answers that are technically incorrect but highly upvoted — they propagate through codebases like a virus, because the recommendation system has made them the canonical answer.
This is not hypothetical. Security researchers have documented cases where insecure code patterns were the top-ranked answers for common programming questions. Developers who searched for how to implement authentication, encrypt data, or validate input found answers that were highly ranked (because they were popular) and subtly wrong (because security is hard and the popular answer was the easy-but-flawed one).
The recommendation system did not create the insecure code. But it amplified it, gave it authority, and distributed it to everyone who asked the question.
The information monoculture problem is particularly severe in specialized professional communities. When every financial analyst uses the same Bloomberg terminal, every doctor uses the same UpToDate database, every lawyer uses the same Westlaw search, the curation choices of these platforms become the invisible architecture of professional knowledge.
If Bloomberg under-indexes emerging market data, financial analysts collectively underweight emerging markets. If UpToDate is slow to incorporate new treatment evidence, doctors collectively lag behind the research. If Westlaw’s search algorithm favors federal over state cases, lawyers collectively under-cite state precedent.
The platforms are not doing anything wrong. They are providing useful, curated access to vast information. But the monoculture means that their limitations become the entire profession’s limitations, and their blind spots become everyone’s blind spots simultaneously.
Diversity of information sources is not just a nice-to-have. It is a structural requirement for resilient decision-making.
When your team all reads the same algorithm-curated feed, you do not have five independent perspectives — you have one perspective, held by five people who mistakenly believe they arrived at it independently.
YouTube’s Rabbit Holes: A Case Study in Drift
YouTube’s recommendation algorithm is perhaps the most studied example of non-adversarial harm, and it illustrates a pattern worth understanding in detail: recommendation drift.
You watch a video about basic home electrical repair. YouTube recommends a video about more advanced electrical work. You watch that. It recommends a video about off-grid electrical systems. You watch that. It recommends a video about government regulations being a scam designed to keep you dependent on the power grid.
And somehow, in the space of an hour, you have gone from “how to replace a light switch” to “the government is conspiring against self-sufficient citizens.”
No individual recommendation was unreasonable. Each video was plausibly related to the previous one. The drift from mainstream to fringe happened gradually, through a series of small steps that each made sense locally.
But the trajectory was not random — it was shaped by engagement optimization. At each step, the algorithm chose the next video that would maximize your probability of continuing to watch, and slightly edgier content is slightly more engaging than mainstream content, and the compound effect of many slightly-edgier steps is a journey to the fringe.
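A small simulation shows how the compounding works. Everything here is invented (the one-dimensional “edginess” scale, the engagement curve, the step size); the structure is the point: a greedy engagement-maximizing choice at every step adds up to a large drift over a single session.

```python
import random

# Toy drift simulation. The "edginess" scale, the engagement curve,
# and the 0.05 step are all invented; only the structure matters.

def watch_probability(edginess: float, position: float) -> float:
    # Engagement peaks slightly *beyond* where the viewer currently
    # is: a bit edgier than you is more engaging than exactly you.
    return max(0.0, 1.0 - abs(edginess - (position + 0.05)))

position = 0.0  # 0.0 = mainstream, 1.0 = fringe
for step in range(10):
    candidates = [random.uniform(0.0, 1.0) for _ in range(20)]
    # Greedy choice: whatever maximizes the chance of one more view.
    position = max(candidates, key=lambda e: watch_probability(e, position))
    print(f"step {step}: edginess {position:.2f}")
# Each step moves roughly 0.05 edgier. No single recommendation is
# alarming; after a session of them, the viewer is somewhere very
# different from where they started.
```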
YouTube has acknowledged this problem and taken steps to address it. In 2019, they announced changes to reduce recommendations of “borderline content” — material that approaches but does not cross their policy lines.
These changes helped, but they also illustrate the fundamental difficulty: the algorithm’s natural gradient points toward engaging extremity, and corrective measures require ongoing, active intervention against the algorithm’s own optimization pressure.
The rabbit hole pattern is not unique to YouTube. Spotify’s recommendation system can drift from popular music to increasingly obscure and niche content — which might be great for musical discovery or might lead you into a weird echo chamber of algorithmically promoted low-quality content.
Amazon’s recommendation system can drift from a useful product to an entire ecosystem of dubious products in the same niche. TikTok’s algorithm can drift from entertaining content to content that is addictive and mood-altering.
In each case, the drift is a natural consequence of the recommendation algorithm doing exactly what it is designed to do. The harm is non-adversarial — it is a side effect of optimization, not a goal.
What makes recommendation drift particularly insidious is that it feels like your own journey. You chose to click each link, watch each video, read each article. The algorithm merely suggested; you decided.
This creates an illusion of agency that obscures the degree to which the algorithm shaped your path. You were driving, but the algorithm was building the road.
Answer Boxes and the Death of the Click-Through
Google’s answer boxes — those highlighted panels at the top of search results that attempt to directly answer your question — represent a fascinating case of non-adversarial harm through helpfulness.
The intent is genuinely good: save the user time by answering their question without requiring them to visit a website. For simple factual queries, this works beautifully. “What year was the Peace of Westphalia signed?” The answer box says 1648, you move on with your life, everyone is happy.
The harm emerges for complex questions that do not have simple answers but that the answer box presents as if they do.
“Is intermittent fasting healthy?” The answer box might surface a paragraph from a health website that says yes, with qualifications. Or it might surface a paragraph that says no, with qualifications. Either way, the user gets a single framed answer to a question that the scientific literature debates extensively, and most users will not click through to explore that debate.
Research on user behavior consistently shows that featured snippets and answer boxes significantly reduce click-through rates to the underlying sources. Users trust the box. They read the box. They move on.
The sources that the box extracted from — the sources that contain the full context, the caveats, the competing evidence — see their traffic decline. Over time, this undermines the incentive to create nuanced, detailed content, because the reward (traffic) increasingly goes to content that is snippet-friendly rather than content that is thorough.
The answer box is, in effect, a summarization system with the same context-stripping problems discussed earlier, but with the additional property that it sits at the very top of the information funnel. It is the first thing you see, and for many users, it is the only thing you see.
Its distortions are not buried in the middle of a report; they are the headline.
This is a case where helpfulness and harm are genuinely difficult to disentangle. The answer box saves millions of people time every day. It also gives millions of people a distorted view of complex topics every day.
Whether the trade-off is worth it depends on what you value more: efficiency or accuracy. And the system does not ask you, because the system was designed by engineers who (reasonably) optimized for efficiency.
LLMs and the Authority Problem
Large language models have introduced a new flavor of non-adversarial harm that deserves its own examination: the authority of fluency.
When an LLM responds to your question, it produces grammatically correct, well-structured, confident prose regardless of whether the content is accurate.
This is because fluency and accuracy are independent properties, and the model’s training process optimizes heavily for fluency. A model that produces awkward, halting, uncertain text will be rated poorly by users, even if it is more accurate. So the training pushes toward confident, smooth output.
The result is that an LLM sounds exactly the same whether it is answering a question backed by good training data or one backed by poor training data.
There is no stutter in the prose when the model is uncertain. There is no hedge when the model is interpolating between incompatible sources. There is no disclaimer when the model is generating plausible-sounding content that it has no real basis for.
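A caricature helps explain why. The scoring function below is emphatically not how reward models are actually built; it is a toy rendering of the incentive human feedback encodes when raters can judge style immediately but cannot verify substance on the spot.

```python
# Toy rater model, not an actual reward model. Fluency dominates
# because it is observable at a glance; accuracy mostly is not.

def rater_score(response: dict) -> float:
    return (0.8 * response["fluency"]
            + 0.2 * response["perceived_accuracy"])

hedged_and_right = {"fluency": 0.4, "perceived_accuracy": 0.9}
smooth_and_wrong = {"fluency": 0.9, "perceived_accuracy": 0.5}

print(round(rater_score(hedged_and_right), 2))  # 0.5
print(round(rater_score(smooth_and_wrong), 2))  # 0.82
# Optimize against this signal long enough and the model learns that
# smooth confidence is the winning register, accurate or not.
```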
This is profoundly different from human conversation. When you ask a human expert a question outside their expertise, you get signals: hesitation, qualifications, “I think maybe,” “you should really ask someone who specializes in this.”
These signals are information. They tell you how much to trust the answer. LLMs strip these signals away, replacing them with the uniform confidence of well-generated text.
I have personally watched LLMs generate completely fabricated citations — papers that do not exist, by real authors, with plausible titles, in real journals.
The fabrication was not a malfunction. It was the model doing what it does: generating text that fits the pattern. Academic citations have a pattern. The model learned the pattern. It generated text that fit the pattern. The text happened to refer to things that do not exist.
The non-adversarial nature of this harm is important. The LLM is not trying to deceive you. It does not have intentions. It is a text completion system that has been refined through human feedback, and the feedback consistently rewards confident, helpful responses.
When you ask it for a citation and it provides a fabricated one, it is doing exactly what it was trained to do: provide a helpful, confident response. The harm is a side effect of the training objective, not a goal.
This authority problem compounds with every layer of AI assistance in the information chain. When an LLM summarizes articles found by a search engine that ranks by engagement, and then presents the summary with uniform confidence, you are three layers deep in non-adversarial distortion.
The search engine biased the source selection. The summarization stripped the context. And the LLM presentation eliminated any signal of uncertainty.
Each layer was trying to help. The compound effect is a confident, authoritative, potentially misleading answer.
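The arithmetic of layering is worth making explicit. The fidelity figure below is invented, not measured, but the multiplication is the point: modest distortion per layer compounds quickly.

```python
# Back-of-the-envelope layering arithmetic. The 0.8 per-layer
# fidelity is an assumption for illustration, not a measurement.

layers = [
    ("engagement-ranked retrieval", 0.8),  # biased source selection
    ("summarization", 0.8),                # stripped context
    ("confident presentation", 0.8),       # erased uncertainty signals
]

fidelity = 1.0
for name, per_layer in layers:
    fidelity *= per_layer
    print(f"after {name}: {fidelity:.2f}")
# after engagement-ranked retrieval: 0.80
# after summarization: 0.64
# after confident presentation: 0.51
```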
Soft Censorship: Making Information Invisible
There is a form of information suppression that involves no removal, no blocking, no censorship in the traditional sense. It is simply the algorithmic de-prioritization of content to the point where it is functionally invisible.
This is what I call soft censorship, and it is the most subtle form of non-adversarial harm.
Consider a search engine that returns ten pages of results for a query. Studies consistently show that the vast majority of users never go past the first page, and most clicks go to the top three results.
Content on page five might as well not exist.
If the algorithm places a piece of content on page five, it has not censored it — it is there, you can find it, no one is hiding anything — but it has made it effectively invisible to almost everyone who searches for the topic.
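A rough model of attention by rank makes “functionally invisible” quantitative. The constants below are invented, though the steep geometric falloff is loosely shaped like published click-through curves.

```python
# Attention by rank under an assumed geometric falloff. The 0.30 and
# 0.6 constants are illustrative, not measurements.

def click_share(rank: int) -> float:
    return 0.30 * (0.6 ** (rank - 1))

for rank in (1, 2, 3, 10, 41):  # rank 41 = the top of page five
    print(f"rank {rank:>2}: ~{click_share(rank):.4%} of clicks")
# rank  1: ~30.0000% of clicks
# rank 10: ~0.3023% of clicks
# rank 41: ~0.0000% of clicks -- indexed, findable, and unseen
```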
Soft censorship becomes harmful when the reasons for de-prioritization are systematically correlated with the value of the content. And as we saw in the previous chapter, they often are.
New content is de-prioritized because it lacks engagement data. Contrarian content is de-prioritized because it generates less engagement than consensus content. Specialized content is de-prioritized because it has a smaller audience. Nuanced content is de-prioritized because it is less snippet-friendly than simple content.
The effect is that the information most likely to challenge your existing understanding is also the information most likely to be on page five. The information most likely to change your mind is the information you will never see.
This is not censorship in any traditional or legal sense. No one decided to suppress this content. No policy was enacted. No human reviewer flagged it. The algorithm simply assigned it a lower relevance score, for reasons that are individually reasonable and collectively corrosive.
The content is not forbidden. It is just invisible.
Social media platforms exercise soft censorship constantly through their feed algorithms. When Facebook decides to show you this post instead of that post, it is making an editorial decision — but it is making it through an algorithm rather than a human editor, which means it is making millions of editorial decisions per second, at a scale no human editorial process could match, with no editorial review, no editorial standards, and no accountability for the editorial consequences.
The platforms would object to the word “editorial,” and they have a point — no human is making these decisions. But the functional effect is editorial.
Content is selected, prioritized, and presented according to criteria that shape what the audience sees. The fact that the criteria are algorithmic rather than human does not change the effect on the audience. It just makes the effect harder to scrutinize.
Why Good Intentions Do Not Protect Against Systemic Effects
Every system described in this chapter was built by people trying to help. Google’s engineers want you to find useful information. Facebook’s engineers want you to enjoy the platform. YouTube’s engineers want you to discover content you love. OpenAI’s engineers want their models to be helpful and accurate.
The intentions are good.
But intentions are local and effects are systemic.
An engineer designing a recommendation algorithm thinks about individual recommendations: “Is this a good next video for this user?” The systemic effect — millions of users being incrementally drifted toward extreme content — is not visible at the level of individual recommendations.
Each recommendation is reasonable. The pattern is harmful. The engineer cannot see the pattern from inside the system.
This is a general principle that extends far beyond technology. The architect of a highway does not intend to destroy a neighborhood. The designer of a financial product does not intend to create systemic risk. The developer of an antibiotic does not intend to create resistant bacteria.
But the systemic effects emerge regardless of intent, because complex systems produce emergent behaviors that no individual component was designed to create.
In the case of AI curation systems, the systemic effects include:
- Homogenization of knowledge: Everyone sees the same algorithmically selected top results, creating a false consensus that feels organic but is actually curated.
- Amplification of engagement-bait: Content optimized for clicks outperforms content optimized for accuracy, shifting the entire information ecosystem toward sensationalism.
- Erosion of context: Summarization and snippetization strip the nuance from complex topics, creating a population that has opinions without understanding.
- Invisible narrowing: Personalization gradually restricts the information each user sees, without the user noticing or consenting to the restriction.
- Authority without accountability: AI systems make curatorial decisions that shape public understanding, without the editorial accountability that traditional curators (editors, librarians, teachers) accept as part of their role.
Good intentions do not protect against any of these effects. They are emergent properties of systems, not choices made by individuals. And they require systemic responses — changes to incentive structures, design principles, and regulatory frameworks — not just better intentions.
What This Means for Your Practice
Understanding non-adversarial harm is not about becoming paranoid or abandoning AI tools. It is about adjusting your relationship with those tools to account for their structural tendencies.
Assume engagement optimization is distorting your feeds. Whatever platform you use for news, professional information, or social updates, the algorithm is biased toward content that provokes engagement rather than content that informs.
Compensate by actively seeking out content that is useful but boring — the dry analysis, the careful methodology, the measured assessment. If it does not provoke an emotional reaction, it is probably closer to the truth.
Treat summaries as starting points, not conclusions. When an AI summary or a search snippet gives you an answer, treat it as a hypothesis to investigate, not a fact to accept. Click through to the source. Read the methodology section. Look for the hedges that the summary removed.
The summary is a pointer to information, not the information itself.
Diversify your information sources deliberately. If your team all uses the same tools and reads the same feeds, you have an information monoculture. Introduce diversity by assigning different team members to different sources, rotating your own source list periodically, and explicitly seeking out perspectives from outside your usual ecosystem.
Watch for recommendation drift. When you notice yourself going deeper into a topic through algorithm-recommended content, pause and ask whether the trajectory is serving your needs or the algorithm’s engagement metrics.
If you started looking for home repair advice and are now watching videos about government conspiracies, the algorithm has drifted you. Back up and start a fresh search.
Distinguish between AI confidence and AI accuracy. An LLM’s confident tone tells you nothing about the accuracy of its content. Develop the habit of treating LLM outputs with the same skepticism you would apply to a confident stranger at a cocktail party — they might be right, they might be wrong, and you cannot tell which from their tone of voice.
Remember that the most dangerous distortions are the ones you cannot see. Adversarial misinformation is at least visible once identified — you can fact-check it, debunk it, flag it. Non-adversarial harm is woven into the fabric of how information reaches you.
You cannot fact-check an omission. You cannot debunk a distortion you never noticed. The best you can do is understand the mechanisms and actively compensate for them.
The information systems we use every day are not neutral channels through which truth flows to us unimpeded. They are active participants in shaping what we know, what we believe, and what we consider important.
They do this without malice, without agenda, and without accountability. The harm they cause is real, systemic, and — now that you understand the mechanisms — at least partially addressable.
The first step is accepting that the system is not your ally. It is not your enemy either. It is a machine, doing what machines do: optimizing for its objective function. Your job is to make sure its objective function is not the only one being served.