Bibliography and Sources

A consolidated list of references cited throughout the book, organized by topic. Where a paper or post is freely available, the link is included. The list is current as of May 2026; some URLs may move.

The Mythos disclosure and Project Glasswing

  • Anthropic, “Introducing Claude Mythos Preview,” red.anthropic.com, April 2026.
  • Anthropic, “Project Glasswing: Coordinated AI-Assisted Vulnerability Discovery,” April 2026.
  • Anthropic and Mozilla, joint Firefox vulnerability disclosure, May 2026.
  • FreeBSD Project, security advisory FreeBSD-SA-26:07.nfs (CVE-2026-4747), May 2026.
  • CETaS, “Mythos and the Capability Frontier: An Analysis of the Anthropic Disclosure,” April 2026.
  • IEEE Spectrum, “The Vulnerability-Finding Model Anthropic Won’t Release,” May 2026.

Foundational AI security literature

  • OWASP Foundation, “OWASP Top 10 for Large Language Model Applications,” 2025 edition. https://genai.owasp.org/llm-top-10/
  • Greshake et al., “Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” AISec ’23.
  • Perez et al., “Ignore Previous Prompt: Attack Techniques For Language Models,” 2022.
  • Liu et al., “Prompt Injection Attack Against LLM-Integrated Applications,” 2024.
  • Bai et al., “Constitutional AI: Harmlessness from AI Feedback,” Anthropic, 2022.
  • Anil et al., “Many-shot Jailbreaking,” Anthropic, April 2024.
  • Russinovich et al., “Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack,” Microsoft, 2024.

Corpora and benchmarks

  • Schulhoff et al., “Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition,” EMNLP 2023.
  • Toyer et al., “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game,” ICLR 2024.
  • Chao et al., “JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models,” 2024, with periodic updates.
  • Zou et al., “Universal and Transferable Adversarial Attacks on Aligned Language Models,” 2023 (the GCG / AdvBench paper).

AI red-team tooling

AI-augmented code audit

Output handling and exfiltration

Confused-deputy framing

  • Norm Hardy, “The Confused Deputy: (or why capabilities might have been invented),” ACM SIGOPS Operating Systems Review, October 1988. Nearly forty years old and still load-bearing.

Threat modeling and disclosure

  • Adam Shostack, Threat Modeling: Designing for Security, Wiley, 2014.
  • Google Project Zero disclosure policy.
  • CERT/CC Coordinated Vulnerability Disclosure Guide.
  • Project Glasswing, “Disclosure norms for AI-discovered vulnerabilities,” April 2026.
  • CISA, “Coordinated Vulnerability Disclosure: A Guide for Industry,” 2025 update.
  • NIST, “AI Risk Management Framework,” AI 100-1, 2023, with the 2024 generative-AI profile addendum.

Ongoing commentary

  • Simon Willison, simonwillison.net/series/prompt-injection/, a running series since 2022.
  • Bruce Schneier, schneier.com, AI-and-security posts, 2025–2026.
  • Anthropic model cards and safety reports for the Claude 4.x family, 2025–2026.

Regulatory background

  • European Union AI Act, especially Article 52 and the GPAI provisions, with the 2025–2026 implementation timelines.
  • California SB 1047 successor legislation, in progress as of May 2026.
  • U.S. AI Safety Institute publications on coordinated disclosure.