Bibliography and Sources

A consolidated list of references cited throughout the book, organized by topic. Where a paper or post is freely available, the link is included. The list is current as of May 2026; some URLs may move.

The Mythos disclosure and Project Glasswing

  • Anthropic, “Introducing Claude Mythos Preview,” red.anthropic.com, April 2026.
  • Anthropic, “Project Glasswing: Coordinated AI-Assisted Vulnerability Discovery,” April 2026.
  • Anthropic and Mozilla, joint Firefox vulnerability disclosure, May 2026.
  • FreeBSD Project, security advisory FreeBSD-SA-26:07.nfs (CVE-2026-4747), May 2026.
  • CETaS, “Mythos and the Capability Frontier: An Analysis of the Anthropic Disclosure,” April 2026.
  • IEEE Spectrum, “The Vulnerability-Finding Model Anthropic Won’t Release,” May 2026.

Foundational AI security literature

  • OWASP Foundation, “OWASP Top 10 for Large Language Model Applications,” 2025 edition. https://genai.owasp.org/llm-top-10/
  • Greshake et al., “Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” AISec ’23.
  • Perez et al., “Ignore Previous Prompt: Attack Techniques For Language Models,” 2022.
  • Liu et al., “Prompt Injection Attack Against LLM-Integrated Applications,” 2024.
  • Bai et al., “Constitutional AI: Harmlessness from AI Feedback,” Anthropic, 2022.
  • Anil et al., “Many-shot Jailbreaking,” Anthropic, April 2024.
  • Russinovich et al., “Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack,” Microsoft, 2024.

Corpora and benchmarks

  • Schulhoff et al., “Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition,” EMNLP 2023.
  • Toyer et al., “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game,” ICLR 2024.
  • Chao et al., “JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models,” 2024, with periodic updates.
  • Zou et al., “Universal and Transferable Adversarial Attacks on Aligned Language Models,” 2023 (the GCG / AdvBench paper).

AI red-team tooling

AI-augmented code audit

Output handling and exfiltration

Confused-deputy framing

  • Norm Hardy, “The Confused Deputy: (or why capabilities might have been invented),” ACM SIGOPS Operating Systems Review, October 1988. Nearly forty years old and still load-bearing.

Threat modeling and disclosure

  • Adam Shostack, Threat Modeling: Designing for Security, Wiley, 2014.
  • Google Project Zero disclosure policy.
  • CERT/CC Coordinated Vulnerability Disclosure Guide.
  • Project Glasswing, “Disclosure norms for AI-discovered vulnerabilities,” April 2026.
  • CISA, “Coordinated Vulnerability Disclosure: A Guide for Industry,” 2025 update.
  • NIST, “AI Risk Management Framework,” AI 100-1, 2023, with the 2024 generative-AI profile addendum.

Ongoing commentary

  • Simon Willison, simonwillison.net/series/prompt-injection/, a running series since 2022.
  • Bruce Schneier, schneier.com, AI-and-security posts, 2025–2026.
  • Anthropic model cards and safety reports for the Claude 4.x family, 2025–2026.

Regulatory background

  • European Union AI Act, especially Article 52 and the GPAI provisions, with the 2025–2026 implementation timelines.
  • California SB 1047 successor legislation, in progress as of May 2026.
  • U.S. AI Safety Institute publications on coordinated disclosure.