Anthropic is scaling Project Glasswing to 150 new organizations, providing access to Claude Mythos Preview for vulnerability detection. The expansion comes amid concerns about validation transparency and the race to secure critical infrastructure before offensive AI capabilities proliferate.
Single-turn safety benchmarks don't predict real-world vulnerability. Cisco's testing of 15 frontier models found that iterative attacks succeed up to 88% of the time, even against models that look secure in standard evaluations.
OpenAI's approach to running Codex safely isn't just about one code-generation model: it's a template for how AI labs must deploy increasingly capable systems. The three-layer framework OpenAI developed combines technical safeguards, operational controls, and external oversight in ways that scale beyond code generation.