As AI models advance, we need rigorous cybersecurity evaluations to understand the risks they pose. Irregular partnered with Anthropic to evaluate its most advanced model, Claude Sonnet 4.5, as detailed in the system card.
Our Methodology
We tested the model on an extensive internal set of cybersecurity challenges in which the goal is to find a secret flag. The challenges fall into three types:
Vulnerability Discovery and Exploitation: Tests reverse engineering, code analysis, cryptography, and exploitation skills.
Network Attack Simulation: Assesses understanding of common attack flows, reconnaissance methods, network protocols, and components like firewalls and file servers.
Evasion: Tests the ability to avoid detection by security controls and monitoring systems.
Our challenges are significantly harder than publicly available ones and better reflect real cyberoffensive tasks. We measure difficulty using the SOLVE score, an internal scoring system.
We ran the model multiple times on each challenge to measure its success rate, using elicitation techniques designed to maximize its capabilities. We also performed a qualitative assessment, identifying where and how the model succeeds or fails in different scenarios.
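To make the aggregation concrete, here is a minimal sketch of how per-challenge success rates can be computed from repeated runs. The function name, data layout, and example challenge IDs are hypothetical illustrations, not Irregular's internal harness or actual results:

```python
from collections import defaultdict

def success_rates(run_results):
    """Aggregate per-challenge success rates from repeated runs.

    `run_results` is an iterable of (challenge_id, solved) pairs, where
    `solved` is True when the run captured the correct flag. Names and
    structure here are illustrative assumptions only.
    """
    attempts = defaultdict(int)
    solves = defaultdict(int)
    for challenge_id, solved in run_results:
        attempts[challenge_id] += 1
        if solved:
            solves[challenge_id] += 1
    return {cid: solves[cid] / attempts[cid] for cid in attempts}

# Hypothetical example: three runs of one challenge, two of another.
runs = [
    ("crypto-01", True), ("crypto-01", False), ("crypto-01", True),
    ("network-07", False), ("network-07", False),
]
print(success_rates(runs))  # {'crypto-01': 0.666..., 'network-07': 0.0}
```

Repeating each challenge many times and averaging in this way is what lets a single pass/fail signal become a stable success-rate estimate per challenge and per difficulty level.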
Our Assessment
As shown in the system card, Claude Sonnet 4.5 outperformed previous models, solving challenges the others could not and achieving higher success rates overall. The improvement holds across nearly all challenge categories and difficulty levels.
That said, the model struggles with challenges that require expert-level cybersecurity skills or multiple complex steps chained together. We saw cases where it identified several solution paths, some of them correct, but never attempted to implement them, either because it doubted they would work or because it was distracted by too many options.
Despite these real improvements, the model's inability to solve most hard challenges means Claude Sonnet 4.5 would give only limited help to a moderately skilled cyberoffensive operator. Still, as models continue to advance, we need to keep monitoring them closely and developing countermeasures.
Irregular specializes in adversarial AI testing and cybersecurity evaluation, helping leading AI companies understand and mitigate the security risks of their models. Our team combines deep expertise in offensive security with cutting-edge AI evaluation methodologies to stay ahead of emerging threats.