Securing AI Model Weights

May 30, 2024

As frontier AI models become more capable, protecting their model weights from theft and misuse becomes increasingly critical. The report, co-authored by Irregular CEO Dan Lahav with RAND, explores what it would take to protect model weights (the learnable parameters that encode the core intelligence of an AI) from a range of potential attackers.

The report identifies 38 meaningfully distinct attack vectors and defines a range of attacker operational capacity (OC) levels, from opportunistic, financially motivated criminals to highly resourced nation-state operations. For each attack vector, the authors estimate how feasible it would be for attackers in each category to execute.
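As a purely illustrative sketch (not taken from the report), this kind of feasibility mapping could be represented as a small lookup from attack vector to the lowest OC level at which the attack becomes plausible; the vector names, OC labels, and thresholds below are hypothetical:

```python
from enum import IntEnum

# Illustrative attacker operational capacity (OC) levels, ordered from
# opportunistic criminals up to top-priority nation-state operations.
# Labels are shorthand for this sketch, not the report's exact category names.
class OC(IntEnum):
    AMATEUR = 1
    PROFESSIONAL = 2
    CYBERCRIME_SYNDICATE = 3
    STATE_OPERATION = 4
    TOP_PRIORITY_STATE_OPERATION = 5

# Hypothetical feasibility estimates: the lowest OC level at which each
# example attack vector is judged plausibly executable. Entries are invented
# for illustration and do not reproduce the report's assessments.
MIN_FEASIBLE_OC = {
    "credential_phishing": OC.AMATEUR,
    "supply_chain_compromise": OC.CYBERCRIME_SYNDICATE,
    "zero_day_exploitation": OC.STATE_OPERATION,
    "insider_recruitment": OC.TOP_PRIORITY_STATE_OPERATION,
}

def feasible_vectors(attacker: OC) -> list[str]:
    """Return the example attack vectors plausibly executable at this OC level."""
    return [vector for vector, threshold in MIN_FEASIBLE_OC.items() if attacker >= threshold]

if __name__ == "__main__":
    # An attacker at the cybercrime-syndicate level clears the first two thresholds.
    print(feasible_vectors(OC.CYBERCRIME_SYNDICATE))
    # ['credential_phishing', 'supply_chain_compromise']
```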

The report defines five security levels (SL1-5) and recommends preliminary benchmark security systems that roughly achieve each level. This framework can help security teams at frontier AI organizations update their threat models and inform their security plans, and it can help policymakers better understand how to engage with AI organizations on security.