Mar 18, 2025

Benchmark Spotlight: How Well Do AI Models Guard Against Dangerous Knowledge?


AI safety is an ongoing challenge—one that becomes more urgent as models grow more powerful. This week, we are highlighting a critical yet often overlooked benchmark: the Weapons of Mass Destruction Proxy (WMDP).

What Is the WMD Proxy Benchmark?

Developed by the Center for AI Safety (founded by the creator of MMLU) and Scale AI, the Weapons of Mass Destruction Proxy (WMDP) benchmark measures an AI model’s knowledge of potentially harmful topics, including:

  • Nuclear weapons

  • Bioweapons

  • Cyber warfare

The benchmark consists of multiple-choice questions designed to test whether a model understands these subjects—knowledge that could be misused in the wrong hands. Although it was released last year, WMDP has received relatively little attention in AI safety discussions.

At LayerLens, we recently onboarded this benchmark and tested some of today’s most advanced AI models.

Early Results: How Do Leading AI Models Perform?

We ran the WMDP benchmark on Claude 3.7 Sonnet and Grok 2, two state-of-the-art models. The results were unexpected.

Accuracy Scores

  • Claude 3.7 Sonnet: Over 70%

  • Grok 2: Over 70%


[Figure: Results of Claude 3.7 Sonnet and Grok 2]

[Figure: Example of a question and answer from Claude 3.7 Sonnet]


Key Takeaways:

  • Higher Accuracy Is Not Necessarily Better – Unlike most AI benchmarks, where higher accuracy signals better performance, WMDP highlights models that lack proper guardrails. A high score suggests that a model freely answers sensitive questions instead of refusing to engage.

  • Unlearning Is a Proposed Solution – The authors of WMDP suggest that AI models should undergo unlearning, a technique that removes knowledge of harmful content while maintaining general capabilities. A model that has undergone unlearning should score below 40% on this benchmark.

  • No Guardrails in Place? – Both Claude 3.7 Sonnet and Grok 2 attempted every question in the dataset rather than refusing, and both scored far above the 40% threshold. This suggests that, despite their safety measures, these models have not undergone substantial unlearning and still possess significant knowledge of hazardous topics.
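To make the scoring concrete, here is a minimal sketch of how accuracy on a WMDP-style multiple-choice run is computed and compared against the ~40% unlearning target discussed above. The answer data below is a hypothetical placeholder, not actual WMDP content or real model output:

```python
def wmdp_accuracy(answers, gold):
    """Fraction of items where the model's chosen option matches the answer key."""
    assert len(answers) == len(gold)
    correct = sum(a == g for a, g in zip(answers, gold))
    return correct / len(gold)

# Hypothetical run: the model's picked option index per question vs. the key.
model_choices = [0, 2, 1, 3, 0, 1, 2, 2, 3, 1]
answer_key    = [0, 2, 1, 1, 0, 1, 2, 0, 3, 1]

acc = wmdp_accuracy(model_choices, answer_key)
print(f"accuracy = {acc:.0%}")

# Per the benchmark authors' framing, a score well above ~40% indicates the
# model still retains (and reveals) the probed hazardous knowledge.
print("above unlearning target" if acc > 0.40 else "at or below target")
```

Note that on WMDP, unlike most benchmarks, a *lower* accuracy is the desired outcome for a safety-aligned model, which is why the comparison runs against a ceiling rather than a floor.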

Why This Matters

As AI models become more sophisticated, their knowledge boundaries must be carefully managed. If a model can explain how to synthesize a bioweapon or bypass cybersecurity protections, does that represent a technical achievement or a security risk?

At LayerLens, we believe that AI safety is just as critical as performance. That is why we are expanding our suite of safety benchmarks and providing deeper analysis into how well models enforce ethical and security guardrails.

What’s Next?

  • Expanding safety-focused evaluations

  • Increasing transparency on AI risk exposure

  • Providing insights for organizations deploying AI models

LayerLens is committed to ensuring that the AI community builds models that are not only powerful but also responsible. As safety concerns become more pressing, benchmarks like WMDP will be essential in shaping AI policy, research, and deployment strategies.

If you are working on AI model evaluation and safety, reach out—we would love to collaborate.

Let’s Redefine AI Benchmarking Together

AI performance measurement needs precision, transparency, and reliability—that’s what we deliver. Whether you’re a researcher, developer, enterprise leader, or journalist, we’d love to connect.


Stay Ahead — Subscribe to Our Newsletter

By clicking the button you consent to processing of your personal data

© Copyright 2025, All Rights Reserved by LayerLens
