Llama 4 Maverick on SWE-bench Lite (SWE-agent): 8.0% accuracy

Author: The LayerLens Team


The LayerLens Team covers AI model evaluations, benchmark analysis, and the evolving landscape of AI performance. For the latest independent evaluation data, explore Stratix.

Summary

Llama 4 Maverick from Meta scored 8.0% on SWE-bench Lite (SWE-agent), ranking 32nd of 45 models on this benchmark. This places the model in the weak band for SWE-bench Lite (SWE-agent), below the threshold for production reliance on this benchmark family; consider it only for narrow, fully tested tasks.
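For context, the SWE-bench Lite test split contains 300 task instances, so the headline percentage can be mapped back to a resolved-issue count. A minimal sketch (the 24-instance figure below is derived arithmetic, not a number published in the Stratix run):

```python
# Map the reported SWE-bench Lite accuracy to a resolved-instance count.
# Assumes the standard 300-instance test split; the resolved count is
# inferred from the reported 8.0%, not taken from the evaluation trace.
TOTAL_INSTANCES = 300     # SWE-bench Lite test split size
REPORTED_ACCURACY = 8.0   # percent, from the headline result

resolved = round(TOTAL_INSTANCES * REPORTED_ACCURACY / 100)
accuracy = 100 * resolved / TOTAL_INSTANCES

print(resolved)            # issues the agent fully resolved -> 24
print(f"{accuracy:.1f}%")  # -> 8.0%
```

Note that with only 300 instances, a single additional resolved issue moves the score by about 0.33 points, so small rank differences on this leaderboard are within noise.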

Model details

  • Provider: Meta

  • Model key: meta-llama/llama-4-maverick

  • Context length: 131,072 tokens

  • License: Llama 4

  • Open weights: yes

Secondary metrics

  • Readability score: 14.0

  • Toxicity score: 0.002

  • Ethics score: 0.000

Run this evaluation yourself

Stratix evaluates Llama 4 Maverick continuously across 11+ benchmarks. To replicate this SWE-bench Lite (SWE-agent) evaluation on your own model, traces, or a different benchmark configuration, open the model in Stratix.

_Source: Stratix evaluation 68f7c86fc3a4f102252df0e6. Updated 2025-10-22._