GPT-5 on Terminal-Bench (Terminus-1): 33.8% accuracy

Author:

The LayerLens Team

Last updated:

May 13, 2026

Published:

Apr 1, 2026

The LayerLens Team covers AI model evaluations, benchmark analysis, and the evolving landscape of AI performance. For the latest independent evaluation data, explore Stratix.

Summary

GPT-5 from OpenAI scored 33.8% on Terminal-Bench (Terminus-1), ranking 25th of 77 on this benchmark. The score falls in the weak band for Terminal-Bench (Terminus-1), below the threshold for production reliance on this benchmark family. Consider the model only for narrow, fully tested tasks.
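To put the rank in perspective, the short Python sketch below converts a leaderboard position into a share of the leaderboard. It uses only the two figures reported above (rank 25, 77 entries). The function names are illustrative and are not part of Stratix or any LayerLens tooling, and the sketch assumes rank 1 is the best position.

```python
def rank_fraction(rank: int, total: int) -> float:
    """Share of the leaderboard at or above this rank (assumes rank 1 is best)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return rank / total


def entries_outranked(rank: int, total: int) -> int:
    """Number of leaderboard entries placed below this rank."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return total - rank


# Figures taken from the summary above: rank 25 of 77.
fraction = rank_fraction(25, 77)
print(f"Top {fraction:.1%} of the leaderboard")         # Top 32.5% of the leaderboard
print(f"Entries outranked: {entries_outranked(25, 77)}")  # Entries outranked: 52
```

Relative rank and absolute score answer different questions: a model can sit in the top third of a leaderboard while its score is still too low for production use on that task family.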

Model details

Provider: OpenAI
Model key: openai/gpt-5
Context length: 400,000 tokens
License: Proprietary
Open weights: no

Benchmark methodology

Secondary metrics

Readability score: 0.0
Toxicity score: 0.000
Ethics score: 0.000

Run this evaluation yourself

Stratix evaluates GPT-5 continuously across 11+ benchmarks. To replicate this Terminal-Bench (Terminus-1) evaluation on your own model or traces, or with a different benchmark configuration, open the model in Stratix.

Source: Stratix evaluation 690114d6c8c177224ec6f979. Updated 2025-10-28.
