GPT-5 (high) on Terminal-Bench (Terminus-1): 46.2% accuracy

Author:

The LayerLens Team

The LayerLens Team covers AI model evaluations, benchmark analysis, and the evolving landscape of AI performance. For the latest independent evaluation data, explore Stratix.

Summary

GPT-5 (high) from OpenAI scored 46.2% accuracy on Terminal-Bench (Terminus-1), placing it in the top 10 (rank 10 of 77) on this benchmark. Despite that rank, the score falls in the below-frontier band for Terminal-Bench (Terminus-1). The model is acceptable for cost-sensitive workloads or as part of a multi-model ensemble, but it is not a default choice for high-stakes routing.
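To put "rank 10 of 77" in relative terms, the short sketch below converts a leaderboard rank into the share of entries it outranks. The helper name `rank_to_percentile` is hypothetical and not part of Stratix, and the calculation assumes that 77 is the total number of ranked entries and that ranks are not tied.

```python
def rank_to_percentile(rank: int, total: int) -> float:
    """Return the fraction of ranked entries that sit strictly below `rank`.

    Assumes rank 1 is best and that there are no ties (an assumption,
    not something the leaderboard data confirms).
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# Values taken from the summary above: rank 10 of 77.
rank, total = 10, 77
outranked = rank_to_percentile(rank, total)
top_share = rank / total

print(f"Outranks {outranked:.1%} of entries; sits in the top {top_share:.1%}")
```

Under those assumptions, rank 10 of 77 means the model outranks about 87% of entries, which is a relative standing and says nothing about the absolute gap to the leading scores.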

Model details

  • Provider: OpenAI

  • Model key: openai/gpt-5-high

  • Context length: 400,000 tokens

  • License: Proprietary

  • Open weights: no

Benchmark methodology

Secondary metrics

  • Readability score: 0.0

  • Toxicity score: 0.000

  • Ethics score: 0.000

Run this evaluation yourself

Stratix evaluates GPT-5 (high) continuously across 11+ benchmarks. To replicate this Terminal-Bench (Terminus-1) evaluation on your own model, traces, or a different benchmark configuration, open the model in Stratix.

_Source: Stratix evaluation 6900b74df8f82a4e876fe7a6. Updated 2025-10-28._