GPT-5 on LiveCodeBench: 81.7% accuracy

Author: The LayerLens Team


The LayerLens Team covers AI model evaluations, benchmark analysis, and the evolving landscape of AI performance. For the latest independent evaluation data, explore Stratix.

Summary

GPT-5 from OpenAI scored 81.7% on LiveCodeBench, placing it second of 43 models on this benchmark. This puts the model in the high-tier band for LiveCodeBench: the score suggests it is production-deployable on this benchmark family, with margin for prompt and judge variance.

Model details

  • Provider: OpenAI

  • Model key: openai/gpt-5

  • Context length: 400,000 tokens

  • License: Proprietary

  • Open weights: no

Benchmark methodology

Benchmark goal: Evaluate the LLM's ability to solve competitive programming problems, requiring logical reasoning, algorithm design, and coding skills. It assesses code generation correctness on a variety of algorithmic tasks.

Scoring metrics:

  • Pass Rate: Percentage of problems for which the LLM generated code that passed all test cases.

  • Execution time: Time taken to execute generated code, averaged over all passed problems.
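The two metrics above can be sketched as a minimal scoring routine. This is an illustrative reconstruction, not the Stratix implementation; the `ProblemResult` structure and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ProblemResult:
    passed: bool          # True if generated code passed all test cases
    exec_seconds: float   # execution time of the generated code

def pass_rate(results: list[ProblemResult]) -> float:
    """Percentage of problems whose generated code passed every test case."""
    return 100.0 * sum(r.passed for r in results) / len(results)

def mean_exec_time(results: list[ProblemResult]) -> float:
    """Average execution time, taken over passed problems only."""
    passed = [r.exec_seconds for r in results if r.passed]
    return sum(passed) / len(passed) if passed else 0.0

# Toy run: two of three problems pass
results = [
    ProblemResult(True, 0.12),
    ProblemResult(False, 0.0),
    ProblemResult(True, 0.30),
]
```

Note that execution time is averaged only over passed problems, so a model that solves harder problems slowly is not penalized for problems it fails outright.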

Analysis

Key takeaways:

  • Demonstrates robust coding capabilities across different problem types.

  • Strong at code generation and problem solving, with limitations in efficiency and handling implicit constraints.

  • Can benefit from the addition of input validation, error handling, and optimization considerations.
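The last takeaway, adding input validation and error handling, can be illustrated with a hypothetical hardened helper. The function and its contract are invented for illustration; they do not come from any evaluated transcript.

```python
def safe_mean(values) -> float:
    """Hardened version of a generated helper: validates its input and
    handles the empty case instead of raising ZeroDivisionError."""
    if not isinstance(values, (list, tuple)):
        raise TypeError("values must be a list or tuple of numbers")
    if not values:
        return 0.0  # explicit policy for the edge case, rather than a crash
    return sum(values) / len(values)
```

A model that emits only the final `return` line is correct on the happy path but fails exactly the edge cases the takeaways call out.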

Failure modes observed

Common failure modes:

  • Absence of error and edge case handling

  • Incomplete code generation logic

  • Inefficient or suboptimal solutions: generated code is occasionally correct but not optimal in time or space complexity

  • Problems with mathematical reasoning when combined with code generation

Example: on the 'count Choco' problem, the model implemented an incorrect algorithm and failed its test cases, and it generated no code at all for challenges with implicit constraints.
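The "correct but suboptimal" failure mode is easiest to see on a classic task such as finding a pair summing to a target. Both functions below are hypothetical stand-ins, not model transcripts: the first passes correctness tests but can time out on large inputs, while the second is the asymptotically better rewrite.

```python
def has_pair_quadratic(nums: list[int], target: int) -> bool:
    # Correct but O(n^2): the shape of solution that passes small test
    # cases yet exceeds time limits on competitive-programming inputs
    return any(nums[i] + nums[j] == target
               for i in range(len(nums))
               for j in range(i + 1, len(nums)))

def has_pair_linear(nums: list[int], target: int) -> bool:
    # O(n) rewrite: track previously seen values in a set
    seen: set[int] = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False
```

Benchmarks that score only pass/fail cannot distinguish the two; LiveCodeBench's per-problem time limits are what surface this failure mode.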

Secondary metrics

  • Readability score: 0.0

  • Toxicity score: 0.000

  • Ethics score: 0.000

Run this evaluation yourself

Stratix evaluates GPT-5 continuously across 11+ benchmarks. To replicate this LiveCodeBench evaluation on your own model, traces, or a different benchmark configuration, open the model in Stratix.

Source: Stratix evaluation 690352245d330c9a0a0eacbf. Updated 2025-10-30.