GPT-5 on LiveCodeBench: 81.7% accuracy

Author: The LayerLens Team


The LayerLens Team covers AI model evaluations, benchmark analysis, and the evolving landscape of AI performance. For the latest independent evaluation data, explore Stratix.

Summary

GPT-5 from OpenAI scored 81.7% on LiveCodeBench, placing it second of 43 models on this benchmark. This puts the model in the high-tier band for LiveCodeBench: the score suggests it is production-deployable on this benchmark family, with margin for prompt and judge variance.

Model details

  • Provider: OpenAI

  • Model key: openai/gpt-5

  • Context length: 400,000 tokens

  • License: Proprietary

  • Open weights: no

Benchmark methodology

Benchmark goal: Evaluate the LLM's ability to solve competitive programming problems, requiring logical reasoning, algorithm design, and coding skills. It assesses code generation correctness on a variety of algorithmic tasks.

Scoring metrics:

  • Pass Rate: Percentage of problems for which the LLM generated code that passed all test cases.

  • Execution time: Time taken to execute generated code, averaged over all passed problems.
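The two metrics above can be sketched as a minimal scoring routine. This is an illustrative reconstruction, not the Stratix implementation; the `ProblemResult` structure and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ProblemResult:
    passed: bool          # True if generated code passed all test cases
    exec_seconds: float   # execution time of the generated code

def pass_rate(results: list[ProblemResult]) -> float:
    """Percentage of problems whose generated code passed every test case."""
    return 100.0 * sum(r.passed for r in results) / len(results)

def mean_exec_time(results: list[ProblemResult]) -> float:
    """Average execution time, taken over passed problems only."""
    passed = [r.exec_seconds for r in results if r.passed]
    return sum(passed) / len(passed) if passed else 0.0

# Toy run: two of three problems pass
results = [
    ProblemResult(True, 0.12),
    ProblemResult(False, 0.0),
    ProblemResult(True, 0.30),
]
```

Note that execution time is averaged only over passed problems, so a model that solves harder problems slowly is not penalized for problems it fails outright.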

Analysis

Key takeaways:

  • Demonstrates robust coding capabilities across different problem types.

  • Strong at code generation and problem solving, with limitations in efficiency and handling implicit constraints.

  • Can benefit from the addition of input validation, error handling, and optimization considerations.
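The last takeaway, adding input validation and error handling, can be illustrated with a hypothetical hardened helper. The function and its contract are invented for illustration; they do not come from any evaluated transcript.

```python
def safe_mean(values) -> float:
    """Hardened version of a generated helper: validates its input and
    handles the empty case instead of raising ZeroDivisionError."""
    if not isinstance(values, (list, tuple)):
        raise TypeError("values must be a list or tuple of numbers")
    if not values:
        return 0.0  # explicit policy for the edge case, rather than a crash
    return sum(values) / len(values)
```

A model that emits only the final `return` line is correct on the happy path but fails exactly the edge cases the takeaways call out.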

Failure modes observed

Common failure modes:

  • Absence of error and edge case handling

  • Incomplete code generation logic

  • Inefficient or suboptimal solutions: generated code is occasionally correct but not optimal in time or space complexity

  • Problems with mathematical reasoning when combined with code generation

Example: on the 'count Choco' problem, the model implemented an incorrect algorithm and failed its test cases, and it generated no code at all for challenges with implicit constraints.
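The "correct but suboptimal" failure mode is easiest to see on a classic task such as finding a pair summing to a target. Both functions below are hypothetical stand-ins, not model transcripts: the first passes correctness tests but can time out on large inputs, while the second is the asymptotically better rewrite.

```python
def has_pair_quadratic(nums: list[int], target: int) -> bool:
    # Correct but O(n^2): the shape of solution that passes small test
    # cases yet exceeds time limits on competitive-programming inputs
    return any(nums[i] + nums[j] == target
               for i in range(len(nums))
               for j in range(i + 1, len(nums)))

def has_pair_linear(nums: list[int], target: int) -> bool:
    # O(n) rewrite: track previously seen values in a set
    seen: set[int] = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False
```

Benchmarks that score only pass/fail cannot distinguish the two; LiveCodeBench's per-problem time limits are what surface this failure mode.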

Secondary metrics

  • Readability score: 0.0

  • Toxicity score: 0.000

  • Ethics score: 0.000

Run this evaluation yourself

Stratix evaluates GPT-5 continuously across 11+ benchmarks. To replicate this LiveCodeBench evaluation on your own model, traces, or a different benchmark configuration, open the model in Stratix.

Source: Stratix evaluation 690352245d330c9a0a0eacbf. Updated 2025-10-30.