
GPT-5 on LiveCodeBench: 81.7% accuracy
Author:
The LayerLens Team
The LayerLens Team covers AI model evaluations, benchmark analysis, and the evolving landscape of AI performance. For the latest independent evaluation data, explore Stratix.
Summary
GPT-5 from OpenAI scored 81.7% on LiveCodeBench, placing it second of 43 models on this benchmark and in the high-tier band. The model is production-deployable on this benchmark family, with margin for prompt and judge variance.
Model details
Provider: OpenAI
Model key: openai/gpt-5
Context length: 400,000 tokens
License: Proprietary
Open weights: no
Benchmark methodology
Benchmark goal: Evaluate the LLM's ability to solve competitive programming problems, requiring logical reasoning, algorithm design, and coding skills. It assesses code generation correctness on a variety of algorithmic tasks.
Scoring metrics:
Pass Rate: Percentage of problems for which the LLM generated code that passed all test cases.
Execution time: Time taken to execute generated code, averaged over all passed problems.
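The two scoring metrics above can be sketched as follows. This is an illustrative computation only; the `Result` structure and function names are hypothetical stand-ins, not LiveCodeBench's actual harness.

```python
from dataclasses import dataclass

@dataclass
class Result:
    problem_id: str
    passed_all_tests: bool  # did the generated code pass every test case?
    exec_time_s: float      # wall-clock execution time for the run

def pass_rate(results: list[Result]) -> float:
    """Fraction of problems whose generated code passed all test cases."""
    if not results:
        return 0.0
    return sum(r.passed_all_tests for r in results) / len(results)

def mean_exec_time(results: list[Result]) -> float:
    """Average execution time, computed over passed problems only."""
    passed = [r.exec_time_s for r in results if r.passed_all_tests]
    return sum(passed) / len(passed) if passed else 0.0

# Illustrative run: 2 of 3 problems passed.
results = [
    Result("p1", True, 0.12),
    Result("p2", False, 0.0),
    Result("p3", True, 0.30),
]
print(round(pass_rate(results), 3))       # 0.667
print(round(mean_exec_time(results), 2))  # 0.21
```

Note that averaging execution time over passed problems only, as the metric definition states, avoids penalizing a model for fast-but-wrong submissions.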
Analysis
Key takeaways:
Demonstrates robust coding capabilities across different problem types.
Strong at code generation and problem solving, with limitations in efficiency and handling implicit constraints.
Would benefit from added input validation, error handling, and attention to optimization.
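The hardening suggested in the last takeaway can be illustrated with a hypothetical problem (the function and its checks are our own example, not model output):

```python
def max_pair_sum(nums: list[int]) -> int:
    """Largest sum of two distinct elements of nums.

    A typical model-generated version omits the checks below and fails
    silently or raises an opaque error on short or malformed input;
    explicit validation makes the edge cases fail loudly and predictably.
    """
    if not isinstance(nums, list):
        raise TypeError("nums must be a list")
    if len(nums) < 2:
        raise ValueError("need at least two elements")
    # Take the two largest values.
    a, b = sorted(nums, reverse=True)[:2]
    return a + b

print(max_pair_sum([3, 1, 7, 5]))  # 12
```

This is exactly the class of edge-case handling (empty input, too-few elements, wrong types) that the failure-mode analysis below flags as commonly missing.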
Failure modes observed
Common failure modes:
Absence of error and edge case handling
Incomplete code generation logic
Inefficient or suboptimal solutions: generated code occasionally passes all tests but is not the most efficient in time or space complexity
Problems with mathematical reasoning when combined with code generation
Example: On the 'count Choco' problem, the model implemented an incorrect algorithm and failed test cases; for challenges with implicit constraints, it generated no code at all.
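The inefficiency failure mode can be sketched with a hypothetical pair-sum task (our own example, not a LiveCodeBench problem): both functions below return the same answers, but the quadratic version is the kind of correct-yet-suboptimal solution that can exceed time limits on large competitive-programming inputs.

```python
def has_pair_with_sum_quadratic(nums: list[int], target: int) -> bool:
    # O(n^2): correct, but scans every pair of elements.
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return True
    return False

def has_pair_with_sum(nums: list[int], target: int) -> bool:
    # O(n): same answer via a hash set of previously seen values.
    seen: set[int] = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False

nums = [4, 9, 1, 14, 20]
print(has_pair_with_sum_quadratic(nums, 15))  # True (1 + 14)
print(has_pair_with_sum(nums, 15))            # True
```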
Secondary metrics
Readability score: 0.0
Toxicity score: 0.000
Ethics score: 0.000
Run this evaluation yourself
Stratix evaluates GPT-5 continuously across 11+ benchmarks. To replicate this LiveCodeBench evaluation on your own model, traces, or a different benchmark configuration, open the model in Stratix.
Source: Stratix evaluation 690352245d330c9a0a0eacbf. Updated 2025-10-30.