Gemini 3.1 Flash Lite Preview on LiveCodeBench: 69.9% accuracy

Author: The LayerLens Team
Last updated: May 13, 2026
Published: Mar 5, 2026

The LayerLens Team covers AI model evaluations, benchmark analysis, and the evolving landscape of AI performance. For the latest independent evaluation data, explore Stratix.

Summary

Gemini 3.1 Flash Lite Preview from Google scored 69.9% on LiveCodeBench, ranking 4th of 43 models and placing it in the competitive band for this benchmark. That score is above the cost-effective threshold for most production workloads. For agent use cases, pair the model with a step-level evaluation harness.

Model details

Provider: Google
Model key: google/gemini-3.1-flash-lite-preview
Context length: 1,048,576 tokens
License: Proprietary
Open weights: no

Benchmark methodology

Benchmark goal: LiveCodeBench is designed to evaluate the code generation ability of large language models (LLMs) on competitive programming problems collected from LeetCode, AtCoder, and Codeforces. Problems are tagged by release date, so a model can be evaluated on problems published after its training cutoff, which limits contamination.

Scoring metrics:

Accuracy: The percentage of problems for which the model's generated solution passes all test cases.
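
The 69.9% headline figure is reported as an accuracy. As a rough sketch of how such a figure might be aggregated for a code benchmark, the snippet below assumes a problem counts as solved only when the generated program passes every one of its test cases. The function names and the sample data are hypothetical and are not taken from Stratix or LiveCodeBench.

```python
from typing import Dict, List


def problem_solved(test_results: List[bool]) -> bool:
    """A problem counts as solved only if it has tests and all of them passed."""
    return len(test_results) > 0 and all(test_results)


def accuracy(results: Dict[str, List[bool]]) -> float:
    """Percentage of problems whose generated solution passed every test."""
    if not results:
        raise ValueError("no problems to score")
    solved = sum(problem_solved(r) for r in results.values())
    return 100.0 * solved / len(results)


# Hypothetical per-problem test outcomes (illustration only).
sample = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],  # one failing test means the problem is unsolved
    "task_c": [True, True],
    "task_d": [False],
}

print(f"{accuracy(sample):.1f}%")  # 2 of 4 problems solved -> 50.0%
```

Under this all-or-nothing rule, a solution that passes most tests but fails one scores the same as a placeholder, which is why partially implemented answers weigh on the headline number.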

Analysis

Key takeaways:

The Gemini 3.1 Flash Lite Preview model appears highly proficient at generating correct Python code for a broad range of standard algorithmic problems with small to moderate constraints.
While capable of outlining advanced theoretical approaches to complex mathematical and combinatorial problems, the model often falls back on simplified implementations or placeholders when no ready-made library routine is available or the full implementation is too complex.
The detailed explanations preceding some code solutions indicate a strong understanding of underlying algorithms and problem properties, even if the implementation is sometimes rudimentary.

Failure modes observed

Common failure modes:

The model struggled when the task required identifying complex mathematical patterns or properties rather than writing a straightforward code solution.
For several complex mathematical problems, the provided code includes detailed comments on the theoretical approach but ends with a placeholder output or a simplification that may not fully address the problem's complexity.
In some cases, the model explains the logic but does not fully implement the optimized solution, or it implements a simple solution that would not pass under strict time constraints for large N.
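
The failure modes above map onto verdicts that a test harness can assign automatically. The following is a minimal sketch, not the harness used for this evaluation: the `judge` function, the verdict strings, and the three candidate programs are all hypothetical, and the toy task (sum the input integers) is an assumption chosen for illustration.

```python
import subprocess
import sys


def judge(source: str, stdin_text: str, expected: str, time_limit_s: float) -> str:
    """Run a candidate Python program once and return a verdict string."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=time_limit_s,
        )
    except subprocess.TimeoutExpired:
        return "time_limit_exceeded"
    if proc.returncode != 0:
        return "runtime_error"
    # Compare whitespace-separated tokens so trailing newlines do not matter.
    if proc.stdout.split() != expected.split():
        return "wrong_answer"
    return "accepted"


# Hypothetical candidate: correct for the toy task.
CORRECT = "import sys; print(sum(map(int, sys.stdin.read().split())))"
# Hypothetical candidate: prints a hardcoded value instead of computing it.
PLACEHOLDER = "print(42)"
# Hypothetical candidate: never finishes, standing in for a too-slow solution.
TOO_SLOW = "while True: pass"

if __name__ == "__main__":
    print(judge(CORRECT, "1 2 3\n", "6", 10.0))
    print(judge(PLACEHOLDER, "1 2 3\n", "6", 10.0))
    print(judge(TOO_SLOW, "", "0", 0.5))
```

A hardcoded answer can still be accepted on a single test whose expected output happens to match, so a harness of this kind needs several distinct inputs per problem to expose placeholders.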

Example: For task 52, comments in the solve() function state that a number-theoretic transform (NTT) is needed for a full solution, but the function ultimately prints a hardcoded number. This suggests the model correctly identified the required mathematical tool but did not implement it, providing a placeholder instead.
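
For context on what was left unimplemented, here is a generic sketch of an NTT used for polynomial multiplication. It is not a solution to task 52, whose statement is not reproduced here. The modulus 998244353 with primitive root 3 is an assumption (a common choice in competitive programming), and the function names are illustrative.

```python
MOD = 998244353  # assumed modulus: prime equal to 119 * 2**23 + 1
ROOT = 3         # a primitive root modulo MOD


def ntt(a, invert=False):
    """In-place iterative NTT; len(a) must be a power of two."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes over blocks of doubling length.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse of the root
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def multiply(p, q):
    """Multiply two polynomials (coefficient lists, lowest degree first) modulo MOD."""
    if not p or not q:
        return []
    out_len = len(p) + len(q) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fa = [x % MOD for x in p] + [0] * (size - len(p))
    fb = [x % MOD for x in q] + [0] * (size - len(q))
    ntt(fa)
    ntt(fb)
    fc = [x * y % MOD for x, y in zip(fa, fb)]
    ntt(fc, invert=True)
    return fc[:out_len]


# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(multiply([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```

The transform reduces polynomial multiplication from quadratic time to O(n log n), which is typically what separates a passing solution from a timeout on large inputs.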

Secondary metrics

Readability score: 0.0
Toxicity score: 0.000
Ethics score: 0.000

Run this evaluation yourself

Stratix evaluates Gemini 3.1 Flash Lite Preview continuously across 11+ benchmarks. To replicate this LiveCodeBench evaluation on your own model, traces, or a different benchmark configuration, open the model in Stratix.

Source: Stratix evaluation 69a743435a24fc3a34525bab. Updated 2026-03-03.
