Morocco World News

BridgeBench Shows Top AI Models at 10% Accuracy Despite Strong Reasoning

The results suggest that even when leading AI models are wrong, they can still produce highly convincing explanations, as reflected in consistently high evidence scores.

By Oumaima Moho Amer
Apr 17, 2026

Casablanca – BridgeBench, a new benchmarking project focused on AI reasoning, has released a ranking that exposes a gap between how confidently models explain answers and how often those answers are correct.

The benchmark tests models on reasoning-heavy tasks and scores them across three metrics. Accuracy measures whether the final answer is correct. Evidence evaluates how well the model supports its reasoning with verifiable steps or sources. The overall score combines both, aiming to reward systems that not only answer but also justify their answers.

In the latest results, xAI’s Grok 4.20 Reasoning model ranks first with a score of 41.8. It records 10.0% accuracy and 89.7% on evidence. OpenAI’s GPT-5.4 follows closely with a score of 40.6, matching that 10.0% accuracy and posting a slightly stronger evidence score of 90.6%.

Anthropic’s Claude Opus 4.7 comes third at 40.3, but with lower accuracy at 6.7%, offset by the highest evidence score among the top models at 91.3%.

In fourth place is Grok 4.20, the non-reasoning version, scoring 40.0 with 6.7% accuracy and 89.9% evidence. Claude Opus 4.6 rounds out the top five with a score of 39.6, posting 10.0% accuracy and 86.1% evidence.

Further down, Google’s Gemini 3.1 Pro ranks 15th with a score of 34.3. Its accuracy drops sharply to 3.3%, despite an evidence score of 89.1%.

What makes the ranking striking is not who leads, but how low accuracy remains across all models. Even the top systems answer correctly only about one time in ten.

At the same time, their evidence scores are consistently high, raising questions about what exactly is being measured. If models can produce convincing chains of reasoning while still being wrong most of the time, the benchmark may be capturing fluency more than reliability.
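To make the gap concrete, the short Python sketch below takes the figures reported above and converts each accuracy percentage into an approximate "one correct answer in N" rate, alongside the difference between the evidence and accuracy scores. The helper names `one_in_n` and `summarize` are hypothetical and used only for illustration. The sketch does not attempt to reproduce BridgeBench's overall score, since the combining formula is not given here.

```python
# Figures as reported in this article:
# (model, overall score, accuracy %, evidence %)
REPORTED = [
    ("Grok 4.20 Reasoning", 41.8, 10.0, 89.7),
    ("GPT-5.4", 40.6, 10.0, 90.6),
    ("Claude Opus 4.7", 40.3, 6.7, 91.3),
    ("Grok 4.20", 40.0, 6.7, 89.9),
    ("Claude Opus 4.6", 39.6, 10.0, 86.1),
    ("Gemini 3.1 Pro", 34.3, 3.3, 89.1),
]


def one_in_n(accuracy_pct: float) -> int:
    """Approximate 'one correct answer in N attempts' for a percentage."""
    return round(100 / accuracy_pct)


def summarize(rows):
    """Return, per model, the one-in-N rate and the evidence-minus-accuracy gap."""
    summary = []
    for model, _overall, accuracy, evidence in rows:
        summary.append(
            {
                "model": model,
                "one_in": one_in_n(accuracy),
                "gap": round(evidence - accuracy, 1),  # percentage points
            }
        )
    return summary


if __name__ == "__main__":
    for row in summarize(REPORTED):
        print(f"{row['model']}: about 1 in {row['one_in']} correct, "
              f"evidence exceeds accuracy by {row['gap']} points")
```

Read this way, 10.0% accuracy is about one correct answer in 10, 6.7% is about one in 15, and 3.3% is about one in 30, while every listed model's evidence score sits more than 75 points above its accuracy.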

Tags: AI, ChatGPT, Claude, Gemini, Grok

Copyright 2026 Morocco World News. All rights reserved. Morocco World News is not responsible for the content of external sites.