Modelling Bench - Search News

Google's Android coding tests reveal an unexpected Gemini 3.5 Flash weakness

Google's Gemini 3.5 Flash flunks the Android coding test by being slower, dumber, and three times more expensive than older ...

MSN on MSN

Microsoft unveiled MAI-Code-1-Flash, its first model that turns descriptions into working code

Software developers working with command-line tools and large codebases now have a new option from Microsoft: ...

Morning Overview on MSN

Google is teasing Gemini 3.5 Pro for this month after rushing its faster Flash model out first

Developers building AI-powered terminal tools and coding agents now face a split timeline from Google. The company released ...

Tech Times

MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks

MiniMax M3 launched June 1, 2026, with a 1-million-token context window and company-reported SWE-Bench Pro scores that edge ...

Live Science

Scientists design new 'AGI benchmark' that indicates whether any future AI model could cause 'catastrophic harm'

OpenAI scientists have designed MLE-bench, a compilation of 75 extremely difficult tests that can assess whether a future advanced AI agent is capable of modifying its own code and improving itself.

VentureBeat

Arthur unveils Bench, an open-source AI model evaluator

New York City-based artificial intelligence (AI) startup Arthur has ...