Artificial intelligence (AI) continues to make unprecedented advances in capability and reasoning, but this progress has been accompanied by a disturbing phenomenon: rising hallucination rates in some models.
In particular, OpenAI’s latest models, such as o3 and o4-mini, have recorded notably high hallucination rates, in sharp contrast with competitors like Google, whose Gemini models maintain rates below 1%.
A test shows that OpenAI’s o3 model has a hallucination rate of 6.8%
The recently updated ‘Hallucination Leaderboard’ benchmark shows that some of the most advanced models achieve low hallucination rates, which contradicts the idea that greater technological advancement necessarily brings more hallucinations.
For example, Google Gemini-2.0-Flash-001 and Vectara Mockingbird-2-Echo show hallucination rates of 0.7% and 0.9%, respectively, while OpenAI’s o3 model has an alarming rate of 6.8%, according to Professor Ethan Mollick from Wharton.
This situation presents a dilemma in AI innovation: some of the models with the strongest reasoning capabilities are, ironically, the ones that struggle most with reliability and accuracy.
Although OpenAI has been a pioneer in creating sophisticated AI technologies, it still faces the challenge of balancing performance with accuracy. As companies race toward artificial general intelligence (AGI), they are producing more powerful models, but reliability remains a critical unresolved problem.
OpenAI is aware of these issues and is working to reduce the hallucination rates of its systems. This acknowledgment could be key to ensuring that future gains in capability do not come at the expense of model quality, and it underscores the importance of reliability in the evolution of generative AI.