
Reliability issues in large language models explored

Researchers Examine Accuracy and Transparency of Leading AI Chatbots: A Closer Look

Introduction to the Study

Performance of a selection of GPT and LLaMA models

Researchers from the Universitat Politecnica de Valencia in Spain have found that as large language models (LLMs) grow in size and complexity, they become less inclined to admit to users when they do not know an answer.

Study: Examining AI Chatbots

In their study, published in Nature, the researchers assessed the newest versions of three popular AI chatbot families, examining both the accuracy of their responses and how effectively users could recognize incorrect answers.

Increased Reliance on LLMs

As LLMs gain widespread adoption, users have increasingly relied on them for tasks such as writing essays, composing poems or songs, and solving mathematical problems. Consequently, accuracy has become a growing concern.

Study Objective: Evaluating AI Accuracy

In this new study, the researchers sought to determine whether popular LLMs improve in accuracy with each update and how they respond when they provide incorrect answers.

AI Chatbots Assessed: BLOOM, LLaMA and GPT

To assess the accuracy of three leading LLM families (BLOOM, LLaMA, and GPT), the researchers presented them with thousands of questions and compared the answers with those generated by earlier versions of the same models in response to the same prompts.

Diverse Themes Tested

The researchers also diversified the themes, covering math, science, anagrams, and geography, while evaluating the LLMs' ability to generate text and perform tasks such as list ordering. Each question was initially assigned a level of difficulty.
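The evaluation design described above, in which every prompt is tagged with a topic and a human-assigned difficulty level and the model's graded answers are then aggregated, can be sketched as a simple accuracy-by-difficulty tabulation. The records below are invented for illustration; they are not the study's actual data.

```python
from collections import defaultdict

# Toy records standing in for graded model responses: each has a topic,
# a difficulty bin (1 = easiest), and whether the answer was correct.
responses = [
    {"topic": "math", "difficulty": 1, "correct": True},
    {"topic": "math", "difficulty": 3, "correct": False},
    {"topic": "geography", "difficulty": 1, "correct": True},
    {"topic": "geography", "difficulty": 2, "correct": True},
    {"topic": "anagrams", "difficulty": 3, "correct": False},
    {"topic": "science", "difficulty": 2, "correct": False},
]

def accuracy_by_difficulty(records):
    """Return {difficulty: fraction correct} over the graded records."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["difficulty"]] += 1
        hits[r["difficulty"]] += r["correct"]
    return {d: hits[d] / totals[d] for d in sorted(totals)}

print(accuracy_by_difficulty(responses))
# For the toy data above: {1: 1.0, 2: 0.5, 3: 0.0}
```

A table like this, computed per model version, is what lets one say that accuracy declines as difficulty increases.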

Key Findings: Accuracy and Transparency

The researchers discovered that accuracy generally improved with each new iteration of the chatbots. However, they observed that as question difficulty increased, accuracy declined, as anticipated.

Transparency Decreases with Size

Interestingly, they noted that as LLMs became larger and more advanced, they tended to be less transparent about their ability to provide correct answers.

Behavioral Shift in AI Chatbots

In previous iterations, most LLMs would inform users that they were unable to find answers or required additional information. However, in the latest versions, these models are more inclined to make guesses, resulting in a greater number of responses, both accurate and inaccurate.
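This behavioral shift, from declining to answer toward guessing, can be made concrete by tallying graded responses into three categories per model version. The category labels and counts below are hypothetical, chosen only to illustrate the pattern the study describes: accuracy and incorrect answers both rise while avoidant answers nearly disappear.

```python
# Each graded response is labelled "correct", "incorrect", or "avoidant"
# (the model declined to answer or asked for more information).
# All counts here are invented for illustration.
graded = {
    "older_version": {"correct": 50, "incorrect": 20, "avoidant": 30},
    "newer_version": {"correct": 65, "incorrect": 30, "avoidant": 5},
}

def rates(counts):
    """Convert raw category counts to fractions of all responses."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

for version, counts in graded.items():
    r = rates(counts)
    print(f"{version}: correct={r['correct']:.0%}, "
          f"incorrect={r['incorrect']:.0%}, avoidant={r['avoidant']:.0%}")
```

Note that in this toy example the newer version is both more accurate and more often wrong, because answers that were previously avoidant have become guesses.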

Reliability Concerns

The researchers also found that all LLMs occasionally generated incorrect answers, even to straightforward questions, indicating their continued lack of reliability.

User Study: Evaluating Incorrect Answers

The research team then asked volunteers to judge the correctness of answers from the initial phase of the study. They found that most participants struggled to identify the incorrect responses.
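The user-study finding amounts to measuring how often genuinely incorrect answers were accepted as correct by the volunteers. A minimal sketch of that metric, using invented judgement pairs rather than the study's data:

```python
# Toy pairs of (answer_actually_correct, volunteer_judged_correct);
# the values are invented for illustration only.
judgements = [
    (False, True), (False, True), (False, False),
    (True, True), (True, True), (False, True),
]

def overlooked_error_rate(pairs):
    """Fraction of truly incorrect answers that volunteers accepted as correct."""
    accepted_flags = [judged for truth, judged in pairs if not truth]
    return sum(accepted_flags) / len(accepted_flags)

print(overlooked_error_rate(judgements))
# 3 of the 4 incorrect answers were accepted -> 0.75
```

A high value of this rate is what makes confident guessing risky: users cannot reliably filter the wrong answers themselves.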

