Tow Center for Digital Journalism: New study finds AI search tools give incorrect answers about 60% of the time

It’s a fact that AI models can be inaccurate. Hallucinations and the repetition of false information have long been thorny issues for developers. Because use cases vary so widely, it’s difficult to put a number on AI accuracy. A team of researchers claims they now have the numbers.

The Tow Center for Digital Journalism recently studied eight AI search engines: ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. For each tool, they measured accuracy and recorded how often it refused to answer.

The researchers randomly selected 200 news articles from 20 news publishers (10 articles each). They made sure that each article appeared among the top three Google results when an excerpt from it was used as the search query. They then ran the same query in each AI search tool and rated the response on whether it correctly cited A) the article, B) the news organization, and C) the URL.

The researchers then labeled each response on a scale ranging from “completely correct” to “completely incorrect.” As the study’s accompanying chart shows, every tool except the two versions of Perplexity performed poorly. Overall, the AI search engines gave incorrect answers about 60% of the time. Furthermore, the tools’ apparent “confidence” lent these incorrect answers unwarranted authority.
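The rating scheme can be illustrated with a short sketch. The Python below is a hypothetical rubric, not the Tow Center’s published one: it assumes each response is checked on the three attributes described above (article, news organization, URL), labels it “completely correct” when all three match, “completely incorrect” when none do, and “partially correct” otherwise, and treats refusals as their own category. The function name `label_response` and the sample responses are invented for illustration and are not study data.

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    COMPLETELY_CORRECT = "completely correct"
    PARTIALLY_CORRECT = "partially correct"
    COMPLETELY_INCORRECT = "completely incorrect"
    NOT_ANSWERED = "not answered"


def label_response(answered: bool, article_ok: bool,
                   publisher_ok: bool, url_ok: bool) -> Label:
    """Hypothetical rubric: collapse the three citation checks into one label."""
    if not answered:
        # A refusal is tracked separately rather than counted as right or wrong.
        return Label.NOT_ANSWERED
    hits = sum([article_ok, publisher_ok, url_ok])  # number of attributes cited correctly
    if hits == 3:
        return Label.COMPLETELY_CORRECT
    if hits == 0:
        return Label.COMPLETELY_INCORRECT
    return Label.PARTIALLY_CORRECT


# Toy inputs (answered, article_ok, publisher_ok, url_ok); invented, not from the study.
responses = [
    (True, True, True, True),
    (True, True, False, False),
    (True, False, False, False),
    (False, False, False, False),
]
tally = Counter(label_response(*r) for r in responses)
for label, count in tally.items():
    print(f"{label.value}: {count}")
```

Tallying labels per tool in this way is what makes it possible to compare tools by refusal rate and by the share of answers that are fully or partly wrong.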

This study is fascinating because it quantifies what we have known for years: LLMs are “the most sophisticated liars ever.” They report with complete authority that what they say is true, even when it is not, and sometimes even argue or make up further false assertions when challenged.

In an anecdotal article from 2023, Ted Gioia (The Honest Broker) pointed to dozens of ChatGPT responses in which the bot confidently “lied.” While some of the prompts were adversarial, many were just general questions.

Even after admitting it was wrong, ChatGPT went on to provide more false information. The LLM seemed to be built to answer every user input at all costs. The researchers’ data support this impression: ChatGPT Search was the only AI tool that answered all 200 article queries. However, it was completely correct only 28% of the time and completely incorrect 57% of the time.

ChatGPT wasn’t the worst. Both versions of X’s Grok AI performed poorly, with Grok-3 Search answering 94% of queries incorrectly. Microsoft’s Copilot wasn’t much better: it refused to answer 104 of the 200 queries. Of the remaining 96, only 16 were “completely correct,” 14 were “partially correct,” and 66 were “completely wrong,” meaning roughly 70% of the answers it did give were completely wrong.
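The Copilot figures can be checked with a few lines of arithmetic. This minimal Python sketch (the function name `outcome_shares` is hypothetical) divides each outcome count by the 96 answered queries and the refusals by all 200 queries: 66 / 96 ≈ 0.69, which is where the “roughly 70%” figure comes from, and 104 / 200 = 0.52.

```python
def outcome_shares(correct: int, partial: int, wrong: int,
                   declined: int) -> dict[str, float]:
    """Share of each outcome among answered queries, plus the refusal rate overall."""
    answered = correct + partial + wrong  # queries the tool actually responded to
    total = answered + declined           # every query that was submitted
    return {
        "completely_correct": correct / answered,
        "partially_correct": partial / answered,
        "completely_wrong": wrong / answered,
        "declined": declined / total,
    }


# Copilot counts as reported in the study: 16 + 14 + 66 answered, 104 refused.
copilot = outcome_shares(correct=16, partial=14, wrong=66, declined=104)
for name, share in copilot.items():
    print(f"{name}: {share:.1%}")
```

Dividing by answered queries rather than all 200 is the choice that matters here: measured against all 200, the 66 wrong answers would be only 33%, which would hide how often Copilot was wrong when it did respond.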

Arguably the craziest part of all this is that the companies that make these tools are not transparent about this lack of accuracy while charging the public $20-200/month. Furthermore, Perplexity Pro ($20/month) and Grok-3 Search ($40/month) answered a slightly higher percentage of queries correctly than their free versions (Perplexity and Grok-2 Search), but also had significantly higher error rates, largely because they were more willing to give a definitive but wrong answer than to decline.

Not everyone agrees, though. Lance Ulanoff of TechRadar said he may never use Google again after trying ChatGPT Search. He described the tool as fast, clear, and accurate, with a clean interface and no ads.
