Posted 04/04/2024 8:36am

Pic: DALL·E

Editors' Note: Many Fast News images are stylised illustrations generated by DALL·E. Photorealism is not intended. View as early and evolving AI art!

hAIku

ChatGPT's advice,
With evidence, less precise,
Needs research, says vice.


ChatGPT's health advice accuracy plummets the more evidence supplied, landmark study reveals

A pioneering study by CSIRO and The University of Queensland (UQ) has found that the more evidence given to ChatGPT, a large language model (LLM), the less reliable it becomes, with the accuracy of its responses dropping to as low as 28%.

The study explored a hypothetical scenario of an average person asking ChatGPT whether a certain treatment has a positive effect on a condition.

The study presented 100 questions, ranging from 'Can zinc help treat the common cold?' to 'Will drinking vinegar dissolve a stuck fish bone?' Each of ChatGPT's responses was compared with the known correct response, or 'ground truth', based on existing medical knowledge. In a question-only format, ChatGPT performed well, answering accurately 80% of the time.

However, when the language model was given evidence alongside the question, accuracy fell to 63%. It fell further, to 28%, when the model was also allowed to answer 'unsure'.
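To make the setup concrete, here is a minimal sketch of the two prompt conditions and the accuracy comparison. The prompt templates and the ask_llm() stand-in for an LLM client are illustrative assumptions, not the study's own wording or code.

```python
# Sketch of the two prompt conditions described above. The templates and the
# ask_llm() stand-in are illustrative assumptions, not the study's own code.

def question_only_prompt(question: str) -> str:
    # Question-only condition: the model answers from its own knowledge.
    return f"Answer yes or no: {question}"

def evidence_biased_prompt(question: str, evidence: str) -> str:
    # Evidence condition: a retrieved passage is supplied with the question.
    return f"Evidence: {evidence}\n\nGiven the evidence, answer yes or no: {question}"

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to your LLM client.
    raise NotImplementedError

def accuracy(prompts: list[str], ground_truth: list[str]) -> float:
    # Fraction of answers matching the known correct ('ground truth') label.
    correct = sum(ask_llm(p).strip().lower() == t
                  for p, t in zip(prompts, ground_truth))
    return correct / len(prompts)
```

Under this framing, the study's result is that accuracy over the same 100 questions drops when evidence_biased_prompt replaces question_only_prompt.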

ChatGPT, which launched on November 30, 2022, has quickly become one of the most widely used LLMs: a form of artificial intelligence that can recognise, translate, summarise, predict, and generate text.

"The widespread popularity of using LLMs online for answers on people’s health is why we need continued research to inform the public about risks and to help them optimise the accuracy of their answers," said study co-author, CSIRO Principal Research Scientist and Associate Professor at UQ, Dr Bevan Koopman. "While LLMs have the potential to greatly improve the way people access information, we need more research to understand where they are effective and where they are not."

"We’re not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy."

Study co-author and UQ Professor Guido Zuccon, Director of AI for the Queensland Digital Health Centre (QDHeC), said major search engines are now integrating LLMs and search technologies in a process called Retrieval Augmented Generation.

"We demonstrate that the interaction between the LLM and the search component is still poorly understood and controllable, resulting in the generation of inaccurate health information," he said.

The study was recently presented at Empirical Methods in Natural Language Processing (EMNLP), a premier conference in the field of Natural Language Processing. The researchers' next step is to investigate how the public uses the health information generated by LLMs.
