
AI Hallucinations Ranked: ChatGPT is Best, Palm-Chat Needs to Sober Up

Vectara has published an AI hallucination leaderboard that ranks leading AI chatbots by how often they ‘hallucinate.’ It’s designed to highlight the extent to which the various public large language models (LLMs) hallucinate, but what does this mean, why is it important, and how is it being measured?

One of the characteristics of AI chatbots we have become wary of is their tendency to ‘hallucinate’ — to make up facts to fill in gaps. A highly public example of this came when law firm Levidow, Levidow & Oberman got into trouble after it “submitted non-existent judicial opinions with fake quotes and citations created by the artificial intelligence tool ChatGPT.” The court noted that made-up legal decisions such as Martinez v. Delta Air Lines had some traits consistent with actual judicial decisions, but closer scrutiny revealed portions of “gibberish.”

If you think about the potential use of LLMs in areas such as health, industry, defense, and so on, it’s clearly imperative to stamp out AI hallucinations as part of any ongoing development. To observe a practical example of an AI hallucinating under controlled reference circumstances, Vectara decided to run some tests with eleven public LLMs:

(Image credit: Vectara / GitHub)
  • Feed the LLMs a stack of over 800 short reference documents.
  • Ask the LLMs to provide factual summaries of the documents, as directed by a standard prompt.
  • Feed the answers to a model that detects the introduction of data that wasn’t contained in the source(s).
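The three steps above can be sketched as a simple evaluation loop. This is only an illustrative sketch, not Vectara's actual harness: the function names (`summarize`, `is_hallucinated`), the prompt text, and the toy string-containment detector are all placeholders standing in for the real LLMs and the real hallucination-detection model.

```python
from typing import Callable, List

def evaluate_hallucination_rate(
    documents: List[str],
    summarize: Callable[[str, str], str],         # stand-in for an LLM: (prompt, doc) -> summary
    is_hallucinated: Callable[[str, str], bool],  # stand-in for the detector model
    prompt: str = "Summarize the passage using only facts it contains.",  # placeholder prompt
) -> float:
    """Return the fraction of summaries flagged as introducing unsupported facts."""
    flagged = sum(
        is_hallucinated(doc, summarize(prompt, doc)) for doc in documents
    )
    return flagged / len(documents)

# Toy stand-ins: a "model" that echoes the source stays faithful, while one
# that invents a detail is caught by a naive containment-based detector.
docs = ["The cat sat on the mat.", "Paris is the capital of France."]
faithful = lambda prompt, doc: doc
inventive = lambda prompt, doc: doc + " It happened in 1999."
naive_detector = lambda doc, summary: not all(
    s.strip() in doc for s in summary.split(".") if s.strip()
)

print(evaluate_hallucination_rate(docs, faithful, naive_detector))   # 0.0
print(evaluate_hallucination_rate(docs, inventive, naive_detector))  # 1.0
```

In the real leaderboard, the detector is itself a trained model rather than a string match, which is what makes scoring 800+ summaries per LLM tractable.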
