
Humans can easily outsmart AI according to Apple-funded study

Humans vs AI (Image source: Generated using DALL·E 3)
Although they often deliver impressive results, AI engines built on large language models, such as those from Meta and OpenAI, still lack basic reasoning capabilities. A group of researchers backed by Apple has proposed a new benchmark, which already revealed that even the slightest wording changes in a query can lead to completely different answers.

Earlier this month, a team of six AI scientists backed by Apple published a study in which they introduced GSM-Symbolic, a new AI benchmark that "enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models." Sadly, the initial tests conducted with GSM-Symbolic on the AI engines of industry icons such as Meta and OpenAI revealed that LLMs are still severely limited and lack the most basic reasoning capabilities.

The problem with the existing models, as uncovered by the aforementioned tests, lies in the lack of reliability of LLMs when subjected to similar queries. The study concluded that slight wording changes that would not alter the meaning of a query to a human often lead to different answers from AI bots. The research did not highlight any model that performed notably better than the rest.

"Specifically, the performance of all models declines [even] when only the numerical values in the question are altered in the GSM-Symbolic benchmark,"

concluded the research, also discovering that

"the fragility of mathematical reasoning in these models [demonstrates] that their performance significantly deteriorates as the number of clauses in a question increases."

The 22-page study can be found here (PDF file). The last two pages contain problems with some irrelevant information added at the end, which should not alter the final result for a human solving them. However, the AI models also took these parts into account, thus delivering wrong answers.
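The paper refers to this set of distractor problems as GSM-NoOp. As a rough, hypothetical illustration of the idea (again, not the authors' code), an irrelevant clause can be appended to a templated question without changing the correct answer:

```python
# Hypothetical sketch of appending an irrelevant ("no-op") clause, in the spirit of
# the study's GSM-NoOp set; the added sentence changes nothing mathematically.
DISTRACTORS = [
    "Five of the apples are slightly smaller than the others.",
    "The apples were bought at a market that opens at 9 a.m.",
]

def add_noop(question: str, distractor: str) -> str:
    """Insert an irrelevant statement just before the final question sentence."""
    body, final_question = question.rsplit(". ", 1)
    return f"{body}. {distractor} {final_question}"

q = ("Sophia picks 12 apples on Monday and 9 apples on Tuesday. "
     "Sophia then gives away 4 apples. How many apples are left?")
print(add_noop(q, DISTRACTORS[0]))
# The correct answer is still 17, but the study found that models often let
# such irrelevant details change their final answer.
```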

In conclusion, AI models are still unable to move beyond pattern recognition and still lack generalizable problem-solving capabilities. This year, quite a few LLMs were unveiled, including Meta AI's Llama 3.1, Nvidia's Nemotron-4, Anthropic's Claude 3, the Japanese Fugaku-LLM (the largest model ever trained exclusively on CPU power), and Nova, a family of LLMs by Rubik's AI that was unveiled earlier this month.

Tomorrow, O'Reilly will release the first edition of Hands-On Large Language Models: Language Understanding and Generation, by Jay Alammar and Maarten Grootendorst. It is priced at $48.99 (Kindle) or $59.13 (paperback).


Codrut Nistor, 2024-10-14 (Update: 2024-10-14)