
AI battle: Grok surprises Mrwhosetheboss with its performance and ChatGPT wins

Gemini, ChatGPT, Grok, and Perplexity (Image source: Gemini)
In a video posted by Mrwhosetheboss on YouTube, he tested four AI models from different brands and scored them based on performance in each task. Mrwhosetheboss went from simple queries to tricky questions and research, pushing each model to its limit.

In the video, Mrwhosetheboss tested Grok (Grok 3), Gemini (2.5 Pro), ChatGPT (GPT-4o), and Perplexity (Sonar Pro). He made it clear throughout the video that he was impressed by Grok's performance. Grok started off really well, slacked a bit, then came back to claim second place behind ChatGPT. To be fair, ChatGPT and Gemini got their scores boosted thanks to a feature the others simply lack: video generation.

To kick off the test, Mrwhosetheboss tested the models' real-world problem-solving capabilities. He gave each AI model this prompt: "I drive a Honda Civic 2017, how many of the Aerolite 29" Hard Shell (79x58x31cm) suitcases would I be able to fit in the boot?" Grok's answer was the most straightforward, as it correctly answered "2". ChatGPT and Gemini stated that the boot could theoretically fit 3, but practically 2. Perplexity went off the rails: it did simple mathematics, forgetting that the object in question wasn't shapeless, and came up with "3 or 4".
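To illustrate why plain volume arithmetic can overestimate here, the following is a minimal sketch rather than a reconstruction of what any of the chatbots actually computed. The suitcase dimensions come from the prompt; the boot dimensions are a purely hypothetical rectangular box chosen for illustration, not Honda's figures, and the function names are made up for this example.

```python
from itertools import permutations

SUITCASE_CM = (79, 58, 31)        # dimensions quoted in the prompt
HYPOTHETICAL_BOOT_CM = (120, 100, 40)  # assumption: NOT real Honda Civic data


def volume_litres(dims_cm):
    """Volume of a rectangular box in litres (1 litre = 1000 cm^3)."""
    w, h, d = dims_cm
    return w * h * d / 1000


def naive_count(boot_cm, case_cm=SUITCASE_CM):
    """Treat the suitcase as shapeless: divide boot volume by case volume."""
    return int(volume_litres(boot_cm) // volume_litres(case_cm))


def grid_count(boot_cm, case_cm=SUITCASE_CM):
    """Count rigid cases placed in a simple grid, all in one orientation.

    Tries each of the six axis-aligned orientations and keeps the best.
    This is a simplification; a real boot is not a perfect box.
    """
    best = 0
    for a, b, c in permutations(case_cm):
        fit = (boot_cm[0] // a) * (boot_cm[1] // b) * (boot_cm[2] // c)
        best = max(best, fit)
    return best


if __name__ == "__main__":
    print("volume-only estimate:", naive_count(HYPOTHETICAL_BOOT_CM))
    print("rigid-box estimate:  ", grid_count(HYPOTHETICAL_BOOT_CM))
```

With these made-up boot dimensions, the volume-only estimate comes out higher than the rigid-box count, which is the kind of gap the article describes between the "theoretical" and "practical" answers.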

For the next question, he didn't go easy on the chatbots: he asked for advice on making a cake. Alongside his query, he uploaded an image showing 5 items, one of which, a jar of dried porcini mushrooms, isn't used for making cakes. All but one of the models fell for the trap. ChatGPT identified it as a jar of ground mixed spice, Gemini said it was a jar of crispy fried onions, and Perplexity baptized it instant coffee, while Grok correctly identified it as a jar of dried mushrooms from Waitrose. Here is the image he uploaded:

An altered image of the 5 ingredients Mrwhosetheboss uploaded to the AI chatbots highlighting the jar of mushrooms (Image source: Mrwhosetheboss; cropped)

Moving on, he tested them on math, product recommendations, accounting, language translation, logical reasoning, and more. One thing was universal: hallucination. Each of the models hallucinated at some point in the video, talking confidently about things that simply didn't exist. Here is how each AI ranked in the end:

  1. ChatGPT (29 points)
  2. Grok (24 points)
  3. Gemini (22 points)
  4. Perplexity (19 points)

Artificial intelligence has helped make most tasks less burdensome, especially since the arrival of LLMs. The book Artificial Intelligence (curr. $19.88 on Amazon) is one of the books that seek to help people take advantage of AI.

Chibuike Okpara, 2025-07-04 (Update: 2025-07-04)