2024 saw the rise of bizarre yet captivating AI benchmarks that set the tech world buzzing. From Will Smith slurping spaghetti to AI building Minecraft castles, these unconventional tests offered a quirky lens into the fast-moving world of artificial intelligence. While academic benchmarks remain crucial, these viral challenges became cultural phenomena, showcasing AI’s capabilities in ways that resonated far beyond tech circles.
Imagine an AI-generated video of actor Will Smith eating spaghetti. It’s as odd as it sounds, yet this visual has become a meme-worthy benchmark for testing the realism of AI video generators. Each time a new video-generation tool debuts, the question arises: Can it create a believable rendition of Will Smith enjoying a bowl of noodles?
The trend became so popular that Smith himself joined the fun, posting a parody on Instagram in February. What started as a playful experiment has grown into an unofficial metric for evaluating AI’s prowess in rendering detailed, lifelike visuals. But why has it taken off? It’s simple, relatable, and, most importantly, amusing.
Beyond Will Smith and his pasta, other quirky benchmarks captured the public’s imagination in 2024. A 16-year-old developer created an app that lets AI control Minecraft, testing its ability to design and build intricate structures. Meanwhile, a British programmer developed a platform where AI competes in games like Pictionary and Connect 4. These benchmarks may not be academic, but they’re undeniably entertaining and accessible.
Unlike traditional benchmarks that measure AI’s ability to solve Math Olympiad problems or tackle PhD-level questions, these challenges resonate with everyday users. They’re fun, visual, and easy to grasp, offering a stark contrast to the often abstract metrics used in academia.
AI companies often tout their systems’ performance on rigorous, industry-standard tests. While these benchmarks are essential for pushing the boundaries of AI research, they don’t always connect with the average user. For example, a chatbot’s ability to solve complex math problems might impress researchers but offers little insight into how it will handle day-to-day tasks like drafting emails or summarizing articles.
Crowdsourced benchmarks, like Chatbot Arena, aim to fill this gap by letting users evaluate AI performance on real-world tasks. However, these platforms are often dominated by tech enthusiasts whose preferences might not reflect those of the general population. As Ethan Mollick, a Wharton professor, noted in a post on X, many benchmarks fail to compare AI performance to human standards in practical domains like medicine, law, or advice quality.
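Under the hood, crowdsourced platforms of this kind typically turn pairwise human votes into a leaderboard using an Elo-style rating system: two anonymous models answer the same prompt, a user picks the better response, and the winner's rating rises at the loser's expense. The sketch below is a rough illustration of that idea, not Chatbot Arena's actual code; the K-factor of 32 and the starting rating of 1000 are illustrative assumptions.

```python
# Minimal Elo-style rating sketch for pairwise model comparisons.
# The K-factor (32) and starting rating (1000) are illustrative
# assumptions, not any platform's actual parameters.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one head-to-head vote: the winner gains what the loser loses."""
    ra = ratings.setdefault(winner, 1000.0)
    rb = ratings.setdefault(loser, 1000.0)
    ea = expected_score(ra, rb)  # expected score for the winner
    ratings[winner] = ra + k * (1.0 - ea)
    ratings[loser] = rb - k * (1.0 - ea)

# Simulate a handful of crowdsourced votes between two hypothetical models.
ratings: dict = {}
for vote_winner in ["model_a", "model_a", "model_b", "model_a"]:
    vote_loser = "model_b" if vote_winner == "model_a" else "model_a"
    update(ratings, winner=vote_winner, loser=vote_loser)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)  # ranking after these four votes
```

Because each vote only says which of two answers a person preferred, the resulting ranking inherits the tastes of whoever is voting, which is exactly the bias concern raised above.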
So, why do benchmarks like Will Smith eating spaghetti or AI-controlled Minecraft thrive? They’re simple, engaging, and easy to understand. They strip away the complexity of AI systems and present their capabilities in relatable, often humorous ways. Watching an AI build a Minecraft castle or generate an image of a celebrity doing something mundane is both entertaining and revealing.
These challenges also highlight the creative and sometimes chaotic potential of AI. While they might not provide empirical data or generalizable insights, they spark curiosity and invite broader audiences to engage with AI technology. In a field that can often feel inaccessible, these quirky benchmarks offer a refreshing dose of fun.
Of course, these benchmarks have their limitations. Just because an AI can render a convincing video of Will Smith doesn’t mean it’s equipped to generate a realistic burger or tackle complex creative tasks. These tests often focus on narrow domains and don’t address broader questions about AI’s impact or ethical considerations.
Some experts argue that the AI community should shift its focus to measuring downstream impacts—how AI systems affect society, industries, and individuals. While this approach is undoubtedly important, it’s unlikely to diminish the appeal of quirky benchmarks. After all, they’re not just tests; they’re entertainment.
In an industry grappling with how to explain AI’s capabilities to the public, quirky benchmarks offer a unique marketing tool. They distill complex technology into bite-sized, shareable content. As Max Zeff, a tech journalist, recently pointed out, these benchmarks help make AI more relatable and digestible for non-experts.
For tech companies, viral benchmarks are a way to showcase their tools in action. They generate buzz, spark conversations, and invite users to explore the technology for themselves. It’s a win-win: companies get free publicity, and audiences get a good laugh.
As 2025 approaches, the question isn’t whether quirky benchmarks will continue but which ones will capture the spotlight next. Will it be AI creating photorealistic desserts? Competing in virtual reality obstacle courses? Writing original screenplays? The possibilities are endless.
What’s clear is that these benchmarks are here to stay. They’ve carved out a niche in the AI world, blending humor, creativity, and technological exploration. They might not replace academic tests, but they’ve proven their value in making AI more approachable and engaging.
2024 showed us that AI isn’t just about solving hard problems or automating tasks—it’s also about having fun. Whether it’s Will Smith eating spaghetti, AI playing Connect 4, or AI architects building Minecraft castles, these benchmarks remind us of the lighter side of innovation.
As the AI community continues to evolve, these quirky challenges will likely inspire new ways of thinking, testing, and connecting with technology. They’re a testament to the creativity and playfulness that drive the field forward, proving that sometimes, the weirdest ideas are the ones that stick.