Google is ending 2024 with a bang. On Wednesday, the Mountain View giant announced a flurry of AI news, including the release of Gemini 2.0, a brand-new AI model with advanced multimodal capabilities. The new model ushers in what Google calls the "agentic era," where virtual AI agents will be able to carry out tasks on your behalf.
Initially, Google is releasing just one model from the Gemini 2.0 family: Gemini 2.0 Flash Experimental, a superfast, lightweight model with multimodal input and output support. It can natively generate images mixed with text and multilingual audio, and can seamlessly tap into Google Search, code execution, and other tools. These capabilities are currently in preview for developers and beta testers. Despite its small size, 2.0 Flash outperforms Gemini 1.5 Pro in several areas, including factuality, reasoning, coding, and math, while running at twice the speed. Regular users can try out the chat-optimized version of Gemini 2.0 Flash on the web starting today, and it's coming to the Gemini mobile app soon.
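For developers, 2.0 Flash is reachable through the Gemini API today. As a rough sketch of what that looks like (this isn't Google's official snippet; it assumes the google-generativeai Python package and uses the experimental model ID from the announcement, so treat the details as subject to change):

```python
# Minimal sketch using the google-generativeai Python SDK
# (pip install google-generativeai). The model ID "gemini-2.0-flash-exp"
# is the experimental release name; exact names may change.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content(
    "In two sentences, what's new in Gemini 2.0 Flash versus 1.5 Pro?"
)
print(response.text)
```

Native image and audio output, plus tools like Google Search and code execution, are enabled through additional request configuration in the same API.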
Google is also showcasing several impressive experiences built with Gemini 2.0. First is the updated version of Project Astra, an experimental virtual AI agent that Google first showed off in May 2024. With Gemini 2.0, it can now hold conversations in multiple languages; use tools like Google Search, Lens, and Maps; remember details from your past conversations; and understand language at roughly the latency of human conversation. Project Astra is designed to run on smartphones and glasses, though it's currently limited to a small group of trusted testers. Those interested in trying out the prototype on their Android phones can join the waitlist here. There's also a really cool demo of the Multimodal Live API, which is somewhat similar to Project Astra and lets you interact with a chatbot in real time using video, voice, and screen sharing.
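Under the hood, the Multimodal Live API streams over WebSockets. As a hedged sketch of a bare-bones text session (assuming the google-genai Python SDK's launch-era Live interface; method names and config keys here may differ from the current surface):

```python
# Bare-bones Multimodal Live API session, assuming the google-genai
# Python SDK (pip install google-genai) and its launch-era Live interface.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main():
    # "response_modalities" can also include "AUDIO" for spoken replies;
    # real-time video and screen sharing ride over the same session.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Describe what you can help with.", end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```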
Next is Project Mariner, an experimental Chrome browser extension that can browse the internet and complete tasks for you. The extension, which is currently available to select testers in the US, leverages Gemini 2.0’s multimodal capabilities "to understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms." Google acknowledges the technology is still in its infancy and not always reliable. But even in its current prototype form, it's undeniably impressive, as you can see for yourself in this YouTube demo.
Google also announced Jules, an experimental AI coding agent built on Gemini 2.0. It integrates directly into your GitHub workflow, and the company says it can handle bug fixes and repetitive, time-consuming tasks "while you focus on what you actually want to build."
Much of what was announced is currently restricted to early testers and app developers. Google says it plans to integrate Gemini 2.0 across its product portfolio, including Search, Workspace, Maps, and more, early next year. That's when we'll get a better sense of how these new multimodal capabilities and improvements translate to real-world use cases. There's also no word yet on Gemini 2.0 Ultra and Pro models.