Gemma 4 on Hugging Face: Google's Easter surprise for download


Google releases Gemma 4: The new model family (E2B to 31B) brings reasoning capabilities and multimodality directly to laptops and smartphones. With a huge context window of up to 256K tokens and Apache 2.0 license, Google is setting an example for free local AI.

Marc Herter (translated by Marc Herter), Published 04/03/2026

Just before Easter, Google dropped a major surprise on Hugging Face: the long-awaited Gemma 4 is now available for download. The launch features four primary size classes: E2B, E4B, 26B A4B, and 31B. All models feature an integrated "Thinking" mode, enabling them to process complex problems step-by-step before delivering a final answer. The excitement surrounding the release is evident, as Gemma 4 became locally usable in tools like LM Studio and Unsloth within hours of its debut.

According to Google, this new generation prioritizes efficiency over raw size. A standout improvement over the previous Gemma 3 iteration is that the smallest models in the current series already match the performance levels of the largest Gemma 3 model across various benchmarks. In practical terms, this means tasks that previously required high-end hardware can now be performed locally on a smartphone.

The architecture varies depending on the intended use case. While the 31B variant uses a relatively classic structure, the 26B A4B model employs a Mixture-of-Experts (MoE) approach. During inference, that is, while the model is generating output, only about four billion parameters are activated, despite the model possessing 26 billion in total. This keeps speed high and compute requirements moderate without sacrificing depth of knowledge. The smaller E2B and E4B models use Per-Layer Embeddings (PLE), which provide specialized information for each token at every layer of the model, optimizing performance specifically for mobile processors.
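To illustrate the general MoE idea, here is a minimal toy sketch in plain Python. It is not Gemma 4's actual implementation: the class name `ToyMoELayer`, the layer width, the number of experts, and the `top_k` value are all arbitrary assumptions chosen for readability. It only shows how a router can select a few experts per token so that most expert parameters stay idle.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Toy Mixture-of-Experts layer (hypothetical, for illustration only)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim, self.num_experts, self.top_k = dim, num_experts, top_k
        # Router: one scoring vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a single dim x dim matrix; real experts are larger MLPs.
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, token):
        # Score every expert, but run only the top_k highest-scoring ones.
        scores = [sum(w * x for w, x in zip(row, token)) for row in self.router]
        chosen = sorted(range(self.num_experts),
                        key=lambda i: scores[i], reverse=True)[: self.top_k]
        weights = softmax([scores[i] for i in chosen])
        out = [0.0] * self.dim
        for w, idx in zip(weights, chosen):
            expert = self.experts[idx]
            for r in range(self.dim):
                out[r] += w * sum(expert[r][c] * token[c] for c in range(self.dim))
        return out, chosen

    def param_counts(self):
        # Counts expert weights only; the small router is ignored here.
        per_expert = self.dim * self.dim
        return self.num_experts * per_expert, self.top_k * per_expert


layer = ToyMoELayer(dim=8, num_experts=16, top_k=2)
output, used = layer.forward([1.0] * 8)
total, active = layer.param_counts()
print(f"experts used for this token: {used}")
print(f"expert parameters: {total} total, {active} active per token")
```

In this toy setup only 2 of 16 experts run per token, so one eighth of the expert weights take part in the computation. All experts still have to be held in memory, which is why an MoE model saves compute rather than storage.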

There are also significant advancements in the context window—the amount of data the model can keep "in mind" simultaneously. The E2B and E4B models support 128,000 tokens, while the larger variants (26B A4B and 31B) can handle up to 256,000 tokens. This capacity allows users to analyze massive documents or complex code structures in a single pass.
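As a rough sketch of what these limits mean in practice, the snippet below estimates whether a document fits into each model's context window. The context sizes come from the figures above; the ratio of four characters per token and the output reserve are assumptions, since the real token count depends on the tokenizer and the text. The helper names are hypothetical.

```python
import math

# Context window sizes in tokens, as stated in the article.
CONTEXT_LIMITS = {
    "E2B": 128_000,
    "E4B": 128_000,
    "26B A4B": 256_000,
    "31B": 256_000,
}

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not the real tokenizer


def estimate_tokens(text):
    """Estimate the token count of a text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def models_that_fit(text, reserve_for_output=4_000):
    """Return the models whose context window can hold the text plus a reply."""
    needed = estimate_tokens(text) + reserve_for_output
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


document = "x" * 600_000  # stand-in for a long document of 600,000 characters
print(estimate_tokens(document))
print(models_that_fit(document))
```

For an exact answer, the text would have to be run through the model's own tokenizer instead of this character-based estimate.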

Multimodality is deeply integrated into Gemma 4, allowing users to mix text and images seamlessly within a single prompt. The models are capable of object recognition, reading PDF documents, and Optical Character Recognition (OCR). Furthermore, the edge models (E2B and E4B) include native processing for video and audio formats, enabling features such as automatic speech recognition.

Another powerful feature is native support for "Function Calling." This allows the AI to act as a virtual assistant, independently executing software commands or using external tools to complete tasks. A clear example of this trend is the "OpenClaw" tool currently popular in China, which relies on this principle of AI agents. With Gemma 4, deploying such systems entirely on one's own device becomes significantly easier.
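The principle behind function calling can be sketched in a few lines: the model emits a structured request naming a tool and its arguments, and the host program executes it. The example below is a simplified illustration, not Gemma 4's actual call format. The JSON layout, the tool names `get_weather` and `add`, and the `dispatch` helper are all hypothetical, and no model is invoked.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    """Hypothetical tool that adds two numbers."""
    return a + b


# Registry of tools the host program is willing to run.
TOOLS = {"get_weather": get_weather, "add": add}


def dispatch(model_output: str):
    """Parse a tool call emitted by the model and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for text a model might generate (assumed format).
fake_model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
result = dispatch(fake_model_output)
print(result)
```

Restricting execution to an explicit registry such as `TOOLS` keeps the model from calling arbitrary code, which matters once an agent runs unattended on a local machine.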

The legal framework is also a welcome change: the models are released under the Apache 2.0 license. This means they are not only free to use but can also be flexibly integrated into proprietary projects and used commercially—drastically lowering the barrier for developers. Previously, all Gemma models were released under a custom license authored by Google.

Initial hands-on testing underscores the impressive linguistic capabilities and increased efficiency of these models. Using LM Studio on a Bosgame M5, we achieved a response speed of just over 10 tokens per second (tok/s) with the Gemma 4 31B model, which is faster than most people read. The smaller models are even more agile: the E4B and 26B A4B variants easily exceed 40 tok/s, with the smallest model, E2B, topping 60 tok/s. However, those wishing to use the full context size of the largest Gemma 4 model may find even 128 GB of RAM (as found in the Bosgame M5) to be tight: the model can claim over 80 GB for itself, leaving little memory available for other tasks.
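To put these speeds in perspective, the following back-of-the-envelope sketch converts tokens per second into words per minute. The throughput values are the lower bounds measured above; the conversion factor of 0.75 words per token and the reading speed of 240 words per minute are rough assumptions, not measurements.

```python
WORDS_PER_TOKEN = 0.75  # assumption: common rule of thumb for English text
READING_WPM = 240       # assumption: approximate silent reading speed of an adult


def tokens_per_second_to_wpm(tok_s, words_per_token=WORDS_PER_TOKEN):
    """Convert generation speed in tokens/s to words per minute."""
    return tok_s * words_per_token * 60


# Lower-bound throughput figures from the hands-on test.
measurements = [("31B", 10), ("E4B / 26B A4B", 40), ("E2B", 60)]

for label, tok_s in measurements:
    wpm = tokens_per_second_to_wpm(tok_s)
    ratio = wpm / READING_WPM
    print(f"{label}: {tok_s} tok/s is about {wpm:.0f} words/min "
          f"({ratio:.1f}x the assumed reading speed)")
```

Under these assumptions, even the 31B model at 10 tok/s produces roughly 450 words per minute. Tokens spent in the "Thinking" mode are generated at the same rate but are not part of the final answer, so the visible response can arrive later than this figure suggests.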

Sources

Gemma 4 | Hugging Face

Google Blog


