
Google announces new PaliGemma 2 vision-language models

Google announces new PaliGemma 2 vision-language models (Image Source: Google)
Google's PaliGemma 2 models are available in multiple sizes and resolutions and can understand text, images, and videos. Google is also touting the models' ability to generate detailed, contextually relevant captions.

Google has announced the follow-up to the vision-language model PaliGemma, which launched in May 2024. PaliGemma 2 is available in multiple sizes, ranging from 3 billion to 28 billion parameters, and at resolutions up to 896px.

The company says the model displays "leading performance on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation." 

It also offers long captioning, generating "detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene."

The new models are offered as a "drop-in replacement" for the original PaliGemma in multiple sizes, without "major code modifications." The pre-trained models are available on Hugging Face and Kaggle and are free for anyone to download and try out. PaliGemma 2 also supports multiple frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
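
For readers who want to try the models, the snippet below is a minimal sketch of loading a PaliGemma 2 checkpoint with Hugging Face Transformers and generating a caption. The checkpoint id "google/paligemma2-3b-pt-224", the placeholder image URL, and the captioning prompt are illustrative assumptions, not details taken from Google's announcement; check Hugging Face for the exact repository names.

import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Assumed checkpoint id; verify the exact PaliGemma 2 repo name on Hugging Face.
model_id = "google/paligemma2-3b-pt-224"

model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Any RGB image works; this URL is a placeholder.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw).convert("RGB")
prompt = "<image>caption en"  # PaliGemma-style captioning prompt

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=50)

# Strip the prompt tokens and decode only the newly generated caption.
caption = processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(caption)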

Google says PaliGemma 2's "flexibility makes fine-tuning for specific tasks and datasets straightforward, empowering you to tailor its capabilities to your precise needs."
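
As a rough illustration of that fine-tuning path, the sketch below attaches LoRA adapters to a PaliGemma 2 checkpoint using the PEFT library; the checkpoint id, target module names, and hyperparameters are illustrative assumptions rather than a recipe from Google.

from peft import LoraConfig, get_peft_model
from transformers import PaliGemmaForConditionalGeneration

# Assumed checkpoint id, as in the loading example above.
model = PaliGemmaForConditionalGeneration.from_pretrained("google/paligemma2-3b-pt-224")

# Illustrative LoRA settings: adapt only the attention projections of the Gemma decoder.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable

# From here, the adapted model can be trained with a standard Transformers Trainer loop
# on a task-specific dataset, then the adapters merged or saved separately.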

Rohith Bhaskar, 2024-12-06 (Update: 2024-12-06)