Google has announced PaliGemma 2, the follow-up to the vision-language model PaliGemma that launched in May 2024. PaliGemma 2 is available in multiple sizes, ranging from 3 billion to 28 billion parameters, and at several input resolutions up to 896px.
The company says the model displays "leading performance on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation."
The model can also generate long captions, producing "detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene."
The new models are offered as a "drop-in replacement" for the original PaliGemma, in multiple sizes and without "major code modifications." The pre-trained models are available on Hugging Face and Kaggle and are free for anyone to download and try out. They are also supported across multiple frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
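For readers who want to try the models, here is a minimal sketch of what the "drop-in" swap might look like with Hugging Face Transformers. It rests on several assumptions not stated in the announcement: that the pre-trained checkpoints follow the Hub naming pattern `google/paligemma2-<size>-pt-<resolution>`, that the available sizes are 3b, 10b and 28b at 224, 448 and 896px, and that the license has already been accepted on the Hub. The helper names `checkpoint_id` and `load_paligemma2` are hypothetical and exist only for illustration.

```python
# Assumed (not confirmed by the article): checkpoint sizes, resolutions,
# and the Hub naming pattern used below.
ASSUMED_SIZES = ("3b", "10b", "28b")
ASSUMED_RESOLUTIONS = (224, 448, 896)


def checkpoint_id(size: str, resolution: int) -> str:
    """Build a Hub model ID under the assumed naming pattern."""
    if size not in ASSUMED_SIZES:
        raise ValueError(f"unknown size: {size!r}")
    if resolution not in ASSUMED_RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return f"google/paligemma2-{size}-pt-{resolution}"


def load_paligemma2(size: str = "3b", resolution: int = 224):
    """Download and return (processor, model) for one checkpoint.

    transformers is imported lazily so the ID helper above can be used
    without the library installed. Calling this function downloads
    several gigabytes of weights.
    """
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = checkpoint_id(size, resolution)
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    return processor, model
```

Under these assumptions, moving to a larger model or a higher resolution would only mean changing the two arguments, for example `load_paligemma2("10b", 448)`, while the rest of the calling code stays the same.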
Google says PaliGemma 2's "flexibility makes fine-tuning for specific tasks and datasets straightforward, empowering you to tailor its capabilities to your precise needs."