Hugging Face, a repository for machine learning models, datasets, and AI tools, has released an open-source vision language model that is lightweight and built for efficiency and speed. Vision language models (VLMs) can understand both text and visual input.
The model, SmolVLM, is available for commercial use with open training pipelines, which means the datasets, code, and methods used to train it are available to the public. Hugging Face offers three variants: SmolVLM-Base, SmolVLM-Synthetic, and SmolVLM-Instruct.
SmolVLM-Base is designed for downstream fine-tuning, meaning it can be adapted and trained for specific tasks. SmolVLM-Synthetic is trained on artificial rather than real-world datasets, and SmolVLM-Instruct can be "used out of the box for interactive end-user applications."
Hugging Face says SmolVLM requires just 5.7GB of GPU RAM, making it smaller and more efficient than competitors such as PaliGemma 3B, InternVL2 2B, and Qwen2-VL-2B. This allows it to run on laptops with limited VRAM.
It is also more token-efficient than comparable models. Tokens are the units of data a model processes, so fewer tokens per image means faster and cheaper inference. SmolVLM can encode a 384x384 image in 81 tokens, whereas Qwen2-VL uses 16k tokens for the same input. The model also requires less compute and memory to run.
Hugging Face is hosting a demo built on SmolVLM-Instruct, along with a supervised fine-tuning script, for anyone to try out.
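For readers who want to experiment locally rather than through the hosted demo, the sketch below shows one way to query an instruct-tuned VLM with the Hugging Face transformers library. The hub repository id "HuggingFaceTB/SmolVLM-Instruct", the image filename, and the prompt text are assumptions for illustration, not details taken from this article.

```python
# Minimal sketch: asking SmolVLM-Instruct a question about a local image.
# Assumption: the model is published under "HuggingFaceTB/SmolVLM-Instruct"
# and an image file "example.jpg" exists in the working directory.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
).to(device)

# Build a chat-style prompt containing one image placeholder and one question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("example.jpg")
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

# Generate and decode the model's answer.
generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

On a machine without a GPU the same script runs on CPU, just more slowly, which fits the model's low-memory positioning.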