Google unveils Lumiere generative AI to create more realistic images and videos from text

Google unveils Lumiere - the latest in generative AI that creates realistic video clips from text. (Source: Google Research)

Google has unveiled Lumiere, its latest model for realistic text-to-image and text-to-video generation using machine learning. A key innovation is the ability to create realistic motion, such as walking, that current generative AIs have trouble with. The software does this by creating all video frames at once rather than using keyframes, and by being trained to learn how moving objects should appear.

David Chien, Published 01/31/2024

Google has unveiled Lumiere, a state-of-the-art generative AI for realistic text-to-image and text-to-video creation. The software greatly improves motion quality by using a novel approach to video frame generation that creates all the frames in one pass to mitigate motion errors.

Generative image AI creates images from text. One key enabler is the huge amount of online images and videos available for training. Another is the development of methods that relate all the words in a language to each other through vectors. As a result, AI can tell that, as a pair of words or within a sentence, “I am” is more likely than “I unilaterally”. Image creation AI such as Stable Diffusion associates words with images of objects. Such AI understands that the words “royal residence” are more closely associated with a “castle” image than with a “house” image.
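The idea of comparing words through vectors can be sketched in a few lines of Python. This is only a toy illustration: the three-dimensional vectors below are hand-made, hypothetical values rather than output from Stable Diffusion or any real model, and cosine similarity is one common way (assumed here) to measure how closely two vectors are associated.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, invented for this example.
# The axes loosely mean: "is a building", "royalty", "everyday life".
toy_vectors = {
    "royal residence": [0.9, 0.9, 0.1],
    "castle": [0.9, 0.8, 0.1],
    "house": [0.9, 0.1, 0.8],
}

def closest(query, candidates):
    """Return the candidate whose vector is most similar to the query's vector."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(toy_vectors[query], toy_vectors[name]),
    )

best_match = closest("royal residence", ["castle", "house"])
print(best_match)
```

With these made-up numbers, "castle" scores higher than "house" because it shares the "royalty" component with "royal residence". Real models learn vectors with hundreds or thousands of dimensions from training data instead of having them assigned by hand.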

Generative video AI extends image AI to create videos from text. Lumiere’s competitors first create keyframes, then the frames in between. This is like a master animator drawing the beginning and end images of a basketball shot, then having an assistant draw the images in between. The issue is that motion errors often occur because the in-between images aren’t drawn correctly. Lumiere bypasses this by creating all video frames at once, without keyframing. Lumiere is also trained to know what moving objects look like at various image sizes, which helps its motion look more realistic.
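A toy example can show why filling in frames between keyframes may lose the motion. The sketch below is not how Lumiere or its competitors work internally: it assumes a ball thrown straight up with made-up physics constants, and it uses plain straight-line interpolation as a stand-in for the "assistant" drawing the in-between frames.

```python
def ball_height(t, launch_speed=10.0, gravity=10.0):
    """Height of a ball thrown straight up (toy physics, made-up units)."""
    return launch_speed * t - 0.5 * gravity * t * t

def inbetween(key_start, key_end, n_frames):
    """Fill n_frames between two keyframes by straight-line interpolation."""
    step = (key_end - key_start) / (n_frames - 1)
    return [key_start + i * step for i in range(n_frames)]

n_frames = 5
times = [i * 0.5 for i in range(n_frames)]  # 0.0, 0.5, 1.0, 1.5, 2.0

# "All frames at once": every frame follows the real arc of the ball.
true_motion = [ball_height(t) for t in times]

# "Keyframes first": only the first and last frames are known,
# and everything in between is guessed from those two.
keyframed = inbetween(true_motion[0], true_motion[-1], n_frames)

worst_error = max(abs(a - b) for a, b in zip(true_motion, keyframed))
print(true_motion)
print(keyframed)
print(worst_error)
```

Because the ball starts and ends at the same height, the two keyframes carry no information about the arc, and the interpolated frames never leave the ground. Real keyframe-based video models use learned in-betweening rather than straight lines, but they face the same basic problem of reconstructing motion that the keyframes alone do not fully describe.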

Technically, Lumiere uses diffusion probabilistic models to generate images, coupled with a Space-Time U-Net: a U-Net architecture that adds temporal down- and up-scaling plus attention blocks to the usual image resolution scaling. Down-scaling in time simultaneously with resolution significantly reduces the computational workload, while up-scaling coupled with a temporally aware spatial super-resolution model generates the high-resolution output. Still, the video must be split into segments of frames due to memory limitations, so MultiDiffusion is used across the overlapping segment boundaries to help mitigate temporal motion artifacts.
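The benefit of processing overlapping segments and blending them can be sketched with a toy example. This is a simplified illustration, not Lumiere's implementation: MultiDiffusion combines model predictions at each diffusion step, whereas the sketch below simply averages final per-frame values. The `toy_model` function is a hypothetical stand-in that gives each segment a different bias, mimicking segments that disagree with each other.

```python
def split_into_windows(n_frames, window, stride):
    """Start indices of windows of length `window`, advancing by `stride`."""
    if window > n_frames:
        raise ValueError("window must not be longer than the clip")
    starts = list(range(0, n_frames - window + 1, stride))
    if starts[-1] + window < n_frames:  # make sure the last frames are covered
        starts.append(n_frames - window)
    return starts

def blend_windows(n_frames, window, stride, process_window):
    """Process each window separately, then average results where windows overlap."""
    totals = [0.0] * n_frames
    counts = [0] * n_frames
    for start in split_into_windows(n_frames, window, stride):
        outputs = process_window(start, window)
        for offset, value in enumerate(outputs):
            totals[start + offset] += value
            counts[start + offset] += 1
    return [total / count for total, count in zip(totals, counts)]

def toy_model(start, window):
    """Hypothetical per-segment model: every segment gets its own bias."""
    return [float(start)] * window

def largest_jump(frames):
    """Biggest change between two consecutive frames."""
    return max(abs(b - a) for a, b in zip(frames, frames[1:]))

no_overlap = blend_windows(8, window=4, stride=4, process_window=toy_model)
with_overlap = blend_windows(8, window=4, stride=2, process_window=toy_model)
print(no_overlap, largest_jump(no_overlap))
print(with_overlap, largest_jump(with_overlap))
```

Without overlap, the two segments meet at a hard boundary and the value jumps abruptly from one frame to the next. With overlapping segments, frames covered by two windows take the average of both results, so the transition is spread over several frames and the largest frame-to-frame jump is smaller.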

Lumiere can be coupled with other AI to create a broader range of output. This includes:

Cinemagraphs - one section of an image is animated
Inpainting - one object in a video is replaced by another
Stylized generation - the appearance is re-created in another art style
Image-to-video - a desired image is animated
Video-to-video - videos are re-created in another art style

Video length is limited to 5 seconds, and the model cannot create video transitions or multiple camera angles. Readers interested in experimenting with generative AI on their desktop computers should upgrade to a powerful video card for the best performance during training.

Lumiere can create images and videos from text, stylize them to match another art style, and even replace objects. (Source: Google Research)

Lumiere can animate a part of an image and the output can be fed into other AI easily. (Source: Google Research)

Source(s)

Google Research - Lumiere, Inbar Mosseri on YouTube

David Chien - Tech Writer - 932 articles published on Notebookcheck since 2023

Having worked at Activision, UCLA, Anime Expo and more, I've seen technology being used to save lives, create games, and create fantastic 3D VR/AR worlds. There's always something fun in emerging technology that I want to get my hands on and all my friends turn to me to find the best for their needs, so I'm glad to bring my experience to Notebookcheck.

David Chien, 2024-01-31 (Update: 2024-08-15)
