
AI generated videos with consistent characters and scenes? Hands-on test of Vidu.com
Vidu.com claims its AI can generate videos with consistent characters and environments. That would mean you could string together multiple sequences with the same “actors” and potentially create entire (short) films. But does it actually work? We put it to the test.
Christian Hintze (translated by Christian Hintze)
Verdict – Huge Potential, But Not There Yet
The potential (and risks) are impressive. AI will undoubtedly reshape not only content creation but also the film and gaming industries. Having consistent characters, settings, and environments is a major step toward making AI video generators practical beyond just novelty use.
Right now, though, it’s not quite there. What’s the point if the person stays the same (assuming they even do) but then walks through solid objects, suddenly duplicates, or grows six fingers on one hand?
At the moment, Vidu.com feels more like a toy. It’s fun, but the tech isn’t yet reliable enough for professional use in film, advertising, or similar areas. There are simply too many glitches.
Pricing and Availability
Anyone can try Vidu.com for free after creating an account. The system works on credits. New users get some starter credits, and you can earn more through certain actions.
There are also monthly and yearly subscription options.
Despite the valid criticisms, we find AI in this space fascinating. But the output varies wildly in quality: on one hand, you can get incredibly detailed environments straight from your imagination. On the other, AI-generated people often stare blankly into space or move in weird slow-motion.
So, we gave the new AI video generator Vidu.com a spin ourselves. The developers kindly added 500 credits to our account for testing. The platform’s flagship model, Vidu Q1, can process up to seven reference images in a single video sequence. Missing elements can be generated through text prompts.
What can you do with Vidu?
With Vidu, you can generate AI videos using these tools:
- Text to Video: Type a prompt describing what should happen in the video
- Image to Video: Generate a video from a single image. Even cooler: set a start and an end frame, and Vidu fills in the transition
- Reference to Video: The most exciting feature. Upload images of characters, locations, or objects, and the AI tries to keep them consistent across a video.
Output is Full HD in 16:9, with an optional smartphone format. Videos can even be upscaled, e.g., to 4K.
Hands-on experience – learning curve, precise prompts
Our ambitious plan:
- Scene 1: A Notebookcheck editor stands in an office testing a laptop.
- Scene 2: He types in a mediocre rating.
- Scene 3: Cut to another office, where men in suits watch surveillance footage of the editor entering that poor rating.
- Scene 4: One of the suits slams an alarm button.
- Scene 5: A SWAT-like team is dispatched.
In total, we mapped out 10 rough scenes.
Scene 1: Editor testing a laptop
We used an older, not entirely up-to-date photo of one of our editors as the character reference, plus an office photo for the setting. We added our Notebookcheck logo, a laptop, and a fictional PC brand logo (“Lavani”). Then we gave the following prompt:
In short: our editor (Image 1) should be standing in the office (Image 2), testing a laptop. On the brick wall in the background, the Notebookcheck logo (Image 3) should appear.
Rendering a 5-second clip with the Vidu Q1 model takes only about 1–2 minutes and costs 15–20 credits. The result?
We weren’t too happy with the following issues:
- Why does our character’s hairstyle not match the reference photo? Will it stay consistent in later scenes?
- Why was our reference logo turned into “notobochech”?
- Why wasn’t the requested “over-the-shoulder” camera angle included?
- Why does the editor keep talking into a microphone the whole time?
For filmmakers, this makes it tough to get the exact camera angles and setups they describe.
We tweaked our prompt, but the more detail we added, the more problems popped up. In one attempt the hairstyle was correct, but suddenly there were two identical editors on screen. In another, our editor walked straight through a desk.
The AI also struggled with foreground placement. And despite repeated prompts, it never produced the requested over-the-shoulder shot. In short, our text inputs weren’t followed accurately.
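The credit figures mentioned earlier (500 test credits, 15–20 credits per 5-second Vidu Q1 clip, 10 planned scenes) give a sense of how little room there is for this kind of trial and error. The following is only a rough back-of-the-envelope sketch in Python. It assumes that credits are spent solely on 5-second Q1 clips and split evenly across scenes. The function names are our own illustration and have nothing to do with Vidu's software, and costs for upscaling or other features are not covered here.

```python
# Figures taken from the article: 500 test credits, 15-20 credits per
# 5-second Vidu Q1 clip, 10 planned scenes. Function names are hypothetical.
TEST_CREDITS = 500
COST_PER_CLIP = (15, 20)  # (cheapest, most expensive) render, in credits
PLANNED_SCENES = 10


def clips_affordable(credits: int, cost_per_clip: int) -> int:
    """Number of whole clips that fit into the credit balance."""
    return credits // cost_per_clip


def attempts_per_scene(credits: int, cost_per_clip: int, scenes: int) -> int:
    """Whole render attempts per scene if credits are split evenly (assumption)."""
    return clips_affordable(credits, cost_per_clip) // scenes


if __name__ == "__main__":
    cheapest, priciest = COST_PER_CLIP
    print("Clips in total:",
          clips_affordable(TEST_CREDITS, priciest), "to",
          clips_affordable(TEST_CREDITS, cheapest))
    print("Attempts per scene:",
          attempts_per_scene(TEST_CREDITS, priciest, PLANNED_SCENES), "to",
          attempts_per_scene(TEST_CREDITS, cheapest, PLANNED_SCENES))
```

Under these assumptions, 500 credits cover roughly 25 to 33 clips, or two to three attempts for each of the 10 scenes. With results as unpredictable as ours, that budget runs out quickly.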
Image to Video
We set aside our ambitious short-film idea and tried Image-to-Video and transitions between two reference images.
The first worked fairly well. For example, our image of Illidan (from *Warcraft 3*) was turned into a short clip. The camera zoomed in on the demon hunter’s face as he scowled. Because of copyright concerns, we won’t show the image or video here.
Finally, we tested a transition between two frames:
- Illidan stands on a rock.
- Illidan lands on the ground.
The idea: Illidan should jump from the rock (Image 1) and land on the ground (Image 2). But in the generated video, Illidan morphed into a black, bird-like shadow in between. Completely unusable.
Summary
Generating videos with Vidu.com is dead simple. But getting the exact scenes, camera angles, and actions you want is anything but. Prompts aren’t followed closely, reference images get distorted (logo, hairstyle), objects lose physical consistency (walking through a desk), or other glitches occur (duplicate editor).
All in all, it’s a bit frustrating and currently makes it nearly impossible to create truly consistent videos tailored to your vision.