Runway has shouldered aside Midjourney and Stable Diffusion, introducing the first clips of text-to-video AI art that the company says are generated entirely from a text prompt.
The company says it's opening a waitlist for what it calls "Gen 2" of its text-to-video AI, after offering a similar waitlist for its first, simpler text-to-video tools, which use a real-world scene as a model.
When AI art emerged last year, it used a text-to-image model. A user would input a text prompt describing the scene, and the tool would attempt to create an image using what it knew of real-world “seeds,” artistic styles and so forth. Services like Midjourney perform these tasks on a cloud server, while Stable Diffusion and Stable Horde take advantage of similar AI models running on home PCs.
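If you want to poke at the text-to-image side of this yourself, here's a minimal sketch using Hugging Face's diffusers library with the Stable Diffusion 1.5 checkpoint (which, fittingly, Runway itself published). The prompt, seed, and settings here are purely illustrative:

```python
# Minimal local text-to-image sketch using Hugging Face diffusers.
# The model ID is the Stable Diffusion 1.5 checkpoint published by Runway;
# the prompt, seed, and step count are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # runs on a consumer GPU with enough VRAM

# Fixing the seed makes the same prompt reproduce the same image.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    "a golden retriever in a sunlit park, digital painting",
    num_inference_steps=30,
    generator=generator,
).images[0]
image.save("retriever.png")
```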
Text-to-video, however, is the next step. There are various ways of accomplishing this: Pollinations.ai has accumulated a few models you can try out, one of which simply takes a few related scenes and strings them together into an animation. Another creates a 3D model from an image and lets you zoom around within it.
Runway takes a different approach. The company already offers AI-powered video tools: inpainting to remove objects from a video (as opposed to an image), AI-powered bokeh, transcripts and subtitles, and more. The first generation of its text-to-video tools let you construct a real-world scene, then use it as a model on which to overlay a text-generated video. This is normally done with still images: you could take a photo of a Golden Retriever, for example, and use AI to transform it into a photo of a Doberman.
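That photo-to-photo transformation maps onto what open tooling calls image-to-image generation, where a source picture anchors the composition and the text prompt rewrites the subject. A rough sketch using the same diffusers library (the file names and strength value are hypothetical):

```python
# Rough image-to-image sketch with diffusers: the source photo guides
# layout and pose while the prompt changes the subject.
# File names and settings here are hypothetical.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

source = Image.open("golden_retriever.jpg").convert("RGB")

# strength controls how far the output may drift from the source photo:
# lower values keep the original composition, higher values rewrite more.
result = pipe(
    prompt="a doberman, photorealistic",
    image=source,
    strength=0.6,
).images[0]
result.save("doberman.png")
```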
That was Gen 1. Runway’s Gen 2, as the company tweeted, can use existing images or videos as a base. But the technology can also completely auto-generate a short video clip from a text prompt and nothing more.
As Runway’s tweet indicates, the clips are short (just a few seconds at most), awfully grainy, and suffer from a low frame rate. It’s not clear when Runway will release the model for early access or general access, either. But the examples on the Runway Gen 2 page show a wide variety of video prompts: pure text-to-video AI, text-plus-image-to-video, and so on. It appears that the more input you give the model, the better your luck. Applying a video “overlay” to an existing object or scene seemed to offer the smoothest video and the highest resolution.
Runway already offers a $12/mo “Standard” plan that allows for unlimited video projects. But certain tools, such as training your own portrait or animal generator, require an additional $10 fee. It’s unclear what Runway will charge for its new model.
What Runway does demonstrate, however, is that in a few short months, we’ve moved from text-to-image AI art into text-to-video AI art… and all we can do is shake our heads in amazement.