Google researchers have introduced a new AI model called VideoPoet that can generate videos from text, image, video, and audio inputs. Built as a large language model (LLM), this technology lets users create videos by providing specific instructions.

Remarkably, the model supports both pretraining and task-specific adaptation, and it can handle tasks it has not been explicitly trained on. Because it operates as an autoregressive model, it generates each new output conditioned on what it has produced so far. Google has integrated a range of video-creation functions into this single LLM, making VideoPoet even more powerful. Its features include text-to-video conversion, image-to-video processing, video stylization, video inpainting and outpainting, and video-to-audio generation.
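To make the autoregressive idea concrete, here is a minimal toy sketch of the generation loop: each new token is chosen using only the tokens produced so far. Everything here is illustrative; `next_token` is a stand-in stub, not VideoPoet's actual tokenizer or transformer, which are not public APIs.

```python
def next_token(context):
    # Stand-in for a trained model: derive the next token
    # deterministically from the running context so the
    # example is self-contained and runnable.
    return (sum(context) * 31 + 7) % 100

def generate(prompt_tokens, steps):
    """Autoregressive loop: each step conditions on all prior tokens."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        tokens.append(next_token(tokens))  # condition on everything so far
    return tokens

print(generate([1, 2, 3], 4))  # prompt tokens followed by 4 generated tokens
```

In a real model, `next_token` would be a neural network producing a probability distribution over the vocabulary; the key property shown here is that generation is sequential and each step depends on the full prior context.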