Meta has unveiled two developments in generative AI, Emu Video and Emu Edit, which streamline high-quality video generation and instruction-based image editing from text prompts.
Video generation is factorised into two stages: first generating an image from a text prompt, then producing a video conditioned on both the text and the generated image. Unlike earlier pipelines that chained together a deep cascade of models, Meta's approach is streamlined, using just two diffusion models to produce 512×512, four-second videos at 16 frames per second.
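The two-stage factorisation described above can be sketched in a few lines. This is a toy illustration only: the function names, the stand-in model bodies, and the example prompt are all hypothetical placeholders, not Meta's actual API or models.

```python
# Toy sketch of a factorised two-stage text-to-video pipeline.
# All names and model stand-ins below are hypothetical, not Meta's code.

FPS = 16      # frames per second, as reported for Emu Video
SECONDS = 4   # clip length in seconds
SIZE = 512    # 512x512 output resolution

def generate_image(prompt):
    """Stage 1 stand-in: a text-to-image diffusion model would go here.
    Returns a blank 512x512 'image' as a nested list of grey pixels."""
    return [[128] * SIZE for _ in range(SIZE)]

def generate_video(prompt, image):
    """Stage 2 stand-in: a second diffusion model conditions on BOTH the
    text prompt and the stage-1 image to produce every frame."""
    return [image for _ in range(FPS * SECONDS)]

def text_to_video(prompt):
    image = generate_image(prompt)        # text -> image
    return generate_video(prompt, image)  # (text, image) -> video

frames = text_to_video("a corgi surfing at sunset")
print(len(frames))  # 64 frames: 4 seconds at 16 fps
```

The point of the factorisation is that each stage solves a simpler conditional problem than direct text-to-video generation, which is why two diffusion models suffice.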

Going beyond conventional prompt engineering, Emu Edit allows free-form editing through natural-language instructions, covering tasks such as local and global editing, background changes, and colour transformations.
What sets Emu Edit apart is its precision. By incorporating computer vision tasks as instructions, the model ensures that only the pixels relevant to the edit request are altered, leaving the rest untouched. Trained on a dataset of 10 million synthesised samples, which Meta claims is the largest of its kind, the model is reported to deliver state-of-the-art results in both instruction faithfulness and image quality.
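The core idea of touching only the relevant pixels can be illustrated with a simple mask-constrained edit. This is a toy re-implementation of the concept, not Emu Edit's actual method: the function names and example data are invented for illustration.

```python
# Toy illustration of mask-constrained editing: only pixels the
# instruction targets are modified; everything else is copied unchanged.
# Hypothetical helper, not Emu Edit's real code.

def apply_edit(image, mask, edit_fn):
    """image: 2D list of pixel values; mask: 2D list of booleans marking
    the region the instruction refers to; edit_fn: per-pixel transform."""
    return [
        [edit_fn(px) if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]

image = [[10, 20],
         [30, 40]]
mask  = [[True, False],     # pretend the instruction targets
         [False, True]]     # only the diagonal pixels
edited = apply_edit(image, mask, lambda px: 255 - px)  # invert
print(edited)  # [[245, 20], [30, 215]]
```

Untargeted pixels pass through verbatim, which is the property the article attributes to Emu Edit's training objective.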
While not a replacement for professional artists, Emu Video and Emu Edit open new avenues for expression, from ideating new concepts to adding flair to personal creations. They let users create animated stickers and GIFs or enhance Instagram posts without any technical skills.