What is Emu Video?
As the name suggests, this tool generates video. Meta describes it as “a simple method for text-to-video generation based on diffusion models”. Emu Video accepts a variety of inputs: text only, image only, or both text and image together. Meta explains that the process is factorized into two steps: first, generating an image conditioned on a text prompt, and then generating video conditioned on both the text and the generated image.
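The two-step factorization can be sketched in code. Everything below is illustrative: the function names, array shapes, and the random-noise “samplers” are placeholders standing in for real diffusion models, not Meta's actual API.

```python
import numpy as np

def generate_image(text_prompt, size=(64, 64, 3)):
    """Stage 1 (placeholder): produce an image conditioned on a text prompt.
    A real system would run a text-conditioned diffusion sampler here."""
    rng = np.random.default_rng(abs(hash(text_prompt)) % (2**32))
    return rng.random(size)

def generate_video(text_prompt, image, num_frames=16):
    """Stage 2 (placeholder): produce video frames conditioned on BOTH the
    text and the stage-1 image. Here the first frame is the conditioning
    image and later frames drift slightly, just to show the structure."""
    frames = [image]
    rng = np.random.default_rng(1)
    for _ in range(num_frames - 1):
        noise = 0.01 * rng.standard_normal(image.shape)
        frames.append(np.clip(frames[-1] + noise, 0.0, 1.0))
    return np.stack(frames)

def text_to_video(text_prompt, num_frames=16):
    """The factorized pipeline Meta describes: text -> image -> video."""
    image = generate_image(text_prompt)
    return generate_video(text_prompt, image, num_frames)

video = text_to_video("a corgi surfing a wave")
print(video.shape)  # (num_frames, height, width, channels)
```

The point of the factorization is that each stage solves an easier problem: the image model only has to get one frame right, and the video model starts from a strong visual anchor rather than from text alone.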
What is Emu Edit?
This one promises “precise image editing” via recognition and generation tasks. As Meta says, the use of generative AI is often a process, not a single task.
“Emu Edit is capable of free-form editing through instructions, encompassing tasks such as local and global editing, removing and adding a background, color and geometry transformations, detection and segmentation, and more. Current methods often lean towards either over-modifying or under-performing on various editing tasks. We argue that the primary objective shouldn’t just be about producing a ‘believable’ image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request. Unlike many generative AI models today, Emu Edit precisely follows instructions, ensuring that pixels in the input image unrelated to the instructions remain untouched. For instance, when adding the text ‘Aloha!’ to a baseball cap, the cap itself should remain unchanged”, says the Meta team.
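The “alter only the relevant pixels” principle Meta describes can be illustrated with a simple masked-edit sketch. This is a generic demonstration of the idea, not Emu Edit's actual mechanism: the edit is applied only inside a boolean mask, and every pixel outside the mask is guaranteed to be identical to the input.

```python
import numpy as np

def apply_local_edit(image, edit_mask, edit_fn):
    """Apply edit_fn only where edit_mask is True; all other pixels stay
    byte-for-byte identical to the input image."""
    edited = image.copy()
    # Boolean mask over (H, W) broadcasts across the channel axis.
    edited[edit_mask] = edit_fn(image)[edit_mask]
    return edited

# Example: brighten only the top-left quadrant of a flat grey test image.
image = np.full((8, 8, 3), 0.5)
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :4] = True
result = apply_local_edit(image, mask, lambda img: np.clip(img * 1.5, 0, 1))
```

In Emu Edit's case the model itself has to learn which pixels are “relevant” from the instruction (there is no hand-drawn mask), which is exactly what makes the precision claim notable.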
The potential use cases
The road ahead is definitely AI-driven for Meta.
“Although this work is purely fundamental research right now, the potential use cases are clearly evident. Imagine generating your own animated stickers or clever GIFs on the fly to send in the group chat rather than having to search for the perfect media for your reply. Or editing your own photos and images, no technical skills required. Or adding some extra oomph to your Instagram posts by animating static photos. Or generating something entirely new”, the blog post concludes.