🟦 Multi-Fusion Retrieval Augmented Generation (MoRAG)

Last updated on September 19, 2024 by Valeriia Kuka

What is MoRAG?

MoRAG (Multi-Fusion Retrieval Augmented Generation) [1] is a framework that improves text-based human motion generation by adding a retrieval-augmented step to motion diffusion models. Traditional models often struggle to generate realistic motion for complex or unfamiliar prompts. MoRAG addresses this by retrieving motion sequences for specific body parts and then fusing them into a cohesive full-body sequence.

MoRAG breaks each prompt down by body part (such as the torso, hands, and legs) and retrieves motions tailored to each part. The retrieved part-level motions are then combined to generate a realistic, diverse motion sequence from a natural language description.

Example

If a user inputs:

A person doing yoga.

MoRAG retrieves specific motions for the torso, arms, and legs from its database, and combines them to generate a realistic yoga pose sequence.

How MoRAG Works

MoRAG enhances motion generation by combining retrieval-based techniques with diffusion models. The process involves:

  1. Input: A prompt (e.g., "A person dancing with raised hands").
  2. Part-Specific Retrieval: The system breaks down the motion into body parts (e.g., torso, hands, legs) and retrieves relevant motions from a motion database.
  3. Fusion: These part-specific motions are fused into a complete motion sequence.
  4. Generation: The fused motion is used as additional guidance in the motion diffusion model to create diverse, high-quality sequences.
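The first three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the six-joint skeleton, the `Clip` type, the tiny `DATABASE`, the hand-written `part_queries`, and the word-overlap score are all hypothetical stand-ins for the real motion database and retrieval model.

```python
from typing import Dict, List, NamedTuple

# Toy skeleton with 6 joints; the joint-to-part assignment is an assumption.
PART_JOINTS: Dict[str, List[int]] = {"torso": [0, 1], "hands": [2, 3], "legs": [4, 5]}
NUM_JOINTS = 6


class Clip(NamedTuple):
    description: str
    frames: List[List[float]]  # frames[t][j]: toy 1-D value of joint j at frame t


def constant_clip(description: str, value: float, length: int) -> Clip:
    """Build a placeholder clip whose joints all hold the same value."""
    return Clip(description, [[value] * NUM_JOINTS for _ in range(length)])


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over words; a stand-in for a learned retriever."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def retrieve(query: str, clips: List[Clip]) -> Clip:
    """Step 2: return the clip whose description best matches the query."""
    return max(clips, key=lambda clip: word_overlap(query, clip.description))


def fuse(part_clips: Dict[str, Clip]) -> List[List[float]]:
    """Step 3: copy each part's joints from its retrieved clip into one sequence."""
    length = min(len(clip.frames) for clip in part_clips.values())  # trim to shortest
    fused = [[0.0] * NUM_JOINTS for _ in range(length)]
    for part, clip in part_clips.items():
        for t in range(length):
            for j in PART_JOINTS[part]:
                fused[t][j] = clip.frames[t][j]
    return fused


# Hypothetical per-part motion database.
DATABASE: Dict[str, List[Clip]] = {
    "torso": [constant_clip("torso sways side to side", 1.0, 4),
              constant_clip("torso bends forward", 2.0, 5)],
    "hands": [constant_clip("hands raised above the head", 3.0, 3),
              constant_clip("hands rest on the hips", 4.0, 4)],
    "legs": [constant_clip("legs step side to side", 5.0, 4),
             constant_clip("legs stay still", 6.0, 4)],
}

# Step 1: the prompt, with hand-written part-level queries (an assumption here).
prompt = "A person dancing with raised hands"
part_queries = {
    "torso": "torso sways while dancing",
    "hands": "hands raised high",
    "legs": "legs step to the beat",
}

retrieved = {part: retrieve(query, DATABASE[part]) for part, query in part_queries.items()}
fused = fuse(retrieved)
print(len(fused), fused[0])
```

Step 4 is left out of the sketch: in a full system, `fused` would be passed to the motion diffusion model as additional guidance rather than used directly as the output.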

Benefits of MoRAG

  • Better generalization to unseen text descriptions.
  • Increased diversity in generated motions.
  • Higher alignment between text and motion.

MoRAG can be used in animation, virtual reality, and gaming for generating complex, realistic human motions from simple text descriptions.

Note

To explore MoRAG further, check out the code and models released by the authors.

Footnotes

  1. Shashank, K. S., Maheshwari, S., & Sarvadevabhatla, R. K. (2024). MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion. https://arxiv.org/abs/2409.12140

Copyright © 2024 Learn Prompting.