MoRAG (Multi-Fusion Retrieval Augmented Generation) is a framework that improves text-based human motion generation by augmenting motion diffusion models with retrieval. Traditional models often struggle to generate realistic motion for complex or unfamiliar prompts; MoRAG addresses this by retrieving motion sequences for individual body parts and fusing them into a cohesive full-body sequence.
MoRAG breaks a prompt down by body part (such as the torso, hands, and legs) and retrieves motions tailored to each part. The retrieved part motions are then combined into a realistic, diverse full-body motion sequence that matches the natural language description.
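The part-wise retrieval described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-part databases, motion IDs, and the function names `embed` and `retrieve_per_part` are hypothetical, the part descriptions are written by hand rather than derived from the prompt automatically, and a bag-of-words cosine similarity stands in for the learned text and motion encoders a real retrieval system would use.

```python
import math
from collections import Counter

# Hypothetical per-part motion databases: each entry pairs a short text
# description with a motion ID. A real system would store motion clips and
# compare learned embeddings rather than word counts.
PART_DATABASES = {
    "torso": [("bend torso forward", "torso_017"), ("keep torso upright", "torso_003")],
    "arms": [("raise both arms overhead", "arms_042"), ("swing arms while walking", "arms_008")],
    "legs": [("stand on one leg", "legs_021"), ("walk forward", "legs_001")],
}


def embed(text: str) -> Counter:
    """Toy text embedding: a bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_per_part(part_descriptions: dict[str, str]) -> dict[str, str]:
    """For each body part, return the ID of the most similar stored motion."""
    result = {}
    for part, description in part_descriptions.items():
        query = embed(description)
        # Search only the database for this body part.
        _, best_id = max(PART_DATABASES[part], key=lambda entry: cosine(query, embed(entry[0])))
        result[part] = best_id
    return result


# Hypothetical hand-written part breakdown of "A person doing yoga."
parts = {
    "torso": "torso upright",
    "arms": "arms raised overhead",
    "legs": "standing on one leg",
}
print(retrieve_per_part(parts))  # {'torso': 'torso_003', 'arms': 'arms_042', 'legs': 'legs_021'}
```

Keeping a separate database per body part is what lets each part match independently, so a prompt whose full-body combination never appears in the data can still find a close match for every individual part.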
For example, if a user enters the prompt:
A person doing yoga.
MoRAG retrieves matching motions for the torso, arms, and legs from its motion database and combines them into a realistic sequence of yoga poses.
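Combining retrieved part motions can be illustrated with a similarly simplified sketch. It assumes, hypothetically, that each part's motion is a list of frames of numeric features. It stretches every part to a common number of frames with linear interpolation, then concatenates the part features frame by frame. The helpers `resample` and `fuse_parts` and the toy feature values are illustrative only; MoRAG's actual fusion may work differently.

```python
Frame = list[float]


def resample(seq: list[Frame], length: int) -> list[Frame]:
    """Linearly interpolate a sequence of frames to a fixed number of frames."""
    if length == 1 or len(seq) == 1:
        return [list(seq[0]) for _ in range(length)]
    out = []
    for i in range(length):
        t = i * (len(seq) - 1) / (length - 1)  # fractional position in the source sequence
        lo = int(t)
        hi = min(lo + 1, len(seq) - 1)
        w = t - lo
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out


def fuse_parts(part_motions: dict[str, list[Frame]],
               order: tuple[str, ...] = ("torso", "arms", "legs")) -> list[Frame]:
    """Time-align each part's motion, then concatenate part features per frame."""
    length = max(len(m) for m in part_motions.values())
    aligned = [resample(part_motions[p], length) for p in order]
    return [[value for frames in aligned for value in frames[i]] for i in range(length)]


# Hypothetical retrieved part motions with different lengths and feature sizes.
part_motions = {
    "torso": [[0.0], [1.0]],                    # 2 frames, 1 feature
    "arms": [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]],  # 3 frames, 2 features
    "legs": [[5.0], [5.0], [5.0]],              # 3 frames, 1 feature
}
print(fuse_parts(part_motions))
# [[0.0, 0.0, 0.0, 5.0], [0.5, 1.0, 2.0, 5.0], [1.0, 2.0, 4.0, 5.0]]
```

Time alignment is needed because clips retrieved independently for each part rarely share the same duration; naive concatenation without it would leave some parts without data in later frames.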
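MoRAG enhances motion generation by combining retrieval with diffusion models. The process involves breaking the prompt down by body part, retrieving a motion for each part, fusing the retrieved part motions into a full-body sequence, and using that sequence to augment a motion diffusion model's generation.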
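To show where a retrieved motion can plug into a diffusion model, the sketch below runs standard DDPM ancestral sampling (linear noise schedule, with the network predicting the clean sample x0) using a denoiser that receives the retrieved motion as an extra input. This text does not describe how MoRAG actually feeds retrieved motions into its diffusion model, so the `denoiser(x_t, t, retrieved)` interface is an assumption, and `toy_denoiser` is a hypothetical placeholder that simply returns the retrieved motion in place of a trained network.

```python
import math
import random


def linear_betas(steps: int, start: float = 1e-4, end: float = 0.02) -> list[float]:
    """Linear noise schedule, as used in the original DDPM formulation."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


def sample_with_retrieval(retrieved, denoiser, steps=50, seed=0):
    """DDPM ancestral sampling where the denoiser also sees the retrieved motion.

    `retrieved` is a flat list of motion features. `denoiser(x_t, t, retrieved)`
    must return a prediction of the clean motion x0 (hypothetical interface).
    """
    rng = random.Random(seed)
    betas = linear_betas(steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in retrieved]  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        x0_hat = denoiser(x, t, retrieved)
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Mean and variance of q(x_{t-1} | x_t, x0), with x0 replaced by x0_hat.
        c0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        ct = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        var = (1.0 - ab_prev) / (1.0 - ab_t) * betas[t]
        x = [c0 * p + ct * xi + (math.sqrt(var) * rng.gauss(0.0, 1.0) if t > 0 else 0.0)
             for p, xi in zip(x0_hat, x)]
    return x


def toy_denoiser(x_t, t, retrieved):
    """Placeholder for a trained network: always 'predicts' the retrieved motion."""
    return retrieved


fused = [0.0, 0.5, 1.0, 5.0]  # e.g. one flattened frame of a fused motion
result = sample_with_retrieval(fused, toy_denoiser)
```

Because the placeholder always predicts the retrieved motion, the last sampling step returns that motion up to floating-point error. A trained denoiser would instead weigh the text condition, the retrieved motion, and the noisy sample against each other, which is where a retrieval-augmented model can depart from the retrieved clip.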
MoRAG can be used in animation, virtual reality, and gaming to generate complex, realistic human motions from simple text descriptions.
To explore MoRAG further, check out the code and models released by the authors.
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
Shashank, K. S., Maheshwari, S., & Sarvadevabhatla, R. K. (2024). MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion. https://arxiv.org/abs/2409.12140