MAMMA (Markerless Accurate Multi-person Motion Acquisition) is a newly introduced pipeline for capturing the motion of multiple people from video without specialized suits or markers. The system is reported to achieve accuracy comparable to professional solutions using only standard video cameras.
What Happened
The MAMMA pipeline reconstructs SMPL-X body model parameters with MammaNet, a transformer network built on the ViT-Base architecture. The network predicts 512 surface landmarks, together with their uncertainty and the probability of physical contact between people. With synchronized cameras, such as multiple iPhones, the reported error on benchmark data is below 1 mm.
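The article does not say how predictions from several synchronized cameras are combined. One common approach, shown here only as an illustrative sketch and not as MAMMA's actual method, is confidence-weighted triangulation: each camera contributes a ray toward a landmark, and views with lower predicted uncertainty get more weight. The function name, the ray-based formulation, and the toy camera layout are all assumptions made for this example.

```python
import numpy as np

def triangulate_weighted(origins, directions, weights):
    """Least-squares 3D point closest to a set of weighted camera rays.

    origins:    (N, 3) camera centers (hypothetical, from calibration)
    directions: (N, 3) rays through the detected 2D landmark
    weights:    (N,)   per-view confidence, e.g. inverse predicted variance
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Normalize rays so the projector below is well defined.
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    # P_i = I - d_i d_i^T removes the component along ray i, so
    # ||P_i (X - o_i)|| is the distance from X to that ray.
    P = np.eye(3)[None, :, :] - d[:, :, None] * d[:, None, :]

    # Normal equations of the weighted problem:
    # (sum_i w_i P_i) X = sum_i w_i P_i o_i
    A = np.einsum("i,ijk->jk", weights, P)
    b = np.einsum("i,ijk,ik->j", weights, P, origins)
    return np.linalg.solve(A, b)

# Toy setup (made-up numbers): three cameras looking at one landmark.
target = np.array([0.1, 0.2, 1.5])
cams = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
point = triangulate_weighted(cams, target - cams, weights=[1.0, 0.5, 2.0])
```

With noise-free rays, the recovered point coincides with the target regardless of the weights. The weights only matter once the per-view detections disagree, which is where a per-landmark uncertainty estimate would be useful.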
Context
Traditional motion capture (MoCap) systems, such as Vicon or OptiTrack, require expensive equipment, special sensors, and markers, which makes the process complex and costly. MAMMA offers an alternative by shifting the focus from hardware to software-based reconstruction from video streams.
Why It Matters for the Industry
This technology could substantially democratize the MoCap industry. The ability to accurately capture interactions within a group of people without proprietary hardware opens new possibilities for high-quality animation production in film, video games, and metaverses, using only accessible consumer cameras.
Why It Matters for Users
For content creators and 3D animators, this means a radically lower barrier to entry. Obtaining accurate motion data for Blender, Maya, or Unreal Engine no longer requires purchasing specialized suits: placing smartphones around a scene is enough to achieve professional results.
What Is Not Yet Known / Limitations
The claimed error of less than 1 mm was measured on benchmark data. The system's effectiveness in real-world, unstructured scenarios still needs to be verified, and its latency and scalability have yet to be investigated.
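The article does not state which metric lies behind the 1 mm figure. For orientation, a frequently used measure in this area is the mean Euclidean distance between predicted and reference 3D landmarks. The sketch below assumes that metric, assumes coordinates in metres, and uses a hypothetical function name and made-up values.

```python
import math

def mean_landmark_error_mm(predicted, reference):
    """Mean Euclidean distance between matching 3D landmarks, in millimetres.

    Both inputs are sequences of (x, y, z) tuples in metres, in the same order.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty landmark lists of equal length")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return 1000.0 * total / len(predicted)

# Made-up example: one landmark is off by 1 mm, the other is exact.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 0.001), (1.0, 0.0, 0.0)]
error_mm = mean_landmark_error_mm(pred, ref)  # about 0.5 mm
```

A figure like this depends heavily on the benchmark, the camera setup, and how landmarks are matched, which is why results on controlled data may not carry over to unstructured scenes.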
Author
Look at AI, Editorial Staff