MeanFlow Achieves New Milestones: Peking University Introduces MP1, a New Robot Learning Paradigm with Dual SOTA in Speed and Success Rate

Peking University unveils MP1, a novel robot learning framework based on MeanFlow, setting new standards in speed and success rate, pushing the boundaries of robotic manipulation technology.

Author Introduction: Sheng Juyi, PhD candidate at Peking University, specializing in robot manipulation skill learning; Wang Ziyi, Li Peiming, master's students at Peking University, focusing on video understanding analysis; Liu Yong, professor at Zhejiang University, expert in autonomous robots and intelligent systems; Liu Mengyuan, assistant professor at Shenzhen Graduate School of Peking University, researching human behavior understanding and robot skill learning.

Current Vision-Language-Action (VLA) models depend heavily on the 'A' component, the action generation model, which determines the quality and speed of the resulting actions. These generative models face a fundamental trade-off between inference speed and task success rate.

Diffusion models (such as Diffusion Policy and DP3) generate high-quality action sequences through many iterative denoising steps, but they are slow, which makes real-time control difficult. Conversely, flow-based models (such as FlowPolicy) offer faster inference but require additional architectural constraints or consistency losses, which complicates the design and can limit performance and generalization.

Another challenge in robotics is data-efficient few-shot generalization. Standard imitation learning often suffers from 'feature collapse,' where key states requiring different actions are mapped to similar latent representations, impairing the model's response in new scenarios. Improving the discriminative ability of models across states is crucial for better policy generalization.

To address these issues, a research team from Peking University proposed a new robot learning framework called MP1. MP1 brings MeanFlow, a recent breakthrough paradigm in image generation, into robot learning, achieving millisecond-level inference and laying a foundation for action generation in VLA models.

Core Engine of MP1: The MeanFlow Paradigm

The key innovation of MP1 lies in a fundamental shift in the generative paradigm. Traditional flow matching learns an instantaneous velocity field, so inference requires iteratively solving an ODE, which is time-consuming and accumulates numerical error. MP1 instead directly learns the interval-averaged velocity field from initial noise to target actions.

Using the 'MeanFlow Identity,' MP1 models the average velocity field without any integration during inference. This design offers two major advantages:

  • True single-step generation (1-NFE): The model requires only one forward pass to generate complete action trajectories directly from random noise, eliminating the need for iterative ODE solvers.
  • Unconstrained simplicity: Because the MeanFlow Identity follows directly from the definition of average velocity, the training objective needs no external consistency constraints like those in FlowPolicy, which keeps the model simpler and more robust.
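For reference, the quantities behind these two points can be written out as follows. The notation is assumed from the original MeanFlow formulation in image generation (z_1 is noise, z_0 is the generated sample, v is the instantaneous velocity, u is the average velocity over the interval [r, t]); MP1's own notation and conditioning inputs may differ.

```latex
% Average velocity over [r, t] along the flow trajectory
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau

% MeanFlow Identity: obtained by differentiating (t - r)\,u(z_t, r, t) with respect to t
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t)

% One-step sampling from noise z_1 to an action sequence z_0 (1-NFE)
z_0 = z_1 - u(z_1, 0, 1)
```

The identity contains no integral, which is why a network trained to satisfy it can be queried once at (r, t) = (0, 1) instead of being integrated step by step.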

This mathematically grounded approach not only accelerates inference but also guarantees high stability and real-time performance in robotic tasks.
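The difference between integrating an instantaneous velocity and applying an average velocity once can be illustrated with a toy example. The sketch below is not MP1's network: it assumes a hypothetical one-dimensional field v(z, t) = z whose average velocity has a closed form, so the one-step result can be checked against the exact ODE solution. In MP1 the average velocity would instead be predicted by a learned model.

```python
import math


def instantaneous_velocity(z, t):
    # Toy field v(z, t) = z (an assumption for illustration only).
    return z


def average_velocity(z_t, r, t):
    # Closed-form average of v along the trajectory between r and t.
    # For dz/dt = z, z_r = z_t * exp(r - t), so u = (z_t - z_r) / (t - r).
    if t == r:
        return instantaneous_velocity(z_t, t)
    return z_t * (1.0 - math.exp(r - t)) / (t - r)


def euler_sample(z1, num_steps):
    # Multi-step sampling: integrate from t = 1 (noise) down to t = 0.
    z = z1
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = 1.0 - k * dt
        z = z - dt * instantaneous_velocity(z, t)
    return z


def one_step_sample(z1):
    # Single evaluation of the average velocity over the whole interval [0, 1].
    return z1 - (1.0 - 0.0) * average_velocity(z1, 0.0, 1.0)


z1 = 2.0                               # stand-in for a noise sample
exact = z1 * math.exp(-1.0)            # exact ODE solution at t = 0
one_step = one_step_sample(z1)         # 1 function evaluation
euler_1 = euler_sample(z1, 1)          # 1 function evaluation, large error
euler_100 = euler_sample(z1, 100)      # 100 function evaluations, small error

print(f"exact={exact:.5f} one_step={one_step:.5f} "
      f"euler_1={euler_1:.5f} euler_100={euler_100:.5f}")
```

With a single function evaluation, the Euler step on the instantaneous field misses the target badly, while the average-velocity step lands on the exact solution; Euler needs many steps to get close. This is the trade-off that learning the average velocity is meant to remove.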

Dispersive Loss Enhances Few-Shot Generalization

After solving the dynamic trajectory generation problem, MP1 addresses the 'feature collapse' issue in robot learning, where key states are incorrectly mapped to similar latent representations, reducing generalization in few-shot scenarios.

MP1 adopts Dispersive Loss, a recent technique from representation learning. This lightweight regularizer is active only during training and encourages the latent features of different samples to repel each other, producing a more discriminative feature space. It improves the model's ability to distinguish subtle scene differences without adding any inference overhead, which matters in robotics, where fast response and low data cost are both essential.
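A minimal sketch of such a repulsive regularizer is shown below. It assumes one common InfoNCE-style form of dispersive loss, the log of the mean of exp(-distance / temperature) over all pairs in a batch, with squared Euclidean distance; the function name, the temperature value, and the choice to include self-pairs are illustrative assumptions, not details confirmed for MP1.

```python
import math


def dispersive_loss(features, temperature=0.5):
    # features: list of equal-length latent vectors (one per sample in a batch).
    # Assumed form: log of the mean over all pairs (i, j) of
    # exp(-squared_distance(z_i, z_j) / temperature).
    terms = []
    for zi in features:
        for zj in features:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, zj))
            terms.append(math.exp(-sq_dist / temperature))
    return math.log(sum(terms) / len(terms))


# Collapsed batch: distinct states mapped to the same latent vector.
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
# Dispersed batch: the same number of samples, spread apart.
dispersed = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]

loss_collapsed = dispersive_loss(collapsed)
loss_dispersed = dispersive_loss(dispersed)
print(loss_collapsed, loss_dispersed)
```

The collapsed batch receives the highest possible value of this loss (zero), and spreading the features apart lowers it, so minimizing the term alongside the policy objective pushes representations of different samples away from each other. Because the term only appears in the training objective, it adds nothing to inference cost.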

MP1’s Simulation Performance

MP1’s advantages are validated across 37 complex tasks in the Adroit and Meta-World benchmarks.

Outstanding Success Rate and Stability

MP1 achieves an average success rate of 78.9%, surpassing FlowPolicy (71.6%) and DP3 (68.7%) by 7.3 and 10.2 percentage points, respectively. The advantage is even more pronounced on difficult tasks, with success rate increases of 9.8%, 17.9%, and 15.0% on medium, hard, and very hard tasks, respectively. The model is also highly stable, with a standard deviation of only ±2.1% across multiple runs, indicating reliable and reproducible results.

Exceptional Inference Speed and Real-time Control

On an NVIDIA RTX 4090 GPU, MP1’s average inference time is just 6.8 ms, nearly twice as fast as FlowPolicy (12.6 ms) and 19 times faster than DP3 (132.2 ms). This ultra-low latency fully meets the real-time control requirements of robotic applications.
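The ratios quoted above, and what they imply for control frequency, can be checked with a few lines of arithmetic. The latencies are the ones reported in this article; the resulting rates are upper bounds on how often the policy could be queried, under the assumption that sensing, communication, and actuation overhead are ignored.

```python
# Reported average inference times in milliseconds (from the text above).
latency_ms = {"MP1": 6.8, "FlowPolicy": 12.6, "DP3": 132.2}

# How many times longer each method takes compared with MP1.
slowdown_vs_mp1 = {name: ms / latency_ms["MP1"] for name, ms in latency_ms.items()}

# Upper bound on policy queries per second if inference were the only cost.
max_rate_hz = {name: 1000.0 / ms for name, ms in latency_ms.items()}

for name in latency_ms:
    print(f"{name}: {slowdown_vs_mp1[name]:.2f}x MP1 latency, "
          f"at most {max_rate_hz[name]:.1f} queries/s")
```

On these numbers MP1 could in principle be queried well over 100 times per second, while DP3 stays below 10, which is the practical meaning of the real-time claim.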

Few-Shot Learning Validation

To further verify the dispersive loss’s role in data efficiency, ablation experiments on few-shot learning were conducted. Results show MP1 consistently outperforms FlowPolicy across all data levels, especially in extremely sparse demonstrations (2-5 samples), demonstrating its ability to learn effectively from minimal data, reducing real-world deployment costs.

Real Robot Deployment

The team deployed MP1 on an ARX R5 dual-arm robot and tested it on five real-world tabletop manipulation tasks. The results confirmed MP1’s advantage: its success rate reached 90% on the 'Hammer' task, significantly higher than FlowPolicy and DP3, and its task completion time averaged only 18.6 seconds, much shorter than either baseline.
