Review of Embodied Intelligence in Robotics Driven by Physical Simulators and World Models, Including 38 Pages and Over 400 References

This comprehensive review, written by authors from 8 institutions, spans 38 pages and more than 400 references on embodied intelligence, focusing on physical simulators and world models for robotics development.

The authors are from Nanjing University, the University of Hong Kong, Central South University, Horizon Robotics, the Chinese Academy of Sciences, Shanghai Jiao Tong University, the Technical University of Munich, and Tsinghua University.

With rapid advances in robotics and AI, embodied intelligence has become a key research focus. Unlike pure perception or generation tasks, embodied intelligence requires an agent to perceive, predict, and act autonomously in complex environments, a capability widely seen as a step toward AGI. The deep integration of physics simulators and world models is regarded as the most promising path: simulators provide safe, efficient training environments across many scenarios, while world models let an agent internally predict how its environment will evolve and plan accordingly, so that it can run “mental simulations” before acting.

This review, authored by scholars from multiple top institutions, systematically summarizes how these two technologies synergize to advance robots from “doing” to “thinking.” It uses 25 diagrams, 6 tables, and over 400 references to detail the current state and future directions of embodied intelligence research.

Abstract: The pursuit of AGI has made embodied intelligence a frontier of robotics research. Embodied intelligence concerns agents that perceive, reason, and act in the physical world. Making such agents robust requires integrating perception, control, and cognition grounded in real physical interaction.

Physics simulators and world models are foundational: simulators offer controllable, high-fidelity environments for safe, scalable training, while world models give agents an internal model of the environment for prediction and adaptive decision-making, easing the transition from simulation to real-world deployment.

This review analyzes recent progress in combining these technologies, exploring their roles in enhancing autonomy, adaptability, and generalization, and discusses how external simulation and internal modeling collaborate to bridge the gap to real-world applications. A continuously updated open-source repository is maintained at: https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey.

Main Contributions:

  • Embodied Robot Capability Hierarchy: A five-level system (IR-L0 to IR-L4) covering autonomy, task handling, environmental adaptation, and social cognition.
  • Robot Learning Techniques: Review of advances in legged locomotion, manipulation, and human-robot interaction.
  • Physics Simulator Comparison: Analysis of platforms like Webots, Gazebo, MuJoCo, Isaac Gym/Sim, focusing on physics accuracy, rendering, and sensor support.
  • World Model Advances: Overview of architectures such as predictive networks, generative models, and multi-task models, with applications such as trajectory prediction in autonomous driving and closed-loop calibration of robotic joints.
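The survey lists calibration of robotic joints among the applications of world models. As a much simpler, hypothetical illustration of the underlying sim-to-real idea, the sketch below treats calibration as system identification: it fits the viscous damping b and Coulomb friction c of a single simulated joint so that the joint model reproduces a logged velocity trace. The joint model, the parameter values, and the function names (`simulate_joint`, `fit_damping_friction`) are assumptions for illustration, not the survey's method. Here the "measured" log is itself synthesized from the same model, so the fit should recover the assumed values almost exactly.

```python
import math

def sign(x: float) -> float:
    """Sign of x, with sign(0) = 0."""
    return math.copysign(1.0, x) if x != 0.0 else 0.0

def simulate_joint(torques, inertia, damping, friction, dt):
    """Velocity trace of a hypothetical 1-DoF joint with viscous damping and Coulomb friction.

    Integrates with semi-implicit Euler starting from rest; returns len(torques) + 1 velocities.
    """
    vel, vels = 0.0, [0.0]
    for tau in torques:
        acc = (tau - damping * vel - friction * sign(vel)) / inertia
        vel += acc * dt
        vels.append(vel)
    return vels

def fit_damping_friction(torques, vels, inertia, dt):
    """Least-squares estimate of (damping, friction) from a logged velocity trace.

    Uses inertia * a_k - tau_k = -b * v_k - c * sign(v_k), with a_k taken as the
    forward difference (v_{k+1} - v_k) / dt, and solves the 2x2 normal equations.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k, tau in enumerate(torques):
        v = vels[k]
        acc = (vels[k + 1] - v) / dt
        x1, x2, y = -v, -sign(v), inertia * acc - tau
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    det = s11 * s22 - s12 * s12  # positive as long as |v| is not constant
    return (r1 * s22 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det

dt, inertia = 0.01, 0.5
torques = [2.0 * math.sin(0.05 * k) for k in range(400)]
# Stand-in for a real-robot log: synthesized with assumed true values b=0.3, c=0.2
measured = simulate_joint(torques, inertia, damping=0.3, friction=0.2, dt=dt)
b_hat, c_hat = fit_damping_friction(torques, measured, inertia, dt)
print(f"damping={b_hat:.3f}, friction={c_hat:.3f}")
```

With a genuine hardware log, sensor noise and unmodeled effects (backlash, stiction, motor dynamics) mean the fitted values only approximate the real joint, which is one reason calibration is often repeated in a closed loop.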

Research Content and Structure Overview

1. Embodied Robot Capability Levels (IR-L0 to IR-L4):

  • IR-L0: Basic execution—fully dependent on human commands, no environment perception.
  • IR-L1: Rule-based response—limited sensor-driven tasks in closed environments.
  • IR-L2: Perception and adaptation—multi-modal sensing, basic path planning, obstacle avoidance.
  • IR-L3: Human-like collaboration—multi-turn dialogue, emotion recognition, dynamic scene cooperation.
  • IR-L4: Fully autonomous—self-generated goals, long-term learning, ethical decision-making.
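The five levels form an ordered scale, which can be sketched as a small data structure. The `Capabilities` checklist, the `classify` function, and the assumption that levels are cumulative (a robot cannot reach IR-L3 without the IR-L1 and IR-L2 capabilities) are illustrative interpretations, not definitions taken from the survey.

```python
from dataclasses import dataclass
from enum import IntEnum

class IRLevel(IntEnum):
    """Capability levels IR-L0 to IR-L4 as summarized in the survey."""
    L0_BASIC_EXECUTION = 0           # follows human commands, no environment perception
    L1_RULE_BASED_RESPONSE = 1       # limited sensor-driven tasks in closed environments
    L2_PERCEPTION_ADAPTATION = 2     # multi-modal sensing, path planning, obstacle avoidance
    L3_HUMAN_LIKE_COLLABORATION = 3  # multi-turn dialogue, emotion recognition, cooperation
    L4_FULL_AUTONOMY = 4             # self-generated goals, long-term learning, ethics

@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability checklist: one flag per level above IR-L0."""
    rule_based_sensing: bool = False
    perception_and_planning: bool = False
    social_collaboration: bool = False
    self_directed_learning: bool = False

def classify(caps: Capabilities) -> IRLevel:
    """Return the highest level whose requirement, and every lower one, is met.

    Treating the levels as cumulative is an assumption made for this sketch.
    """
    requirements = [
        (IRLevel.L1_RULE_BASED_RESPONSE, caps.rule_based_sensing),
        (IRLevel.L2_PERCEPTION_ADAPTATION, caps.perception_and_planning),
        (IRLevel.L3_HUMAN_LIKE_COLLABORATION, caps.social_collaboration),
        (IRLevel.L4_FULL_AUTONOMY, caps.self_directed_learning),
    ]
    level = IRLevel.L0_BASIC_EXECUTION
    for candidate, met in requirements:
        if not met:
            break  # a missing lower-level capability caps the level
        level = candidate
    return level

robot = Capabilities(rule_based_sensing=True, perception_and_planning=True)
print(classify(robot).name)  # L2_PERCEPTION_ADAPTATION
```

Using `IntEnum` keeps the levels comparable (`IRLevel.L3_HUMAN_LIKE_COLLABORATION > IRLevel.L2_PERCEPTION_ADAPTATION`), which matches the hierarchy's ordering.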

2. Core Robot Technologies Review

  • Mobility: From model predictive control (MPC) to end-to-end policies based on deep reinforcement learning.
  • Manipulation: From single-arm grasping to bimanual coordination, increasingly integrated with vision-language-action (VLA) models driven by VLMs/LLMs.
  • Interaction: Advances in cognitive collaboration, physical safety, and social embedding.

3. Physics Simulator Comparison

The survey reviews platforms such as Webots, Gazebo, MuJoCo, and Isaac Gym/Sim, comparing their physics fidelity, rendering quality, and sensor support. It also examines performance differences across heterogeneous hardware and in large-scale training, and outlines future optimization directions.
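Physics fidelity depends partly on how a simulator integrates the equations of motion at a fixed time step. As a toy illustration (not the internals of Webots, Gazebo, MuJoCo, or Isaac Sim), the sketch below steps a frictionless pendulum with explicit Euler and with semi-implicit (symplectic) Euler and reports how far the total energy drifts, even though a frictionless pendulum should conserve it. The gravity, pendulum length, initial angle, and function names are assumptions.

```python
import math

G = 9.81       # gravity in m/s^2 (assumed)
LENGTH = 1.0   # pendulum length in m (assumed)

def energy(theta: float, omega: float) -> float:
    """Total mechanical energy per unit mass (pivot at height zero)."""
    return 0.5 * (LENGTH * omega) ** 2 - G * LENGTH * math.cos(theta)

def simulate(theta: float, omega: float, dt: float, steps: int, symplectic: bool):
    """Advance the pendulum `steps` times with a fixed time step `dt`."""
    for _ in range(steps):
        alpha = -(G / LENGTH) * math.sin(theta)  # angular acceleration
        if symplectic:
            # Semi-implicit Euler: update velocity first, then position with the new velocity
            omega += alpha * dt
            theta += omega * dt
        else:
            # Explicit Euler: both updates use the state from the start of the step
            theta, omega = theta + omega * dt, omega + alpha * dt
    return theta, omega

def energy_drift(dt: float, symplectic: bool, duration: float = 10.0, theta0: float = 0.5) -> float:
    """Absolute energy error after `duration` seconds, released at rest from `theta0` rad."""
    theta, omega = simulate(theta0, 0.0, dt, round(duration / dt), symplectic)
    return abs(energy(theta, omega) - energy(theta0, 0.0))

for dt in (0.01, 0.001):
    for symplectic in (False, True):
        name = "semi-implicit" if symplectic else "explicit"
        print(f"dt={dt:<6} {name:<13} energy drift={energy_drift(dt, symplectic):.4f}")
```

Explicit Euler should steadily pump energy into the pendulum, while the semi-implicit variant keeps the error bounded; shrinking `dt` reduces both errors at the cost of more steps per simulated second, the same accuracy-versus-throughput trade-off that matters in large-scale training.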

4. World Model Architectures and Applications

Representative architectures include predictive networks, generative models, and multi-task “dynamics + reward” models. Applications include trajectory prediction for autonomous driving and sim-to-real calibration of robotic joints.
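To make the “dynamics + reward” idea and the “mental simulation” described in the introduction concrete, here is a deliberately tiny sketch: an agent explores a hypothetical 1-D corridor, records transitions in a tabular world model that predicts the next state and the reward, and then chooses actions by rolling candidate action sequences forward inside that model instead of the real environment (a receding-horizon planner with exhaustive search). The environment, the `TabularWorldModel` class, and the planner are assumptions chosen for brevity; practical world models are usually neural networks operating on high-dimensional observations such as images.

```python
import itertools
import random
from collections import Counter, defaultdict

N_STATES, GOAL = 7, 6   # hypothetical 1-D corridor; entering state 6 pays reward 1
ACTIONS = (-1, +1)      # move left / move right

def true_env_step(state, action):
    """Ground-truth environment, unknown to the agent."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

class TabularWorldModel:
    """Learned 'dynamics + reward' model built only from observed transitions."""
    def __init__(self):
        self.next_counts = defaultdict(Counter)
        self.reward_sum = defaultdict(float)
        self.visits = Counter()

    def update(self, s, a, s_next, r):
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def predict(self, s, a):
        """Most likely next state and mean reward; unseen pairs assume 'no change, no reward'."""
        if self.visits[(s, a)] == 0:
            return s, 0.0
        s_next = self.next_counts[(s, a)].most_common(1)[0][0]
        return s_next, self.reward_sum[(s, a)] / self.visits[(s, a)]

def plan(model, state, horizon=4, gamma=0.95):
    """Return the first action of the best action sequence, scored by imagined rollouts."""
    best_return, best_action = float("-inf"), ACTIONS[0]
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, ret = state, 0.0
        for t, a in enumerate(seq):
            s, r = model.predict(s, a)  # "mental simulation": no real environment call
            ret += (gamma ** t) * r
        if ret > best_return:
            best_return, best_action = ret, seq[0]
    return best_action

rng = random.Random(0)
model = TabularWorldModel()
for _ in range(500):  # random exploration from random start states
    s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
    model.update(s, a, *true_env_step(s, a))

state, path = 2, [2]
for _ in range(6):  # receding-horizon control: replan after every real step
    state, _ = true_env_step(state, plan(model, state))
    path.append(state)
print(path)  # [2, 3, 4, 5, 6, 6, 6]
```

Only the first action of each imagined plan is executed before replanning, which is the same receding-horizon pattern used in model predictive control; the difference is that the model being queried was learned from experience rather than written by hand.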

Figure: Simulation capability comparison

Figure: Rendering capability comparison

Figure: Autonomous driving world model representative works

Figure: Robotics world model representative works
