ETH’s New Reinforcement Learning Method Achieves 90% Navigation Success and Generalization in Micro-robots Inside Blood Vessels
ETH Zurich's innovative RL approach enables tiny robots to navigate blood vessels with 90% success and strong generalization, promising breakthroughs in biomedical micro-robotics.


Controlling micro-robots as small as 0.1 mm in diameter inside blood vessels is like picking up bacteria with chopsticks. A conventional model-free reinforcement learning (RL) algorithm can need on the order of 25 million interaction steps to learn navigation, which is impractical to collect in physical biomedical experiments.
Rather than relying on difficult manual control, researchers at ETH Zurich turned to AI. Their non-invasive, AI-controlled micro-robots provide precise propulsion and learn efficiently from images in data-scarce environments.
The model reaches a navigation success rate of up to 90% in the reported experiments. In new environments it initially generalizes with about 50% success, and after 30 minutes of further training it exceeds 90%.
This research, titled "Model-based reinforcement learning for ultrasound-driven autonomous microrobots", was published on June 26, 2025, in Nature Machine Intelligence.

Link to paper: https://www.nature.com/articles/s42256-025-01054-2
Ultrasound Micro-robot Reinforcement Learning
Micro-robots are a promising platform for AI-based control, but combining the two still faces challenges such as overfitting and adapting to new scenes. Reinforcement learning (RL) offers a way to learn such control, but it needs many trial-and-error interactions, and the experimental conditions are hard to control.
In complex, narrow spaces like human blood vessels, guiding micro-robots remains a challenge. Ultrasound propulsion shows great potential in this context.

Figure 1: Autonomous ultrasound-driven micro-robot.
In this work, the team used a model-based RL (MBRL) strategy based on the Dreamer v3 algorithm, first training in a simulated ultrasound environment so that the micro-bubble robot could learn by trial and error in a virtual Pygame environment.
After two hours of learning, the system had adapted well and performed strongly in complex channel navigation tasks where model-free RL algorithms such as PPO struggled.
To address overfitting, the team trained a general model across simulated blood vessels, tracks, and mazes, which achieved a success rate above 90% in all of these scenarios.
Experiment: Blood Vessel Environment
The setup consists of a circular artificial blood vessel channel with eight PZT piezoelectric actuators arranged in a polygon and driven through electronic switching circuits with millisecond-level response. The micro-robot self-assembles under ultrasound from biocompatible microbubbles, which scatter sound waves.
During the experiment, the PZT actuators are activated while images of the blood vessel channel are fed into the MBRL model, which evaluates the current state of the micro-robot from this feedback.
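The observe-then-actuate loop can be pictured with a toy sketch. Everything in it is an illustrative assumption rather than the team's implementation: the robot's position stands in for the channel image, actuator k is assumed to sit at angle 2πk/8 on the ring and to push toward the centre, and a hand-written greedy rule stands in for the learned MBRL policy.

```python
import math

NUM_ACTUATORS = 8  # eight PZT actuators, as described in the article


def push_direction(k):
    """Unit push vector of actuator k.

    Toy assumption: actuator k sits at angle 2*pi*k/8 on the ring
    and pushes the robot toward the centre of the ring.
    """
    angle = 2 * math.pi * k / NUM_ACTUATORS
    return (-math.cos(angle), -math.sin(angle))


def greedy_action(robot_xy, target_xy):
    """Hypothetical stand-in for the learned policy.

    Picks the actuator whose push direction aligns best with the
    direction from the robot to the target.
    """
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    scores = []
    for k in range(NUM_ACTUATORS):
        px, py = push_direction(k)
        scores.append(px * dx + py * dy)
    return max(range(NUM_ACTUATORS), key=scores.__getitem__)


def run_episode(start, target, step=0.02, tol=0.05, max_steps=500):
    """Observe, choose one actuator, move a little, repeat."""
    x, y = start
    for t in range(max_steps):
        if math.hypot(target[0] - x, target[1] - y) < tol:
            return True, t  # reached the target after t steps
        px, py = push_direction(greedy_action((x, y), target))
        x, y = x + step * px, y + step * py
    return False, max_steps


if __name__ == "__main__":
    reached, steps = run_episode(start=(0.0, 0.0), target=(1.0, 0.5))
    print("reached target:", reached, "after", steps, "steps")
```

In the real system the policy is learned and the observation is an image, so this only illustrates the shape of the loop: a discrete choice among eight actuators, repeated at every control step.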

Figure 2: Navigation performance analysis of micro-robots using RL in different environments.
The simulation environment, developed in Pygame, focuses on local path planning and obstacle avoidance. It deliberately ignores the complex dynamics of the micro-robots, which will be explored in future experiments.
Compared to the state-of-the-art model-free RL algorithm PPO, the model-based RL (MBRL) approach performs better and is more sample-efficient. On winding tracks, PPO requires about 25 million steps to converge, while MBRL needs only 600,000 steps; in blood vessel channels, MBRL also converges within a million steps. MBRL outperforms PPO in all tested environments.
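To put the winding-track figures in proportion, the short calculation below divides the two step counts quoted above. The variable names are chosen here for illustration.

```python
# Step counts reported in the article for the winding-track task.
ppo_steps = 25_000_000   # model-free PPO, approximate
mbrl_steps = 600_000     # model-based RL

ratio = ppo_steps / mbrl_steps
print(f"MBRL needs about {ratio:.1f}x fewer steps than PPO")
```

On these numbers, MBRL converges with roughly 42 times fewer environment steps.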
Upgrading Action Environment
To improve efficiency, the team implemented frame skipping, reducing computational load without sacrificing performance. This allows the model to focus on significant changes, lowering overfitting risks.
After testing different frame-skipping rates, they settled on skipping four frames for faster convergence. Higher skipping rates converge more quickly but increase the risk of overfitting, so the team also used a high training ratio (1000:1) to reduce the number of environment interactions needed.
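Frame skipping is commonly implemented as a thin wrapper that repeats each chosen action for several simulator frames and sums the rewards. The sketch below shows that idea under stated assumptions: `CountingEnv` is a made-up toy environment, the `reset`/`step` interface is a simplified convention chosen here, and "skipping four frames" is interpreted as holding each action for four frames. It is not the team's code.

```python
class CountingEnv:
    """Toy stand-in for the simulator: the observation is a frame counter
    and every frame yields a reward of 1.0 (hypothetical, for illustration)."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done


class FrameSkip:
    """Repeat each action for `skip` frames and accumulate the reward."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop early when the episode ends mid-skip
        return obs, total_reward, done


if __name__ == "__main__":
    env = FrameSkip(CountingEnv(horizon=10), skip=4)
    env.reset()
    done = False
    decisions = 0
    while not done:
        obs, reward, done = env.step(action=0)
        decisions += 1
    print("agent decisions for 10 frames:", decisions)
```

With a skip of four, the agent makes one decision per four simulator frames, which is where the reduction in computational load comes from.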

Figure 3: Transition of autonomous micro-robots from simulation to physical environment.
The team showed that an MBRL model trained in specific environments can generalize to other environments through fine-tuning. Moving from a branched channel to an unseen blood vessel environment takes about 400,000 training steps to reach a 90% success rate.
In more complex environments, the model maintains a success rate above 90% between 3.1 million and 4 million steps, which suggests that it captures the channel dynamics accurately and performs robustly.
Notably, the model also navigates successfully in dynamic flow environments by using different strategies, such as increasing power against the flow or reducing the flow it experiences, which supports the reliability of these micro-robots.
This mainly relies on the low-resistance regions near the channel walls and on attractive forces between the micro-robot and the wall, which enable real-time navigation both against and with the flow.
Summary
This research marks the first deep integration of AI reinforcement learning with ultrasound-driven technology. Using MBRL, the team can guide micro-robots in real time to move against the flow using only ultrasound.
In various simulations, target navigation reaches a 90% success rate after just one hour of fine-tuning, and in unfamiliar environments a 90% success rate is reached within half an hour of further training after initial generalization.
The researchers envision these results benefiting single-cell studies and micro-animal models, significantly advancing biotech and medical research. Future work will focus on developing fully automated 3D control systems and AI-driven adaptive deformation capabilities.