HKUST & Beijing Humanoid Robotics Unveil LOVON: A New Paradigm for Open-World Goal Tracking in Legged Robots

HKUST and Beijing Humanoid Robotics introduce LOVON, a groundbreaking open-world goal tracking framework for legged robots, enabling autonomous long-range navigation in dynamic environments.

Authors: Peng Daojie, PhD student at HKUST Guangzhou; Cao Jiahang, intern at the Beijing Humanoid Robotics Innovation Center; Zhang Qiang, PhD student at HKUST Guangzhou and Director of the Academic Committee at the Beijing Humanoid Robotics Innovation Center. Corresponding advisor: Ma Jun, Assistant Professor at HKUST Guangzhou and HKUST.

Enabling legged robots to autonomously complete long-range multi-target tasks like "approach the chair first, then quickly approach the pedestrian" in complex open environments has long been a challenge in robotics. Traditional methods are limited to fixed target categories or struggle with real-time challenges like visual jitter and target loss, often causing robots to get lost or misidentify objects in real-world scenarios.

HKUST Guangzhou and the Beijing Humanoid Robotics Innovation Center jointly introduce LOVON (Legged Open-Vocabulary Object Navigator), an innovative framework that integrates large language models (LLMs) for task planning, open-vocabulary visual detection for generalization, and a precise language-motion mapping model. This lets legged robots perform high-precision, long-range goal tracking in dynamic, unstructured environments. The framework is compatible with platforms such as the Unitree Go2, B2, and H1-2, and its plug-and-play design moves beyond the scene constraints of traditional approaches.

Overcoming Challenges in Open-World Navigation with LOVON

Open-world goal navigation is a significant challenge for legged robots, especially for long-range tasks requiring high-level planning and robust perception amidst visual jitter and target loss. Traditional methods struggle to integrate these components effectively, limiting real-world application.

LOVON addresses this by combining hierarchical task planning via large language models, open-vocabulary visual detection for diverse targets, and a language-motion model (L2MM) that converts instructions into precise movement vectors. This integrated approach enables robots to perform accurate, long-range navigation in complex environments, with excellent adaptability and robustness.

Three Core Modules for Intelligent Navigation

LOVON innovatively unifies three core modules, forming a closed loop of "language - vision - motion" (a simplified sketch of this loop follows the list):

  • LLM Task Planner: Empowers robots with human-like reasoning, decomposing complex instructions like "approach the chair then the pedestrian" into sub-tasks, dynamically adjusting execution order for complex tasks.
  • Open-Vocabulary Visual Detection: Breaks the limits of predefined categories, enabling recognition of diverse targets from backpacks and potted plants to cars and pets, adaptable to indoor and outdoor scenarios.
  • Language-Motion Model (L2MM): Converts textual commands and visual feedback into accurate movement vectors, allowing robots to respond swiftly and precisely, improving task efficiency and accuracy.
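To make the closed loop concrete, the data flow can be pictured as: the planner emits an ordered list of sub-tasks, the detector grounds the active sub-task's object in the current camera frame, and the motion model turns the sub-task plus the detection into a velocity command. The sketch below is our own simplified illustration of that flow, not LOVON's actual API; the class names, fields, stub detector, and steering heuristics are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:
    label: str
    confidence: float
    cx: float    # box center x, normalized to [0, 1]
    area: float  # box area as a fraction of the image

@dataclass
class MotionCommand:
    vx: float  # forward velocity (m/s)
    vy: float  # lateral velocity (m/s)
    wz: float  # yaw rate (rad/s)

def plan_subtasks(instruction: str) -> List[str]:
    """Stand-in for the LLM task planner: split a compound instruction such as
    'approach the chair, then quickly approach the pedestrian' into ordered sub-tasks."""
    parts = instruction.replace("then", ",").split(",")
    return [p.strip() for p in parts if p.strip()]

def detect_open_vocab(frame, query: str) -> Optional[Detection]:
    """Stub for the open-vocabulary detector: a real system would query a detector
    with an arbitrary text label and return the best box, or None if not visible."""
    return None

def language_to_motion(subtask: str, det: Optional[Detection]) -> MotionCommand:
    """Stand-in for L2MM: map the active sub-task and visual feedback to a velocity vector."""
    if det is None:
        # Target not visible: rotate in place to search for it.
        return MotionCommand(vx=0.0, vy=0.0, wz=0.5)
    # Steer toward the box center; slow down as the target fills more of the image.
    wz = 1.5 * (0.5 - det.cx)
    vx = max(0.0, 0.8 * (0.25 - det.area) / 0.25)
    return MotionCommand(vx=vx, vy=0.0, wz=wz)

if __name__ == "__main__":
    for sub in plan_subtasks("approach the chair, then quickly approach the pedestrian"):
        cmd = language_to_motion(sub, detect_open_vocab(frame=None, query=sub))
        print(sub, "->", cmd)
```

In a real deployment the learned L2MM replaces the hand-written steering rule above; the sketch only shows where each module's inputs and outputs sit in the loop.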

Robust Visual Perception in Dynamic Environments

Visual jitter during robot movement often causes target detection failures. LOVON employs Laplacian variance filtering to analyze image clarity, automatically filtering out blurry frames and replacing them with the most recent clear frame. Coupled with sliding average filtering, this technique increases detection stability by 25%, ensuring reliable target locking while running, climbing stairs, or traversing complex terrain.
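A common way to implement this kind of clarity gate is to threshold the variance of the image's Laplacian (blurred frames have little high-frequency content, so the variance drops), fall back to the last sharp frame when the current one fails the check, and smooth the resulting detections with a short moving average. The snippet below is an illustrative sketch along those lines using OpenCV; the threshold, window size, and wrapper class are placeholders we chose, not values or code from the paper.

```python
from collections import deque
import cv2
import numpy as np

BLUR_THRESHOLD = 100.0  # placeholder: Laplacian variance below this is treated as motion blur
WINDOW = 5              # placeholder: number of frames in the sliding average

def is_sharp(frame_bgr: np.ndarray, threshold: float = BLUR_THRESHOLD) -> bool:
    """Variance of the Laplacian is a standard sharpness proxy."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= threshold

class StableDetector:
    """Wraps a detector: skips blurry frames (reusing the last sharp one)
    and smooths detection boxes with a sliding average."""

    def __init__(self, detect_fn, window: int = WINDOW):
        self.detect_fn = detect_fn      # detect_fn(frame, query) -> (cx, cy, w, h) or None
        self.last_sharp = None
        self.history = deque(maxlen=window)

    def __call__(self, frame: np.ndarray, query: str):
        if is_sharp(frame):
            self.last_sharp = frame
        if self.last_sharp is None:
            return None                  # nothing sharp seen yet
        box = self.detect_fn(self.last_sharp, query)
        if box is None:
            return None
        self.history.append(np.asarray(box, dtype=float))
        return np.mean(self.history, axis=0)  # sliding-average box
```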

Adaptive Execution Logic for Real-World Scenarios

LOVON’s adaptive logic enables robots to handle unexpected situations such as target loss, command updates, or external disturbances. When a target is lost, the robot switches to search mode, scanning the environment to re-localize it. The robot transitions seamlessly between tasks upon receiving new instructions and quickly re-plans its path after collisions, maintaining stable task execution in complex scenarios.
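One way to realize this behavior is a small finite-state machine that switches between tracking, searching, and re-planning based on detector feedback, new instructions, and collision events. The sketch below illustrates the idea with hypothetical states, thresholds, and transitions of our own choosing; it is not LOVON's actual execution controller.

```python
from enum import Enum, auto

class Mode(Enum):
    TRACK = auto()   # target visible: follow it
    SEARCH = auto()  # target lost: scan the environment to re-localize
    REPLAN = auto()  # new instruction or collision: rebuild the sub-task plan

class ExecutionLogic:
    def __init__(self, lost_patience: int = 10):
        self.mode = Mode.TRACK
        self.lost_frames = 0
        self.lost_patience = lost_patience  # frames without a detection before searching

    def step(self, target_visible: bool, new_instruction: bool, collided: bool) -> Mode:
        if new_instruction or collided:
            self.mode = Mode.REPLAN          # re-plan immediately on updates or disturbances
        elif target_visible:
            self.lost_frames = 0
            self.mode = Mode.TRACK
        else:
            self.lost_frames += 1
            if self.lost_frames >= self.lost_patience:
                self.mode = Mode.SEARCH      # sustained loss: switch to search mode
        return self.mode
```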

Performance in Simulation and Real-World Applications

Extensive testing shows LOVON outperforms traditional methods in both simulated and real environments:

  • Simulation (GymUnreal): In complex scenes like parking lots, city streets, and snowy villages, LOVON achieves a success rate (SR) of 1.00, far exceeding EVT’s 0.94. Training time is only 1.5 hours, compared to 360 hours for the best baseline TrackVLA, demonstrating remarkable efficiency.
  • Real-world Deployment: On platforms like Unitree Go2, B2, and H1-2, LOVON achieves four breakthroughs:
    • Open-world adaptability: Recognizes targets of various sizes and shapes, quickly adapting to unfamiliar environments.
    • Multi-target long-range tracking: Completes complex tasks like "find the chair, then the pedestrian, then the backpack" smoothly and continuously.
    • Dynamic environment robustness: Maintains stable tracking on uneven terrains, stairs, and among vegetation, accurately following moving targets.
    • Anti-interference: Quickly re-locks targets after movement or collisions, demonstrating strong resilience.

LOVON’s plug-and-play design allows easy deployment across various mainstream legged robot platforms, supporting applications in home assistance, industrial inspection, and outdoor scientific research.

Transforming Legged Robot Applications and Enabling Intelligent Services

LOVON brings a powerful new capability to legged robot navigation, filling a critical gap in open-vocabulary, long-range goal tracking. Its integrated framework and lightweight deployment bridge the gap between laboratory research and real-world applications.

As it is developed and adopted further, LOVON is expected to play a vital role in diverse fields, from smart home assistance to industrial inspection and outdoor exploration, ushering in a new era of intelligent service and robotic autonomy. For more details, visit the LOVON project homepage: https://daojiepeng.github.io/LOVON/.
