A New Imitation Learning Paradigm, Chain-of-Action: Trajectory Autoregressive Action Reasoning

Chain-of-Action introduces a trajectory autoregressive model for robotic manipulation, overcoming limitations of traditional methods and enhancing action reasoning and generalization.


Challenges in Imitation Learning

Embodied AI aims to endow robots with perception, decision-making, and action capabilities. Despite progress in vision-language-action models, a "GPT moment" for embodied intelligence remains elusive.

Researchers from ByteDance Seed and the University of Adelaide critique classical imitation learning paradigms like ACT and Diffusion Policy, which follow a forward-prediction approach prone to compounding errors and short-sightedness.

They propose "Chain-of-Action" (CoA), a trajectory autoregressive strategy that generates action sequences backward from the goal, improving spatial generalization and providing a new modeling approach for embodied manipulation.

Chain-of-Action: A Trajectory Autoregressive Robot Policy

Core Idea: Inspired by Chain-of-Thought, CoA reasons iteratively in the action space: a single autoregressive network first predicts the keyframe (goal) action, then generates the rest of the trajectory backward from that goal.

Global-to-Local Consistency: This "backward" generation enforces a global-to-local structure, anchoring the final actions to the goal, significantly enhancing spatial generalization.

Unified Autoregressive Framework: CoA integrates keyframe recognition and trajectory generation into a single model, enabling end-to-end training and scalable, efficient closed-loop execution.
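The goal-first, backward rollout can be pictured with a small sketch. The module below is a minimal PyTorch-style illustration in which a keyframe head predicts the goal action first and a recurrent step cell then rolls the chain back toward the current state; all names, dimensions, and the GRU-based step function are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CoASketch(nn.Module):
    """Minimal sketch of goal-first (backward) autoregressive trajectory generation."""
    def __init__(self, obs_dim=512, action_dim=7, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)           # observation features -> hidden state
        self.keyframe_head = nn.Linear(hidden_dim, action_dim)  # predicts the keyframe (goal) action first
        self.step_cell = nn.GRUCell(action_dim, hidden_dim)     # rolls the chain backward from the goal
        self.action_head = nn.Linear(hidden_dim, action_dim)

    def generate(self, obs_feat, max_steps=32):
        h = torch.tanh(self.encoder(obs_feat))
        a = self.keyframe_head(h)           # anchor the chain at the goal action
        chain = [a]
        for _ in range(max_steps - 1):      # autoregress backward toward the current state
            h = self.step_cell(a, h)
            a = self.action_head(h)
            chain.append(a)
        chain.reverse()                     # execution order: current state -> goal
        return torch.stack(chain, dim=0)
```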

Key Design Elements

To realize trajectory autoregression, CoA introduces four key components:

  • Continuous Action Representation: Generates actions in a continuous space rather than as discrete tokens, with a latent consistency loss to preserve fine trajectory detail.
  • Dynamic Stopping: Ends generation with a distance-based criterion for variable-length trajectories instead of a traditional EOS token (see the sketch after this list).
  • Reverse Temporal Ensemble: Stabilizes execution by ensembling predictions backward in time, in contrast to conventional forward-looking ensembles.
  • Multi-token Prediction (MTP): Models local dependencies between actions as a training-time regularizer, removed at inference for efficiency.
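The distance-based dynamic stop can be sketched as follows: the generator keeps extending the backward chain until a predicted action lands within a threshold of the current robot state, which plays the role an EOS token would in discrete decoding. The helper `step_fn`, the Euclidean metric, and the `eps` threshold are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def generate_with_dynamic_stop(step_fn, goal_action, current_state, eps=1e-2, max_steps=64):
    """Backward rollout with a distance-based stop instead of an EOS token.

    step_fn: callable mapping the last action to the next action in the backward
             chain (illustrative stand-in for the autoregressive network).
    """
    chain = [np.asarray(goal_action, dtype=np.float64)]
    for _ in range(max_steps - 1):
        nxt = np.asarray(step_fn(chain[-1]), dtype=np.float64)
        chain.append(nxt)
        # Stop once the chain reaches the neighborhood of the current robot state.
        if np.linalg.norm(nxt - current_state) < eps:
            break
    return chain[::-1]  # execution order: near the current state first, goal last
```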

Experimental Validation

Simulation Environment: In RLBench, CoA achieves an average success rate of 55.2% across 60 tasks, outperforming ACT (38.9%) and DP (32.6%), with significant improvements in complex tasks.

Correlation Analysis: CoA's success rate falls more slowly as the variance of object positions grows, indicating robustness on high-variance, harder tasks.

Spatial Generalization: On in-distribution and out-of-distribution button-pressing tasks, CoA generalizes markedly better, retaining roughly half its success rate on out-of-distribution positions, whereas ACT and DP do not.

Real-World Deployment: On a Fetch robot, CoA achieved an average success rate of 61.3% across 8 kitchen tasks, outperforming ACT (46.3%) and DP (36.3%), demonstrating practical applicability.

Conclusion and Outlook

Chain-of-Action presents a new imitation learning paradigm built on trajectory autoregression. By generating actions backward from the goal, it mitigates compounding errors and improves generalization without more data or a larger model, and it points to promising directions for future VLA models.
