A New Imitation Learning Paradigm, Chain-of-Action: Trajectory Autoregressive Action Reasoning
Chain-of-Action introduces a trajectory autoregressive model for robotic manipulation that generates action sequences backward from the goal, overcoming the compounding errors of forward-prediction methods and enhancing action reasoning and generalization.



- Paper Title: Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
- Homepage: https://chain-of-action.github.io/
- Article: https://arxiv.org/pdf/2506.09990
- Code Repository: https://github.com/ByteDance-Seed/Chain-of-Action
Challenges in Imitation Learning
Embodied AI aims to endow robots with perception, decision-making, and action capabilities. Despite progress in vision-language-action models, a "GPT moment" for embodied intelligence remains elusive.
Researchers from ByteDance Seed and the University of Adelaide critique classical imitation learning paradigms like ACT and Diffusion Policy, which follow a forward-prediction approach prone to compounding errors and short-sightedness.
They propose "Chain-of-Action" (CoA), a trajectory autoregressive strategy that generates action sequences backward from the goal, improving spatial generalization and providing a new modeling approach for embodied manipulation.

Chain-of-Action: A Trajectory Autoregressive Robot Policy
Core Idea: Inspired by Chain-of-Thought, CoA reasons iteratively in the action space: a single autoregressive network first predicts the keyframe action that anchors the goal, then generates the complete trajectory backward from it.
Global-to-Local Consistency: This "backward" generation enforces a global-to-local structure, anchoring the final actions to the goal, significantly enhancing spatial generalization.
Unified Autoregressive Framework: CoA integrates keyframe recognition and trajectory generation into a single model, enabling end-to-end training and scalable, efficient closed-loop execution.
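
To make the goal-first generation concrete, here is a minimal sketch of how such a rollout could be wired up. This is not the authors' implementation: `predict_keyframe`, `predict_next`, `should_stop`, and the observation layout are illustrative placeholders standing in for the single policy network.

```python
import numpy as np

def generate_chain_of_actions(obs, predict_keyframe, predict_next, should_stop,
                              max_len=64):
    """Minimal sketch of goal-first (backward) autoregressive generation.

    predict_keyframe(obs)    -> the keyframe action that anchors the goal
    predict_next(obs, chain) -> the next (earlier-in-time) action, conditioned
                                on the observation and all actions so far
    should_stop(obs, action) -> True once the chain reaches the current state
    """
    chain = [predict_keyframe(obs)]          # the chain starts at the goal
    while len(chain) < max_len:
        action = predict_next(obs, np.stack(chain))
        chain.append(action)
        if should_stop(obs, action):         # e.g. distance-based stopping
            break
    # The chain was built goal-first; reverse it before execution.
    return np.stack(chain[::-1])
```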

Key Design Elements
To realize trajectory autoregression, CoA introduces four key components:
- Continuous Action Representation: Uses continuous rather than discretized actions, with a latent consistency loss to ensure fine trajectory details.
- Dynamic Stopping: Replaces the traditional EOS token with a distance-based stopping mechanism that supports variable-length trajectories (sketched after this list).
- Reverse Temporal Ensemble: Stabilizes execution by integrating predictions backward in time, contrary to conventional forward-based ensembles (also sketched below).
- Multi-token Prediction (MTP): Models local action dependencies as a regularizer during training and is removed at inference for efficiency.
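
Below is a minimal sketch of the distance-based stopping rule mentioned above, assuming a Euclidean metric over end-effector poses; the metric and the threshold value are illustrative assumptions, not values from the paper.

```python
import numpy as np

def reached_current_state(action, current_ee_pose, threshold=0.01):
    """Hypothetical distance-based stopping rule.

    Generation halts once the newest (earliest-in-time) action lands within
    `threshold` of the robot's current end-effector pose, so trajectory length
    can vary per episode without a discrete EOS token.
    """
    dist = np.linalg.norm(np.asarray(action) - np.asarray(current_ee_pose))
    return float(dist) < threshold
```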

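A hedged sketch of a temporal ensemble weighted in reverse follows: overlapping predictions of the current control step are averaged, with more weight on the most recently generated chunks. The exponential weighting here is an illustrative choice, not necessarily the paper's exact scheme.

```python
import numpy as np

def reverse_temporal_ensemble(prediction_buffer, t, decay=0.1):
    """Average overlapping predictions of control step t, weighting newer
    predictions more heavily (the reverse of the usual forward ensemble).

    prediction_buffer: dict mapping the step s at which a trajectory was
        predicted to that trajectory, a [T, action_dim] array whose index 0
        corresponds to step s.
    """
    actions, weights = [], []
    for s, traj in prediction_buffer.items():
        idx = t - s
        if 0 <= idx < len(traj):
            actions.append(traj[idx])
            weights.append(np.exp(decay * s))  # newer chunks get larger weights
    weights = np.asarray(weights) / np.sum(weights)
    return np.sum(np.asarray(actions) * weights[:, None], axis=0)
```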

Experimental Validation
Simulation Environment: In RLBench, CoA achieves an average success rate of 55.2% across 60 tasks, outperforming ACT (38.9%) and DP (32.6%), with significant improvements in complex tasks.

Correlation Analysis: CoA's success rate declines more slowly as the variance of object positions increases, showing robustness on high-variance, more difficult tasks.

Spatial Generalization: Tests on in-distribution and out-of-distribution button-pressing positions show CoA's superior generalization: it retains roughly half its success rate out of distribution, which ACT and DP fail to match.

Real-World Deployment: On a Fetch robot, CoA achieved an average success rate of 61.3% across 8 kitchen tasks, outperforming ACT (46.3%) and DP (36.3%), demonstrating practical applicability.

Conclusion and Outlook
Chain-of-Action presents a novel imitation learning paradigm based on trajectory autoregression. It effectively addresses compounding errors and improves spatial generalization without increasing data or model size, pointing to a promising modeling direction for future VLA models.