By Insights Team in AI — 30 Jun 2025

ACL 2025 | AI Subtitles Lag Behind, New Method Brings Real-Time Performance Close to Offline Translation

A new sequential policy optimization method improves the quality of AI simultaneous translation while keeping latency under control, bringing real-time output close to offline translation quality.

The first author is Xu Ting, a PhD student at the Chinese University of Hong Kong, focusing on large model post-training; the corresponding authors are Huang Zhichao and Cheng Shanbo from ByteDance Seed team.

Have you ever experienced this scenario: you are watching an exciting global launch event, but the AI subtitles are always half a beat behind, and by the time you see the translation, the joke on stage has already fallen flat?

Or, during international video conferences, machine translation quality fluctuates, sometimes rendering the speech incoherent or unintentionally comical.

This is the core challenge in the field of simultaneous machine translation (SiMT): the “Quality-Latency Trade-off”.

Now, a new solution has emerged. A research team from the Chinese University of Hong Kong, ByteDance Seed, and Stanford University jointly proposed a Sequential Policy Optimization framework for SiMT (SeqPO-SiMT).

This method models the SiMT task as a sequential decision process, optimizing the entire decision sequence to significantly improve translation quality while effectively controlling latency, sometimes surpassing offline models of similar size.

Paper Title: SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Link to Paper: https://arxiv.org/pdf/2505.20622

Research Background

The core of SiMT is the dynamic decision to either “continue listening” (READ) or “start speaking” (WRITE). This decision directly impacts the final translation quality. For example, when the model receives the English word “bark,” it faces a dilemma: translating immediately as “dog bark” might be correct, but if the following phrase is “of the tree,” the correct translation should be “tree bark.”
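The READ/WRITE loop can be sketched in a few lines of Python. This is only an illustration: the fixed-lag rule below (similar in spirit to wait-k style policies) is a stand-in and not the policy proposed in the paper, and `fixed_lag_policy`, `simulate`, and the word-by-word `translate_word` callback are hypothetical names introduced here.

```python
from typing import Callable, List

READ, WRITE = "READ", "WRITE"


def fixed_lag_policy(num_read: int, num_written: int,
                     source_done: bool, lag: int = 2) -> str:
    """Toy rule: WRITE once we are `lag` source words ahead, or the source has ended."""
    if source_done or num_read - num_written >= lag:
        return WRITE
    return READ


def simulate(source: List[str],
             translate_word: Callable[[str], str],
             lag: int = 2) -> List[str]:
    """Replay a source stream word by word and log every READ/WRITE action."""
    actions: List[str] = []
    num_read = 0
    num_written = 0
    while num_written < len(source):
        source_done = num_read == len(source)
        action = fixed_lag_policy(num_read, num_written, source_done, lag)
        if action == READ:
            # Consume the next source word before committing to any output.
            actions.append(f"READ  {source[num_read]}")
            num_read += 1
        else:
            # Commit to a translation of the oldest untranslated source word.
            actions.append(f"WRITE {translate_word(source[num_written])}")
            num_written += 1
    return actions


if __name__ == "__main__":
    # str.upper stands in for a real translation function.
    for line in simulate(["the", "bark", "of", "the", "tree"], str.upper, lag=2):
        print(line)
```

With `lag=1` the toy policy commits to "bark" before it has read "of the tree"; with `lag=2` it has already read "of" by then. A larger lag gives more context at the cost of more delay, which is the trade-off described above.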

Traditional SiMT methods make each READ or WRITE decision in isolation, which risks sacrificing overall fluency and accuracy for short-term gains.

Core Method

To address this challenge, the paper proposes the SeqPO-SiMT framework. The key idea is to model the SiMT task as a sequential decision problem, evaluating the translation quality and latency across the entire process, and optimizing the entire decision sequence end-to-end.

This approach does not evaluate each step in isolation but considers the entire sentence translation process as a whole, aligning more closely with human evaluation of simultaneous interpretation.

Sampling Stage in SiMT: A large language model (LLM) serves as the policy model $\pi_\theta$. At each timestep $t$, the model receives a new chunk of source text $x_t$ and, based on all previous source text $x_{<t}$ and the translation history $y_{<t}$, generates the current translation chunk $y_t$. This decision process can be formalized as:

$$y_t \sim \pi_\theta(\cdot \mid x_{\le t}, y_{<t})$$

A key flexibility of this framework is that the output $y_t$ can be empty when the model decides to wait for more context; its length is determined entirely by the policy model $\pi_\theta$.

During optimization, the reward function evaluates the entire process with a combined reward $r$ that assesses both translation quality $Q$ and latency $L$. The two terms are weighted against each other by $\lambda$, a hyperparameter that balances the importance of quality and latency.

Optimization Objective: The model aims to maximize the expected reward $\mathbb{E}[r]$. To ensure training stability, the objective function also incorporates a KL divergence constraint that prevents the policy model $\pi_\theta$ from deviating too far from a reference model $\pi_{\text{ref}}$. This end-to-end learning approach enables the model to learn a policy that balances translation quality and latency.
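The sampling stage and a sequence-level reward can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Policy` signature, the word-overlap quality score, the average-delay latency measure, and the choice to subtract latency weighted by `lam` are all illustrative placeholders, and the KL term is left out.

```python
from typing import Callable, List, Sequence

# Hypothetical policy signature:
# (source chunks read so far, translation chunks emitted so far) -> new chunk.
Policy = Callable[[Sequence[str], Sequence[str]], str]


def rollout(policy: Policy, source_chunks: Sequence[str]) -> List[str]:
    """Sample one trajectory: one (possibly empty) output chunk per source chunk."""
    outputs: List[str] = []
    for t in range(len(source_chunks)):
        # The policy sees the source up to and including step t, plus its own history.
        y_t = policy(source_chunks[: t + 1], outputs)
        outputs.append(y_t)
    return outputs


def average_delay(outputs: Sequence[str]) -> float:
    """Toy latency: mean number of source chunks read when each target word was emitted."""
    delays = [t + 1 for t, chunk in enumerate(outputs) for _ in chunk.split()]
    return sum(delays) / len(delays) if delays else 0.0


def word_overlap(hypothesis: str, reference: str) -> float:
    """Toy quality: fraction of reference words that appear in the hypothesis."""
    ref = reference.split()
    hyp = set(hypothesis.split())
    return sum(w in hyp for w in ref) / len(ref) if ref else 0.0


def sequence_reward(outputs: Sequence[str], reference: str, lam: float = 0.5) -> float:
    """Score the whole trajectory at once: quality minus lam-weighted latency.

    The sign convention (latency as a penalty) is an assumption of this sketch.
    """
    translation = " ".join(chunk for chunk in outputs if chunk)
    return word_overlap(translation, reference) - lam * average_delay(outputs)


if __name__ == "__main__":
    source = ["a", "b", "c"]
    eager = rollout(lambda src, hist: src[-1].upper(), source)
    patient = ["", "", "A B C"]  # waits for the full source before writing
    print(sequence_reward(eager, "A B C", lam=0.1))
    print(sequence_reward(patient, "A B C", lam=0.1))
```

Both toy trajectories reach the same quality score, so the one that emits words earlier gets the higher reward. Because the reward is computed once over the full trajectory, empty chunks (waiting) are judged only by their effect on the final quality and overall delay.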

The main contribution of this work is providing a new perspective on addressing the quality-latency trade-off in SiMT. It emphasizes the importance of holistic, sequential decision optimization. The proposed method offers valuable insights for developing more efficient and intelligent real-time translation systems, with potential applications in natural language processing tasks requiring continuous, real-time decision-making.
