By Insights Team in AI — 30 Jun 2025

Shanghai Jiao Tong University’s KinFormer: Generative Symbolic Regression Model Aids in Discovering Unknown Chemical Reaction Kinetics Mechanisms

Shanghai Jiao Tong University’s AI for Science team has introduced KinFormer, a generative symbolic regression model that discovers reaction kinetics equations directly from experimental data and helps uncover unknown chemical reaction mechanisms. The work was published at ICLR 2025.

Author: Shanghai Jiao Tong University AI for Science team

Introduction: Towards Precise Prediction of Chemical Reaction Kinetics

In cutting-edge organic synthesis, from innovative drug molecules to high-performance materials, understanding reaction mechanisms is crucial. Building quantitative kinetic models that accurately describe how reactant concentrations evolve over time is key to rational catalyst design and process optimization.

Such models bridge microscopic parameters (activation energy, transition state stability) with macroscopic catalytic performance (conversion rate, selectivity), shifting catalysis research from trial-and-error to theory-driven prediction.

The core task in kinetic modeling is to identify the governing equations and determine the reaction rate constants. As shown in Figure 1, the interactions among species (Figure 1a) are translated by the law of mass action into governing equations, expressed as a system of ordinary differential equations (Figure 1b). Once fitted to experimental data (Figure 1c), this system can guide catalyst design and reaction pathway optimization.
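To make the step from reaction scheme to ODE system concrete, here is a minimal sketch. The two-step catalytic cycle, the species names, and the rate constants are hypothetical and are not taken from the paper; the integrator is a plain fourth-order Runge-Kutta loop rather than any method the authors describe.

```python
from math import prod

# Hypothetical two-step catalytic cycle (an assumption, not from the paper):
#   S + C -> I   (rate constant k1 = 2.0)
#   I -> P + C   (rate constant k2 = 1.0)
SPECIES = ["S", "C", "I", "P"]
REACTIONS = [
    ({"S": 1, "C": 1}, {"I": 1}, 2.0),   # (reactants, products, rate constant)
    ({"I": 1}, {"P": 1, "C": 1}, 1.0),
]

def mass_action_rhs(conc):
    """Return d[conc]/dt for every species under the law of mass action."""
    deriv = {name: 0.0 for name in SPECIES}
    for reactants, products, k in REACTIONS:
        # Rate = k times each reactant concentration raised to its stoichiometry.
        rate = k * prod(conc[name] ** order for name, order in reactants.items())
        for name, order in reactants.items():
            deriv[name] -= order * rate
        for name, order in products.items():
            deriv[name] += order * rate
    return deriv

def rk4_step(conc, dt):
    """Advance all concentrations by one classical Runge-Kutta step."""
    def shifted(slope, h):
        return {n: conc[n] + h * slope[n] for n in SPECIES}
    s1 = mass_action_rhs(conc)
    s2 = mass_action_rhs(shifted(s1, dt / 2))
    s3 = mass_action_rhs(shifted(s2, dt / 2))
    s4 = mass_action_rhs(shifted(s3, dt))
    return {n: conc[n] + dt / 6 * (s1[n] + 2 * s2[n] + 2 * s3[n] + s4[n])
            for n in SPECIES}

def simulate(initial, dt=0.01, steps=1000):
    """Integrate the ODE system and return the list of concentration states."""
    trajectory = [dict(initial)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt))
    return trajectory

trajectory = simulate({"S": 1.0, "C": 0.1, "I": 0.0, "P": 0.0})
final = trajectory[-1]
print({name: round(value, 4) for name, value in final.items()})
```

The simulated curves play the role of the concentration data in Figure 1c: kinetic model discovery runs this process in reverse, recovering the equations and rate constants from such curves.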

However, accurately constructing such models faces long-standing challenges:

1. Prior dependence dilemma: Traditional methods rely heavily on chemists’ assumptions about reaction pathways, which is inefficient and prone to subjective bias due to limited experience or cognitive constraints.

2. Generalization barriers in data-driven modeling: Emerging symbolic regression techniques can learn differential equations directly from data, but they struggle with complex catalytic reaction dynamics. They often fail to capture multi-step coupling and nonlinear interactions, and their outputs can violate physical laws such as mass conservation.
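As an illustration of what a mass conservation violation looks like in practice, the sketch below checks whether a weighted sum of the right-hand sides vanishes at randomly sampled concentrations. The reaction scheme (S + C -> I, I -> P + C), both candidate systems, and the helper name `conservation_residual` are hypothetical; this is a diagnostic one could apply to any candidate system, not a component of KinFormer.

```python
import random

def conservation_residual(rhs, weights, samples):
    """Largest |sum_i w_i * dc_i/dt| over the sampled concentration states."""
    worst = 0.0
    for conc in samples:
        deriv = rhs(conc)
        residual = sum(w * deriv[name] for name, w in weights.items())
        worst = max(worst, abs(residual))
    return worst

def consistent_rhs(c, k1=2.0, k2=1.0):
    """Mass-action equations for the hypothetical cycle."""
    r1, r2 = k1 * c["S"] * c["C"], k2 * c["I"]
    return {"S": -r1, "C": -r1 + r2, "I": r1 - r2, "P": r2}

def inconsistent_rhs(c, k1=2.0, k2=1.0):
    """Same system, but the intermediate equation lost its consumption term."""
    r1, r2 = k1 * c["S"] * c["C"], k2 * c["I"]
    return {"S": -r1, "C": -r1 + r2, "I": r1, "P": r2}

rng = random.Random(0)
samples = [{name: rng.uniform(0.1, 1.0) for name in "SCIP"} for _ in range(100)]

# Substrate-derived material is conserved: S + I + P should stay constant.
substrate_balance = {"S": 1.0, "I": 1.0, "P": 1.0}

ok = conservation_residual(consistent_rhs, substrate_balance, samples)
bad = conservation_residual(inconsistent_rhs, substrate_balance, samples)
print(f"consistent system residual:   {ok:.2e}")
print(f"inconsistent system residual: {bad:.2e}")
```

A single missing term is enough to make the residual clearly nonzero, which is the kind of error an unconstrained symbolic regressor can produce while still fitting individual curves well.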

To address these issues, Shanghai Jiao Tong University’s AI for Science team, including researcher Yanyan Xu, proposed KinFormer, the first AI model capable of discovering reaction kinetics equations from experimental data. It uses a conditional training strategy to implicitly model the physical laws embedded in reaction equations and combines it with a search algorithm. The result is a generalizable mechanism discovery model that can be trained on limited data and applied to new reactions. This work was published at ICLR 2025.

Figure 1: Organic catalytic reaction mechanism diagram.

Innovative Mechanism: Integrating Physical Constraints and Intelligent Search for Kinetic Equation Prediction

KinFormer’s core idea is enabling the model to “understand” and adhere to the intrinsic physical laws of chemical reactions:

1. Conditional training strategy: Breaking the generalization bottleneck of end-to-end models.

KinFormer abandons the traditional end-to-end approach of generating an entire equation system in one pass. During training, the model instead predicts the next differential equation, conditioned on a randomly selected subset of the system’s known equations.

This “conditional prediction” task encourages the model to learn the dynamic dependencies governed by the law of mass action (e.g., between reactant consumption rates and intermediate formation), as well as the kinetic parameters (rate constants) shared across equations. Because the equations and their prediction order are shuffled, the model cannot simply memorize fixed sequences and must instead internalize the underlying physical logic.
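The data side of this strategy can be sketched as follows. The sketch assumes that equations are represented as plain strings and that one training example is a (condition, target) pair; the function name `make_conditional_sample` and the example system are hypothetical, and the paper’s actual tokenization and sampling scheme may differ.

```python
import random

# Hypothetical mass-action system for S + C -> I, I -> P + C, written as strings.
SYSTEM = [
    "dS/dt = -k1*S*C",
    "dC/dt = -k1*S*C + k2*I",
    "dI/dt = k1*S*C - k2*I",
    "dP/dt = k2*I",
]

def make_conditional_sample(equations, rng):
    """Build one (condition, target) pair from an ODE system.

    The equations are shuffled so no fixed order can be memorized, then a
    random number of them is revealed as the condition and the next one in
    the shuffled order becomes the prediction target.
    """
    order = list(equations)             # copy, so the input is left untouched
    rng.shuffle(order)
    n_known = rng.randrange(len(order)) # between 0 and len(order) - 1 known equations
    return order[:n_known], order[n_known]

rng = random.Random(0)
samples = [make_conditional_sample(SYSTEM, rng) for _ in range(5)]
for condition, target in samples:
    print(f"given {len(condition)} equation(s) -> predict: {target}")
```

Because the same system yields many different pairs, the shared rate constants k1 and k2 appear in both conditions and targets, which is what pushes a model toward learning the coupling between equations rather than a fixed sequence.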

Figure 2: Training strategy comparison diagram.

2. Monte Carlo Tree Search (MCTS): Global optimization of generation sequence.

The conditional strategy’s sensitivity to prediction order is addressed by introducing an MCTS module at the equation generation level. Each differential equation is treated as a node in a search tree, exploring different generation paths using a probabilistic upper confidence bound (P-UCB) strategy.
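The selection rule can be sketched with one form of P-UCB that appears in the MCTS literature, where the exploration bonus is weighted by the generator’s prior probability for each candidate. Whether KinFormer uses exactly this form is an assumption, and the constants `c_base` and `c`, the dictionary layout, and the candidate values are illustrative only.

```python
import math

def p_ucb(q_value, prior, parent_visits, child_visits, c_base=10.0, c=4.0):
    """Score one child: exploitation term plus a prior-weighted exploration bonus.

    Assumed form: Q + beta * P * sqrt(ln(N_parent)) / (1 + N_child),
    with beta = ln((N_parent + c_base + 1) / c_base) + c.
    Requires parent_visits >= 1.
    """
    beta = math.log((parent_visits + c_base + 1) / c_base) + c
    bonus = beta * prior * math.sqrt(math.log(parent_visits)) / (1 + child_visits)
    return q_value + bonus

def select_child(children, parent_visits):
    """Pick the candidate equation with the highest P-UCB score."""
    return max(
        children,
        key=lambda ch: p_ucb(ch["q"], ch["prior"], parent_visits, ch["visits"]),
    )

# Two hypothetical candidates for the next equation to generate.
children = [
    {"name": "A", "q": 0.5, "prior": 0.6, "visits": 5},  # well explored
    {"name": "B", "q": 0.4, "prior": 0.3, "visits": 0},  # never tried
]
selected = select_child(children, parent_visits=6)
print("next equation to expand:", selected["name"])
```

Here the unvisited candidate B is chosen despite its lower value estimate and prior, because the bonus shrinks as a child accumulates visits. That trade-off is what lets the search try different generation orders instead of committing to the first plausible one.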

MCTS simulates candidate sequences, evaluates their “reward” based on metrics like R2_m and R2_u, and updates node weights via backpropagation. This dynamic process optimizes the generation order, ensuring the predicted equations are mathematically and physically consistent.
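The reward and backpropagation steps can be sketched as below. The article does not define R2_m and R2_u, so this uses the ordinary coefficient of determination as a stand-in reward; the `Node` class, the running-mean update, and the sample data are assumptions for illustration.

```python
class Node:
    """One search-tree node, standing for a generated differential equation."""
    def __init__(self, equation, parent=None):
        self.equation = equation
        self.parent = parent
        self.visits = 0
        self.q = 0.0   # running mean of the rewards seen through this node

def r_squared(observed, predicted):
    """Coefficient of determination; assumes the observed values are not all equal."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def backpropagate(leaf, reward):
    """Update visit counts and mean rewards from the leaf up to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.q += (reward - node.q) / node.visits  # incremental mean
        node = node.parent

# Hypothetical three-level path: root -> first equation -> second equation.
root = Node("<start>")
first = Node("dS/dt = -k1*S*C", parent=root)
second = Node("dI/dt = k1*S*C - k2*I", parent=first)

observed = [1.0, 0.8, 0.6, 0.4]
good_fit = [1.0, 0.8, 0.6, 0.4]     # curve simulated from a correct system
poor_fit = [0.9, 0.9, 0.5, 0.5]     # curve simulated from a worse candidate

backpropagate(second, r_squared(observed, good_fit))
backpropagate(second, r_squared(observed, poor_fit))
print(f"root: visits={root.visits}, mean reward={root.q:.3f}")
```

Each simulation leaves its fit quality on every node along the path, so generation orders that tend to end in well-fitting systems accumulate higher value estimates.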

Figure 3: MCTS-based sequence search framework.

Experimental Analysis: Cross-Mechanism Generalization and Performance Advantages

The team validated KinFormer on 20 representative catalytic organic reactions, covering core mechanisms, complex bifunctional systems, and key processes such as catalyst activation and deactivation. KinFormer significantly outperformed existing methods:

1. Strong cross-mechanism generalization: In “cross-category” scenarios (e.g., unseen catalyst activation mechanisms), KinFormer achieved an equation form accuracy (Acc_form) of 81.41%, over 30% higher than baseline models.

2. Robustness to noise: Even with Gaussian noise added to the data (standard deviation 1e-4), KinFormer accurately predicted concentration curves, suggesting potential for use in real experimental settings.

3. Efficient intelligent search: The MCTS module ensures physical consistency and converges within about 20 iterations, three times faster than traditional beam search, while also yielding better performance.
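For readers who want to reproduce the kind of perturbation used in the noise test, the sketch below adds zero-mean Gaussian noise with standard deviation 1e-4 to a concentration curve. The first-order decay curve, the time grid, and the function name `add_gaussian_noise` are hypothetical; the paper’s data pipeline is not described in this article.

```python
import math
import random

def add_gaussian_noise(curve, std=1e-4, seed=0):
    """Return a copy of the curve with independent zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, std) for value in curve]

# Hypothetical clean measurement: first-order decay sampled at 50 time points.
times = [0.1 * i for i in range(50)]
clean = [math.exp(-0.5 * t) for t in times]

noisy = add_gaussian_noise(clean)
max_deviation = max(abs(n - c) for n, c in zip(noisy, clean))
print(f"largest perturbation: {max_deviation:.2e}")
```

At this noise level the perturbation is small relative to concentrations of order one, so the result says more about numerical robustness than about tolerance to large measurement errors.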

Full experimental results are available in the original paper.

Figure 4: Main experimental results.

Significance: Advancing Intelligent Chemical Kinetics

Innovative scientific tool: KinFormer gives chemists a powerful automated tool for analyzing experimental data and discovering unknown mechanisms, which accelerates catalyst design and process optimization and reduces reliance on manual assumptions.

Universal methodology: The “conditional training + physics-guided global search” paradigm offers a new approach to embedding physical constraints in symbolic regression, with broad applications in physics, biology, and engineering systems governed by conservation laws and symmetries.

Ongoing research: The team aims to improve model robustness for higher-dimensional systems and noisy data, promoting practical applications in real laboratories and leading the development of intelligent, automated chemical kinetics research.

Team Introduction

Shanghai Jiao Tong University’s AI for Science team, led by Professor Yaohui Jin and Associate Professor Yanyan Xu, includes more than ten postdocs and graduate students. The team focuses on generative AI and large scientific models for chemistry and proposes innovative solutions for synthesis and automation.

The team released the Bai Yulan Science Model, the first chemical large language model. It is capable of reaction generation, feedback optimization, and guiding experimental exploration, with functions including molecular design, retrosynthesis planning, reaction condition generation, yield prediction, and iterative optimization.

Research results have been published in Nature Energy, Nature Computational Science (cover), Nature Machine Intelligence, Science Advances, and top-tier CCF A conferences. The team benefits from rich computational resources at the Shanghai Jiao Tong University AI Institute and collaborates closely with the School of Chemistry and Chemical Engineering and the Frontier Science Center for Transformative Molecules.

Paper Title: KINFORMER: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics

Conference: ICLR 2025

Citation: Jindou Chen, Jidong Tian, Liang Wu, Xinwei Chen, Xiaokang Yang, Yaohui Jin, and Yanyan Xu. “KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics.” In The Thirteenth International Conference on Learning Representations (ICLR 2025).
