Fast, Accurate, and Interpretable General Drug Discovery Workflow LeadDisFlow: Chinese Team Advances Candidate Drugs into Phase II Clinical Trials

LeadDisFlow, developed by a Chinese team, is a fast, accurate, and interpretable drug discovery workflow that has helped advance a candidate drug into Phase II clinical trials.

In recent years, traditional targeted drug design has been hampered by three major bottlenecks: low quality of initial compound libraries, high costs of wet lab screening, and poor interpretability of activity prediction models. These issues severely limit the efficiency of new drug discovery.

To address these challenges, researchers from Hunan University, East China Normal University, Shanghai Jiao Tong University, and Huazhong University of Science and Technology used image-based representations of molecules to build a fast, accurate, and interpretable targeted drug discovery workflow aimed at accelerating drug development.

The research results were published in the paper titled “Discovery of EP4 antagonists with image-guided explainable deep learning workflow” in National Science Open.

Link to the paper: https://www.sciengine.com/NSO/doi/10.1360/nso/20240015

The study proposes a universal, explainable, image-based workflow called LeadDisFlow. As an end-to-end AI-driven platform, LeadDisFlow has successfully helped researchers discover four highly selective EP4 antagonists with nanomolar affinity, significantly shortening the drug discovery cycle.

This platform has been adopted by pharmaceutical companies for drug discovery and optimization, leading to one candidate drug entering Phase II clinical trials.

Compared to existing advanced methods, LeadDisFlow demonstrates two major advantages:

  1. Superior interpretability: Using molecular images and visualization techniques, the workflow can intuitively reveal key chemical structures influencing activity, greatly enhancing model reliability and transparency.
  2. Complete end-to-end validation: Covering AI design to experimental validation, it generates large, diverse, high-quality virtual compound libraries, performs high-throughput virtual screening with image-based deep learning models, and confirms predictions through chemical synthesis and bioassays, discovering four potent and selective lead compounds.

Methodology

The core workflow of LeadDisFlow consists of two modules: the molecule generation module (LeadDisFlow-G) and the molecule screening module (LeadDisFlow-S).

1. LeadDisFlow-G (Molecule Generation)

  • Architecture: Based on recurrent neural networks (RNN), it can intelligently decorate and assemble known scaffolds and fragments to create novel molecules.
  • Library creation: Using pharmacologically active EP4 antagonists drawn from 28 patents as seeds, the model generated 140,569 structurally novel molecules, forming a large initial virtual library.

2. LeadDisFlow-S (Molecule Screening) and Funnel-Based Filtering

Screening is built on ImageMol, a self-supervised property predictor pretrained on 10 million unlabeled molecular images and then fine-tuned on small EP4 activity datasets to predict activity.

The screening process follows a funnel approach:

  1. Initial filtering: Applying drug-likeness rules (molecular weight, LogP, etc.) reduces candidates from 140,569 to 19,250.
  2. Diversity clustering: Clustering these into 100 groups and selecting 20 representatives from each yields 2,000 diverse, high-quality candidates.
  3. Refined scoring: The fine-tuned LeadDisFlow-S model predicts the bioactivity of each candidate.
  4. Final selection: Experts evaluate the top 50 molecules for synthetic feasibility and drug-likeness, selecting four candidates (ZY001-ZY004) for synthesis and bioassays.
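The funnel above can be sketched in a few lines of Python. This is only an illustration of the filter, cluster, and rank pattern, not the authors' code: the `Molecule` record, the Lipinski-style cut-offs (molecular weight ≤ 500, LogP ≤ 5), the precomputed cluster labels, and the scores are all assumptions, since the article does not give the exact rules or the clustering method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_weight: float  # daltons
    logp: float
    cluster: int       # label from a prior clustering step (assumed given)
    score: float       # predicted activity from a screening model (made up here)


# Assumed Lipinski-style thresholds; the article does not state the real cut-offs.
MAX_MOL_WEIGHT = 500.0
MAX_LOGP = 5.0


def passes_drug_likeness(m: Molecule) -> bool:
    return m.mol_weight <= MAX_MOL_WEIGHT and m.logp <= MAX_LOGP


def pick_representatives(mols, per_cluster):
    """Keep up to `per_cluster` molecules from each cluster, in input order."""
    by_cluster = defaultdict(list)
    for m in mols:
        if len(by_cluster[m.cluster]) < per_cluster:
            by_cluster[m.cluster].append(m)
    return [m for members in by_cluster.values() for m in members]


def run_funnel(library, per_cluster, top_k):
    filtered = [m for m in library if passes_drug_likeness(m)]      # step 1
    diverse = pick_representatives(filtered, per_cluster)           # step 2
    ranked = sorted(diverse, key=lambda m: m.score, reverse=True)   # step 3
    return ranked[:top_k]                                           # shortlist for expert review


library = [
    Molecule("A", 350.0, 3.1, 0, 0.91),
    Molecule("B", 620.0, 4.0, 0, 0.99),  # fails the molecular weight rule
    Molecule("C", 410.0, 2.2, 0, 0.40),
    Molecule("D", 300.0, 6.3, 1, 0.95),  # fails the LogP rule
    Molecule("E", 280.0, 1.9, 1, 0.75),
    Molecule("F", 330.0, 2.8, 0, 0.88),  # third member of cluster 0, dropped
]
shortlist = run_funnel(library, per_cluster=2, top_k=2)
print([m.name for m in shortlist])  # ['A', 'E']
```

At the scale reported in the article, the same pattern would run with 100 clusters, `per_cluster=20` (100 × 20 = 2,000 candidates), and `top_k=50`.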

Results

In benchmark evaluation, LeadDisFlow-S outperforms five graph-based self-supervised models (GROVER, MGSSL, MPG, GraphMVP, MolCLR) in EP4 activity prediction, achieving the highest ROC-AUC of all the models compared, 0.88.
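For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen active molecule receives a higher score than a randomly chosen inactive one. The minimal sketch below computes it by direct pairwise comparison; the labels and scores are invented for illustration and are not data from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive score count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive molecule")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Toy example: three actives, two inactives (made-up scores).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.833 (5 of 6 pairs ranked correctly)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of actives from inactives, so the reported 0.88 means roughly 88% of active/inactive pairs are ordered correctly.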

Explainability studies show that LeadDisFlow identifies key pharmacophores, such as the benzoic acid group, in agreement with known pharmacology. It also captures subtle structural differences that affect activity: when a benzene ring is replaced by cyclohexane, which changes activity nearly 20-fold, the model focuses its attention on that ring.

Bioassay validation shows that all four candidates are potent EP4 antagonists, with ZY001 reaching an IC50 of 0.51 nM. They are also highly selective (>10,000 nM against EP1-EP3) and compete with PGE2 for binding, supporting the platform's reliability.
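To put the selectivity figures in perspective, a fold-selectivity ratio can be computed by dividing the off-target value by the on-target IC50. The sketch below assumes the ">10,000 nM" figure for EP1-EP3 is an IC50 measured on a comparable scale, which the article implies but does not state outright; because that figure is a lower limit, the result is a lower bound on selectivity.

```python
def selectivity_fold(off_target_ic50_nm: float, on_target_ic50_nm: float) -> float:
    """Ratio of off-target to on-target IC50; larger means more selective."""
    if on_target_ic50_nm <= 0:
        raise ValueError("on-target IC50 must be positive")
    return off_target_ic50_nm / on_target_ic50_nm


# ZY001: EP4 IC50 = 0.51 nM; EP1-EP3 values reported as > 10,000 nM.
fold = selectivity_fold(10_000.0, 0.51)
print(f"ZY001 selectivity lower bound: {fold:,.0f}-fold")
# prints: ZY001 selectivity lower bound: 19,608-fold
```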

Furthermore, ZY001 significantly inhibits the PGE2-induced expression of immunosuppressive genes in macrophages, indicating its potential in immunotherapy and supporting LeadDisFlow's practical value in drug discovery.
