ICML 2025 | SUICA, a New Method from the University of Tokyo and McGill University, Yields Spatial Transcriptomics Data with Higher Quality, Lower Noise, and Stronger Biological Signals

Researchers from the University of Tokyo and McGill University propose SUICA, a model for spatial transcriptomics that yields data with higher quality, lower noise, and stronger biological signals. The work was selected for ICML 2025.

Authors | Research Team

Editor | ScienceAI

Spatial transcriptomics (ST) data is a high-dimensional matrix recording gene expression levels and spatial coordinates simultaneously on the same tissue section.

Recently, research groups led by Professor Zheng Yinqiang at the University of Tokyo and Professor Ding Jun at McGill University jointly proposed SUICA, a new modeling method for spatial transcriptomics data. The paper, titled “SUICA: Learning Super-high Dimensional Sparse Implicit Neural Representations for Spatial Transcriptomics,” was selected for ICML 2025.

Paper link: https://openreview.net/pdf?id=XWC7JXHXvo

Open-source project: https://github.com/Szym29/SUICA

SUICA is a deep learning model based on implicit neural representations (INR) and graph autoencoders. It uses graph autoencoders to reduce the dimensionality of high-dimensional spatial transcriptomics data, then models the spatial coordinates and gene expression with implicit neural representations, enabling prediction of gene expression at any location within the tissue slice. Results show that data processed by SUICA has higher quality, lower noise, and stronger biological signals.

Why enhance spatial transcriptomics data?

Whole-slide imaging (WSI) shows only morphological structure, and conventional transcriptomics quantifies gene expression but loses spatial information. Spatial transcriptomics links gene expression to tissue location, producing a functional map of cell states and microenvironments that connects histology with molecular biology.

Despite its unprecedented spatial resolution, current data faces three major bottlenecks:

  • Resolution-cost trade-off: denser probes and higher sequencing depth raise costs (e.g., Stereo-seq costs over $4000/cm²) and limit sample throughput;
  • Signal sparsity and noise: limited mRNA capture per probe, severe zero inflation, and difficulty detecting low-abundance or key regulatory genes;
  • Cross-platform heterogeneity: significant differences in probe layout, sequencing depth, and background noise hinder multi-sample or multi-experiment integration.

Computational enhancement methods, including super-resolution reconstruction, deep denoising, and missing data imputation, can predict unmeasured gene expression, recover true gene signals lost to technical limitations, and generate standardized features that are comparable across platforms. They thus provide a more precise and scalable basis for cell communication analysis, disease annotation, drug target discovery, multi-omics modeling, and AI-assisted pathology.

SUICA: A Unified Model Based on Implicit Neural Representations and Graph Autoencoders

Challenges in Modeling Spatial Transcriptomics Data

Modeling spatial transcriptomics data faces multiple challenges:

  • The data is sampled on a spatial grid, yet each spot measures thousands to tens of thousands of genes, forming a super-high-dimensional, sparse, noisy matrix; high dropout rates weaken biological signals and reduce statistical power.
  • Existing platforms balance resolution and cost: denser probes and deeper sequencing increase costs, making it hard to achieve both cellular resolution and large sample sizes.
  • Interpolating discrete spatial points into continuous gene expression fields with implicit neural representations (INRs) faces two technical difficulties: the high dimensionality of gene expression space and the uneven distribution caused by zero inflation, making it hard for traditional INRs to model complex, nonlinear spatial patterns.

Graph Autoencoders for Dimensionality Reduction

Compared to traditional autoencoders, this approach treats each spatial transcriptomics point as a graph node, constructs an adjacency matrix based on spatial proximity, and applies graph convolution in the encoder to incorporate local spatial context into low-dimensional representations (Z). This enhances the signal in sparse, noisy data.
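
The encoder idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the k-nearest-neighbour graph, the choice k=4, the single untrained graph-convolution layer, and the toy data sizes are all assumptions made for illustration, and the helper names knn_adjacency and gcn_layer are hypothetical.

```python
import numpy as np

def knn_adjacency(coords, k=4):
    """Symmetric k-nearest-neighbour adjacency built from spot coordinates."""
    n = coords.shape[0]
    # Pairwise squared Euclidean distances between all spots.
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a spot is not its own neighbour
    nearest = np.argsort(dist2, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected

def gcn_layer(adj, x, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)

rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))                      # 50 toy spots in 2-D
expr = rng.poisson(0.3, size=(50, 200)).astype(float)   # sparse toy counts
weight = rng.normal(scale=0.1, size=(200, 16))          # untrained weights
z = gcn_layer(knn_adjacency(coords), expr, weight)      # low-dimensional Z
```

Each row of z mixes a spot's own expression with that of its spatial neighbours, which is how local context enters the low-dimensional representation. A real model would stack several such layers and train them together with a decoder.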

Implicit Neural Representation: Mapping Coordinates to Gene Expression

After obtaining low-dimensional features, the implicit neural network takes the spatial coordinates as input, learning the mapping between points and their low-dimensional features. The predicted features are then fed into the decoder of the graph autoencoder to map coordinates to high-dimensional gene expression.
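
The coordinate-to-expression pipeline can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: the random Fourier-feature encoding, the layer sizes, and the plain MLP standing in for the graph autoencoder's decoder are choices made here, not details taken from the paper, and the weights are untrained.

```python
import numpy as np

def fourier_features(coords, freqs):
    """Encode (n, 2) coordinates as [sin(2*pi*x@B), cos(2*pi*x@B)]."""
    proj = 2.0 * np.pi * coords @ freqs
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def mlp(x, params):
    """Plain MLP: ReLU on hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

def init_params(sizes, rng):
    """Random (untrained) weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
latent_dim, n_genes = 16, 200                           # assumed toy sizes
freqs = rng.normal(scale=4.0, size=(2, 32))             # random frequencies
inr = init_params([64, 64, latent_dim], rng)            # coordinates -> Z
decoder = init_params([latent_dim, 64, n_genes], rng)   # Z -> gene expression

# Query locations that need not coincide with any measured spot.
query = np.array([[0.25, 0.75], [0.5, 0.5]])
z_pred = mlp(fourier_features(query, freqs), inr)
expr_pred = mlp(z_pred, decoder)
```

Because the network takes continuous coordinates as input, it can be queried at any location inside the slice, which is what makes super-resolution prediction possible once the weights have been fitted to the measured spots.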

Experimental Validation:

SUICA produces more accurate and biologically relevant predictions

The study benchmarked SUICA on mouse embryo data from Stereo-seq and mouse brain slices from Slide-seq. On the task of predicting expression at unmeasured points (super-resolution), SUICA significantly outperformed existing models as well as conventional INR models such as FFN and SIREN across multiple metrics.

Visualizations show that SUICA’s predictions accurately restore gene expression patterns and even enhance signals. For example, the gene SEPT3, which is important in mouse neural development, is only weakly detected in the ground-truth data but is successfully captured by SUICA.

Cluster analysis and cell type annotation show that SUICA’s predictions yield cell types closest to those in the real data while preserving detailed tissue and organ structures. This demonstrates its capacity to enhance biological signals and distinguish subtle cell states across organs.

Experimental Validation:

SUICA reduces noise and alleviates dropout in spatial transcriptomics data

To test SUICA’s denoising and imputation capabilities, the researchers added Gaussian noise and randomly set 70% of gene expression values to zero. SUICA outperformed existing methods on multiple metrics, effectively denoising the data and recovering true gene expression signals.
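
A corruption step of this kind might look like the following NumPy sketch. The 70% dropout rate comes from the text; the noise standard deviation, the toy matrix, the decision to apply both corruptions in one function, and the helper name corrupt are assumptions made for illustration.

```python
import numpy as np

def corrupt(expr, noise_std=0.1, dropout_rate=0.7, seed=0):
    """Add Gaussian noise, then zero out a random fraction of the entries."""
    rng = np.random.default_rng(seed)
    noisy = expr + rng.normal(scale=noise_std, size=expr.shape)
    # keep[i, j] is False for entries that are dropped (set to zero).
    keep = rng.random(expr.shape) >= dropout_rate
    return noisy * keep, keep

# Toy "clean" expression matrix: 100 spots x 50 genes (assumed sizes).
expr = np.random.default_rng(1).gamma(2.0, 1.0, size=(100, 50))
corrupted, keep = corrupt(expr)

# Error of the corrupted matrix against the clean one; a denoising or
# imputation method is judged by how far it reduces this kind of error.
mse = np.mean((corrupted - expr) ** 2)
```

Keeping the clean matrix aside gives a reference against which any model's reconstruction can be scored.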
