By Insights Team in AI — 26 Jun 2025

Google DeepMind Releases DNA Sequence Model AlphaGenome Capable of Analyzing 1 Million Bases to Uncover Disease Roots

DeepMind's AlphaGenome analyzes up to 1 million DNA bases at a time, predicting gene regulation and the effects of mutations at single-base resolution and advancing the understanding of genetic diseases.

Editor | Carrot Skin

Genomes can be likened to instruction manuals for cells—detailed blueprints written in DNA that govern every aspect of life, from appearance to internal functions, growth, and reproduction.

A change to even a single letter of this “manual” (a variant) can markedly alter how an organism responds to its environment: it may remove resistance to a disease, confer a new ability, or cause outright harm.

This “manual” is extraordinarily complex. Decoding how genetic instructions are read at the molecular level remains one of biology’s greatest mysteries.

Deep learning models that predict functional genomic measurements from DNA sequences are powerful tools for deciphering genetic regulation. Existing methods often trade off between input sequence length and prediction resolution, limiting their scope and performance.
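The trade-off between input length and resolution comes down to simple arithmetic: the number of positions a model must predict per track is the input length divided by the bin size. The sketch below illustrates this with made-up settings; the function name and the three configurations are hypothetical and are not taken from any specific published model.

```python
def output_positions(input_length_bp: int, resolution_bp: int) -> int:
    """Number of prediction bins a model emits per track."""
    if input_length_bp <= 0 or resolution_bp <= 0:
        raise ValueError("lengths must be positive")
    return input_length_bp // resolution_bp


# Illustrative settings only (assumptions, not real model specifications).
settings = {
    "short context, single-base": (10_000, 1),
    "long context, coarse bins": (1_000_000, 128),
    "long context, single-base": (1_000_000, 1),
}

table = {name: output_positions(*cfg) for name, cfg in settings.items()}
for name, n in table.items():
    print(f"{name}: {n:,} output positions per track")
```

Keeping both a long context and single-base output means producing far more values per track than either compromise, which is why earlier methods typically gave up one or the other.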

Recently, Google DeepMind researchers introduced AlphaGenome, which takes a DNA sequence of up to 1 million bases as input and predicts thousands of regulatory genomic features at single-base resolution, including gene expression, transcription start sites, chromatin accessibility, histone modifications, transcription factor binding, chromatin contact maps, and splice sites together with their strength and coordinates.

“This will be an extremely effective tool!” said Caleb Lareau, a systems biologist at Memorial Sloan Kettering Cancer Center, who was among the first to try this technology. “It’s the most comprehensive attempt to annotate and interpret all possible variations in the 3 billion-letter human genome. It’s the most powerful computational simulation tool we’ve had to date.”

The related preprint titled “AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model” was published on June 25, 2025.

Link to paper: https://storage.googleapis.com/deepmind-media/papers/alphagenome.pdf

Background

Deciphering the effects of genomic sequence variants remains a core challenge in biology. Non-coding variants outside protein-coding regions can influence multiple molecular outcomes, making interpretation difficult.

For example, non-coding variants can regulate chromatin accessibility, epigenetic modifications, and 3D chromatin conformation. They can alter gene expression levels or splicing, affecting mRNA availability, often in cell-type or tissue-specific ways.

More than 98% of observed human genetic variation falls in non-coding regions, yet most current tools focus on the roughly 2% of the genome that codes for proteins.

AlphaGenome

To decode the genome more accurately, rapidly, and in a multi-modal, multi-dimensional manner, DeepMind developed AlphaGenome, integrating multi-modal predictions, long sequence context, and base-pair resolution into a unified framework.

AlphaGenome processes long DNA sequences of up to 1 million bases, predicting thousands of molecular features related to regulatory activity. It can also evaluate the impact of mutations by comparing predictions between mutated and wild-type sequences.
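The comparison between mutated and wild-type sequences can be sketched as follows. This is a minimal illustration, not AlphaGenome's method or API: `toy_predict` is a hypothetical stand-in for a trained model (it just counts matches to an arbitrary "GATA" motif at each position), and the sequence and variant are invented for the example.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot array."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    return np.eye(4, dtype=np.float32)[idx]


# Hypothetical motif used by the toy model (an assumption for illustration).
MOTIF = one_hot("GATA")


def toy_predict(seq: str) -> np.ndarray:
    """Stand-in for a trained model: one score per sliding window."""
    x = one_hot(seq)
    k = len(MOTIF)
    return np.array(
        [float((x[i:i + k] * MOTIF).sum()) for i in range(len(seq) - k + 1)]
    )


def apply_variant(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the single base at 0-based pos changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {seq[pos]!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def score_variant(seq: str, pos: int, ref: str, alt: str) -> np.ndarray:
    """Per-position difference: prediction(alt) minus prediction(ref)."""
    ref_track = toy_predict(seq)
    alt_track = toy_predict(apply_variant(seq, pos, ref, alt))
    return alt_track - ref_track


# Invented example: a C>A change that completes a GATA motif.
delta = score_variant("CCGATCCC", 5, "C", "A")
print("per-position change:", delta)
print("largest absolute change:", np.abs(delta).max())
```

The same pattern applies to any sequence-to-track model: run it once on the reference sequence and once on the altered sequence, then summarize the difference (for example, the maximum or the sum) into a variant score.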

Predicted features include where genes start and end in different cell types, where RNA is spliced, how much RNA is produced, and which DNA regions are accessible or bound by proteins.

Training data comes from large public consortia like ENCODE, GTEx, 4D Nucleome, and FANTOM5, which have experimentally measured these features across hundreds of human and mouse tissues and cell types.

Illustration: AlphaGenome architecture, training process, and comprehensive performance evaluation. (Source: Paper)

The architecture uses convolutional layers to detect short sequence motifs, Transformer layers to propagate information across the whole sequence, and a final set of layers to convert the detected patterns into predictions for each modality. During training, the computation for a single sequence is distributed across multiple interconnected TPUs.
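The three stages described above can be sketched in a few lines of NumPy. This is a deliberately tiny toy with random, untrained weights, not the published architecture: the layer sizes, the single attention layer, and the head names `rna_seq` and `dnase` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot array."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4, dtype=np.float32)[idx]


def conv1d_same(x, w):
    """x: (L, C_in), w: (K, C_in, C_out) -> (L, C_out) with zero padding."""
    k = w.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    # windows[l, i] is the padded input at position l + i: shape (L, K, C_in)
    windows = np.stack([xp[i:i + x.shape[0]] for i in range(k)], axis=1)
    return np.einsum("lkc,kcd->ld", windows, w)


def self_attention(h, wq, wk, wv):
    """Single-head self-attention with a residual connection."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return h + weights @ v


def forward(seq, params):
    x = one_hot(seq)
    # 1) Convolution: detect short local patterns (motifs).
    h = np.maximum(0.0, conv1d_same(x, params["conv"]))
    # 2) Attention: let every position exchange information with all others.
    h = self_attention(h, params["wq"], params["wk"], params["wv"])
    # 3) Heads: one linear readout per modality, one row per base.
    return {name: h @ w for name, w in params["heads"].items()}


D = 16  # hidden width (arbitrary)
params = {
    "conv": rng.normal(scale=0.1, size=(7, 4, D)),
    "wq": rng.normal(scale=0.1, size=(D, D)),
    "wk": rng.normal(scale=0.1, size=(D, D)),
    "wv": rng.normal(scale=0.1, size=(D, D)),
    # Hypothetical modality heads with 3 and 2 output tracks.
    "heads": {
        "rna_seq": rng.normal(scale=0.1, size=(D, 3)),
        "dnase": rng.normal(scale=0.1, size=(D, 2)),
    },
}

seq = "".join(rng.choice(list(BASES), size=64))
outputs = forward(seq, params)
for name, track in outputs.items():
    print(name, track.shape)
```

Each head returns one row per input base, which is what "single-base resolution" means in practice; a real model would stack many such layers and train the weights on experimental data.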

This model builds on DeepMind’s previous genomics model Enformer and complements AlphaMissense, which classifies the effects of variants within protein-coding regions.

Powerful Performance

AlphaGenome can predict how single-base changes affect gene expression and alter RNA and protein products. Whereas many earlier AI systems concentrate on the roughly 2% of the genome that codes for proteins, AlphaGenome is presented as the first to analyze the genome comprehensively, non-coding regions included.

Hani Goodarzi from UC San Francisco states: “This is the first AI model capable of accurately predicting the position and manner of RNA (variant) expression directly from DNA sequences. It allows us to understand whether a gene is expressed and how the resulting RNA is processed.”

Illustration: AlphaGenome’s genome track prediction and detailed performance assessment. (Source: Paper)

After training on human and mouse genomes, AlphaGenome matched or surpassed the best external models in 24 out of 26 variant effect prediction tasks and achieved state-of-the-art performance in 22 out of 24 genome track prediction tasks. It can evaluate variant effects across all modalities at once, and it accurately reproduced the mechanisms of clinically relevant variants near TAL1, an oncogene.

Marc Mansour, a cancer molecular biologist at UCL, notes: “When comparing patient tumor genomes with unaffected cells, thousands of individual variants are found. It’s hard to determine which variants have functional consequences.” Mansour believes AlphaGenome has the potential to do this.

This precise localization capability is crucial for his research, which analyzes how genetic changes affect immune functions. “I no longer need to test hundreds of things but can focus on a few, guiding the right direction,” he adds.

Future Impact

Disease understanding: By more accurately predicting gene disruptions, AlphaGenome can help researchers identify disease causes and interpret the functional impact of rare variants, potentially discovering new therapeutic targets.
Synthetic biology: Its predictions can guide the design of synthetic DNA with specific regulatory functions—such as activating genes only in neurons, not muscles.
Basic research: It can help map key functional elements in the genome and identify the most critical DNA instructions for cell-specific functions, accelerating genomic understanding.

Future Directions

Despite its advances, AlphaGenome has limitations. Like other sequence-based models, capturing the effects of distant regulatory elements (>100,000 bases away) remains challenging. Improving the model’s ability to recognize cell- and tissue-specific patterns is a focus for future work.

Additionally, the team has not yet designed or validated AlphaGenome for personal genome prediction, focusing instead on its performance on individual variants.

While AlphaGenome can predict molecular outcomes, it does not fully explain how variants lead to complex traits or diseases, which involve broader biological processes like development and environment. More research teams are needed to address these challenges.

Currently, AlphaGenome is available for non-commercial use via API: https://github.com/google-deepmind/alphagenome.

Finally, there are concerns about misuse for bioweapons, but DeepMind’s VP Pushmeet Kohli states the model has been shared with biosecurity experts and deemed safe. The goal is to expand its capabilities to provide deeper insights into gene variants and disease mechanisms, akin to the early days of AlphaFold1—an important first step.
