Breakthrough in AI-Driven Genome Design
A new artificial intelligence model marks a significant advance in biological research. Trained on a dataset of 128,000 genomes covering many forms of life, the model can generate entire chromosomes and small genomes from scratch. Researchers say it could also help interpret non-coding gene variants associated with diseases, which would make it a powerful tool in genetic research. The development is expected to support genome engineering by enabling a deeper understanding of DNA sequences and their functions.
About the AI Model
According to a study published by the Arc Institute, the model, named Evo-2, was developed in collaboration with Stanford University and NVIDIA. It is available through web interfaces that let researchers generate and analyse DNA sequences. Patrick Hsu, a bioengineer at the Arc Institute and the University of California, Berkeley, said during a press briefing that Evo-2 is intended to serve as a platform that scientists can modify to suit their research needs.
Trained on a Vast Repository of Genomes
Unlike previous AI models that focused primarily on protein sequences, Evo-2 was trained on whole-genome data, encompassing both coding and non-coding sequences. The training set includes genomes from humans, other animals, plants, bacteria, and archaea, and covers 9.3 trillion DNA letters. Eukaryotic genomes interleave coding and non-coding regions, and including them in training is intended to improve Evo-2’s ability to predict gene activity.
Anshul Kundaje, a computational genomicist at Stanford University, told Nature that independent testing will be required to fully assess Evo-2’s capabilities. Preliminary results suggest that it performs well at predicting the effects of mutations in genes such as BRCA1, which is linked to breast cancer. The model was also used to analyse the genome of the woolly mammoth, further demonstrating its ability to interpret complex genetic structures.
Generating New DNA Sequences
The model has been tested on designing new DNA sequences, including CRISPR gene editors as well as bacterial and viral genomes. Earlier versions of the model produced incomplete genomes, whereas Evo-2 generates more biologically plausible sequences. Brian Hie, a computational biologist at Stanford University and the Arc Institute, said that although progress has been made, further refinement is needed before these sequences can be fully functional in living cells.
Potential Applications in Genetic Research
Researchers anticipate that Evo-2 will aid in designing regulatory DNA sequences that control gene expression. Experiments are already underway to test its predictions of chromatin accessibility, which influences cell identity in multicellular organisms. Yunha Hwang, a computational biologist and CEO of Tatta Bio, suggested that Evo-2’s ability to learn from bacterial and archaeal genomes could assist in designing novel human proteins.
Future Prospects for AI in Genome Design
Scientists involved in the project aim to move beyond protein design towards comprehensive genome engineering. With further refinement and laboratory validation, Evo-2 may contribute to advances in synthetic biology and precision medicine. Its role in understanding genetic regulation and designing functional DNA sequences is expected to grow as more researchers adopt and adapt the model.