Scientists have developed an AI system, called Evo, that learns from bacterial genomes to generate new, functional proteins. The system is trained on an enormous dataset of bacterial genomes, in which genes with related functions are often clustered together, allowing it to learn statistical patterns in DNA at the level of individual bases.
Evo's success lies in its ability to produce novel protein sequences without taking the structure of the protein into account. When tested on known bacterial toxins and CRISPR inhibitors, the system produced functional proteins, even when no similar sequences were present in its training data.
The researchers used a "genomic language model" approach, where Evo was trained to predict the next base in a sequence and rewarded for correct predictions. The system's performance improved as it learned more about the genetic context, allowing it to make connections between different genes and functions.
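The idea of next-base prediction can be illustrated with a deliberately tiny sketch. This is not Evo's architecture or training code: it is a toy count-based model that assumes a fixed context window, and the names `train_next_base_model`, `predict_next_base`, `next_base_accuracy`, and `context_len` are hypothetical, introduced only for illustration.

```python
from collections import Counter, defaultdict


def train_next_base_model(sequences, context_len=3):
    """Count which base follows each context of `context_len` bases."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - context_len):
            context = seq[i:i + context_len]
            counts[context][seq[i + context_len]] += 1
    return counts


def predict_next_base(counts, context):
    """Return the most frequent next base for a context, or None if unseen."""
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def next_base_accuracy(counts, seq, context_len=3):
    """Fraction of positions in `seq` where the predicted base is correct."""
    total = len(seq) - context_len
    if total <= 0:
        return 0.0
    correct = sum(
        predict_next_base(counts, seq[i:i + context_len]) == seq[i + context_len]
        for i in range(total)
    )
    return correct / total
```

A real genomic language model replaces the lookup table with a neural network and uses a far longer context, which is what would let it pick up on neighbouring genes rather than just the last few bases.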
One of the most impressive aspects of Evo is its ability to generate novel proteins with little similarity to known ones. In one test, Evo produced two antitoxins that fully restored growth to bacteria that were producing the toxin, despite having only 25% sequence identity to known antitoxins.
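For readers unfamiliar with the term, sequence identity is the share of aligned positions at which two sequences carry the same residue. The sketch below shows one common way to compute it, assuming the two sequences have already been aligned and that gap positions are excluded from the count. Other conventions use a different denominator, and the source does not say which one the researchers used. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of non-gap aligned positions where both sequences match.

    Assumes the inputs are already aligned to the same length; positions
    where either sequence has a gap are skipped (one of several conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip positions with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example with made-up four-residue sequences: 2 of 4 positions match.
print(percent_identity("MKTA", "MRTG"))
```

By this measure, a figure of 25% means that only about one aligned position in four matches a known antitoxin.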
While the researchers acknowledge that their system may not work for more complex genomes, such as those of vertebrates, the work has opened up new avenues for understanding protein function and evolution. The potential applications are broad, from designing novel enzymes to creating new medicines.
The team's success highlights the power of using bacterial genomes to understand protein biology and has implications for the fields of genomics, evolutionary biology, and synthetic biology. With its ability to generate novel proteins, Evo is poised to revolutionize our understanding of how life works at the molecular level.