A German-American team of scientists has used artificial intelligence (AI), in the form of a neural network, to decode complex instructions for gene regulation in DNA.
An interdisciplinary research team from the Technical University of Munich, the Stowers Institute for Medical Research and Stanford University has shown that neural networks, in combination with newly developed techniques for model interpretation, can decode complex instructions encoded in DNA. In doing so, the scientists are addressing one of the great unsolved problems in biology: the regulatory code of the genome.
The sequence of DNA bases contains not only instructions on how to build proteins, but also information on when and where they are produced in an organism. This regulatory code is read by proteins (“transcription factors”) that bind to short DNA segments (“motifs”). The extent to which specific combinations and arrangements of motifs influence regulatory activity is an extremely complex and as yet unsolved question.
Precise models and experiments
A key factor in the researchers' success was carrying out both the transcription factor-DNA binding experiments and the computer modeling at the highest possible resolution, down to the level of individual DNA bases. The team was thus able to train high-precision neural network models and extract key elements and patterns from them. These included the binding motifs for transcription factors and the combinatorial rules by which they function together as a code.
“Neural networks are considered a black box that is difficult to understand, but they can be interrogated digitally. With a large number of virtual experiments, it is thus possible to find out the rules that the neural network has learned,” explains first author Dr. Žiga Avsec, a member of the laboratory of Julien Gagneur, Professor of Computational Molecular Medicine at TU Munich. Together with Anshul Kundaje, professor at Stanford University, he created the first version of the model as a visiting scientist.
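To illustrate what such a virtual experiment can look like, here is a minimal Python sketch. It is not the team's code: `toy_model` is a hypothetical stand-in for a trained network with a made-up hidden rule, `MOTIF` is a placeholder string rather than a real binding site, and all names and numbers are assumptions chosen for illustration. The experiment places one motif into random background sequences, adds a second copy at a chosen distance, and records how much the model's output rises.

```python
import math
import random

MOTIF = "TGATTGCA"  # hypothetical placeholder, not a real binding motif


def toy_model(seq):
    """Stand-in for a trained network (assumption): scores one sequence.

    Hidden rule: each motif copy adds 1.0, and each pair of copies adds a
    cooperative bonus that decays with the distance between them.
    """
    starts = [i for i in range(len(seq)) if seq.startswith(MOTIF, i)]
    score = float(len(starts))
    for n, a in enumerate(starts):
        for b in starts[n + 1:]:
            score += math.exp(-(b - a) / 20.0)
    return score


def place_motif(seq, pos):
    """Overwrite the bases starting at pos with the motif."""
    return seq[:pos] + MOTIF + seq[pos + len(MOTIF):]


def virtual_experiment(model, spacing, n_trials=50, seed=0):
    """Mean gain in model output from adding a second motif at `spacing`."""
    rng = random.Random(seed)
    gain = 0.0
    for _ in range(n_trials):
        background = "".join(rng.choice("ACGT") for _ in range(150))
        one = place_motif(background, 40)
        two = place_motif(one, 40 + spacing)
        gain += model(two) - model(one)
    return gain / n_trials


# Query the model at several motif distances and print the measured effect.
effects = {d: virtual_experiment(toy_model, d) for d in range(10, 61, 10)}
for spacing, gain in effects.items():
    print(f"spacing {spacing:2d} bp: mean gain {gain:.2f}")
```

In this toy setting the measured gain shrinks as the two motifs move apart, which recovers the rule hidden inside `toy_model` without ever inspecting its internals. Averaging over many random backgrounds is what separates the effect of the motif arrangement from the effect of the surrounding sequence.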
The team applied the approach to the master regulators of mouse embryonic stem cells and experimentally confirmed the results by CRISPR genome editing. The patterns thus discovered showed clear rules that, among other things, indicated precise positioning along the DNA double helix and included a preferred order of transcription factors.
Making patterns visible
Among the findings, for example, is that a well-studied transcription factor called Nanog binds preferentially to DNA when several of its motifs are arranged periodically and appear on the same side of the DNA helix. This motif periodicity had long been suspected but was difficult to detect.
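The geometry behind "the same side of the helix" can be sketched in a few lines of Python. The sketch assumes the commonly cited figure of roughly 10.5 base pairs per turn for B-form DNA. The function names and the 45-degree tolerance are hypothetical choices for illustration and do not come from the study.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of B-form DNA (assumption)


def helical_angle(position):
    """Angle in degrees around the helix axis for a base-pair position."""
    return (position / BP_PER_TURN * 360.0) % 360.0


def same_side(pos_a, pos_b, tolerance_deg=45.0):
    """True if two positions face roughly the same side of the helix."""
    diff = abs(helical_angle(pos_a) - helical_angle(pos_b))
    diff = min(diff, 360.0 - diff)  # shortest way around the circle
    return diff <= tolerance_deg


# Motifs spaced by whole helical turns line up; half a turn apart they do not.
for spacing in (5, 10, 16, 21):
    print(spacing, same_side(0, spacing))
```

Under these assumptions, motifs whose start positions differ by about 10 or 21 base pairs face the same direction, while a spacing of about 5 or 16 base pairs puts them on opposite faces of the helix.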
“This is the main advantage of using neural networks for this task. A classical computational model relies on hand-crafted, rigid rules to ensure that it can be interpreted,” Avsec clarifies. “Biology, however, is extremely rich and complicated. By refraining from interpreting individual parameters, we can train much more flexible and multilayered models that capture all biological phenomena, including those that are still unknown,” he says.