The rapid development of artificial intelligence (AI) systems will significantly impact businesses that rely on genomic data analysis for research and development of therapeutics and diagnostic decision-making. This article provides an overview of current trends in using AI systems to meet the challenge of extracting clinically useful information from highly complex genomic data.
The Challenge and Promise of Interpreting Complex Genomic Data
While completing the gap-less sequence of the human genome was a milestone for science, the genome's complexity poses a considerable challenge for clinical use of the data, as we have previously discussed. In this post-human genome sequence world, it is becoming increasingly clear that human disease and disease susceptibility are not only a consequence of a particular mutation causing a particular gene dysfunction, but are often a result of genetic variations in non-coding regions, the three-dimensional (3D) structure of the genome, and chemical modifications of the DNA and protein molecules that make up the genome (referred to as the “epigenome”).
Taking full advantage of genomic data for therapeutic and diagnostic decision-making will require integrating the linear DNA sequence data of coding and non-coding regions, the 3D genomic structure information, and the epigenome. Information about these different genomic features may come from entirely different data modalities, such as DNA sequencing, imaging, and various biochemical assays. Moreover, accurate therapeutic and diagnostic decisions may require integrating genomic data analysis with medical information and patient data.
Accordingly, AI systems, with their capacity for capturing intricate patterns within large data sets and combinations of different data modalities, could become powerful tools for therapeutic and diagnostic decision-making that can address some of the challenges posed by the human genome complexity.
AI Systems for Interpreting Genomic Data
Recently developed AI systems can significantly improve the accuracy of therapeutic and diagnostic predictions. Below, we describe the recent development of AI systems for analyzing information from the non-coding regions in the genome (I), from a combination of different genomic and medical information (II), and from liquid biopsies and circulating cell-free DNA (cfDNA), which depend on interpreting genomic data from fragments of the overall genome (III).
I. Interpretation of Non-Coding Genetic Variation in a Three-Dimensional Context
Most genetic variation associated with disease is located in non-coding regions of the genome. Now that the first gap-less human genome has been completed, the next stage of research and analysis in this field will yield vast amounts of non-coding genetic data, which will in turn improve the diagnostic and therapeutic decision-making capabilities of AI systems built on this as-yet untapped information.
However, non-coding genetic variants are not as easy to interpret as coding-region variants, which can be assigned to a known gene. Variants in coding regions can be interpreted based on knowledge of the particular gene function, which considerably simplifies the analysis. By contrast, non-coding variants may regulate different genes depending on the genomic 3D structure and the epigenome. Moreover, non-coding variants may themselves influence the 3D structure and the epigenome. Accordingly, interpreting non-coding variants is a highly complex task that may require more than traditional data analysis.
Rapidly developing AI models show promising results in interpreting genetic data in the 3D context. For example, an AI model (DeepC) can accurately predict topologically associating domains (TADs). TADs are fundamental units of the 3D nuclear organization of the genome that contribute to gene expression by controlling the interaction of gene regulatory regions with their target genes in 3D space. DeepC predicts TADs using a transfer learning approach and tissue-specific Hi-C data to train models that predict genome folding from megabase (Mb) windows of DNA sequence, which makes it possible to predict how variations in the primary sequence can affect the 3D genomic structure.
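To make the idea of predicting a variant's impact from sequence more concrete, the following is a minimal sketch of the general "reference versus alternate" comparison that sequence-based models allow. It is not DeepC's code or API. The function names (`one_hot`, `apply_snv`, `variant_effect`) and the toy predictor are hypothetical and exist only for illustration; a real model would take a much longer window and return predicted chromatin interactions rather than a single number.

```python
from typing import Callable, List

BASES = "ACGT"  # column order of the one-hot encoding


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as one row of four 0/1 values per base (A, C, G, T).

    Bases outside ACGT, such as N, become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return a copy of seq with the base at 0-based index pos changed from ref to alt."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(
    predict: Callable[[List[List[int]]], List[float]],
    seq: str,
    pos: int,
    ref: str,
    alt: str,
) -> List[float]:
    """Score a variant as the alternate-minus-reference difference in model output."""
    ref_pred = predict(one_hot(seq))
    alt_pred = predict(one_hot(apply_snv(seq, pos, ref, alt)))
    return [a - r for a, r in zip(alt_pred, ref_pred)]


def toy_predict(encoded: List[List[int]]) -> List[float]:
    """Hypothetical stand-in for a trained model: returns the G/C fraction of the window."""
    gc = sum(row[1] + row[2] for row in encoded)
    return [gc / len(encoded)]


# Compare predictions for a short window with and without an A-to-G change at index 0.
effect = variant_effect(toy_predict, "ACGTACGTAC", pos=0, ref="A", alt="G")
```

The design point is that the same trained model is run twice, once on the reference window and once on the window carrying the variant, and the difference between the two outputs is read as the variant's predicted effect.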
DeepC has been used to investigate why some people get only mild symptoms from COVID-19, whereas others experience severe respiratory failure and even death. As described in this article, DeepC was able to identify causative single-nucleotide non-coding variants and effector genes that may underlie respiratory failure from COVID-19.
These studies demonstrate that AI systems can provide an improved capability to predict disease-linked genetic variants located in the non-coding regions of the genome by taking into account the 3D structure of the genome.
II. Interpretation of Genomic Data in Combination With Different Data Modalities
AI will make data analysis of the vast amount of genomic data more accurate and readily available. For example, Moor et al. reported on generalist medical AI (GMAI) models that can support clinical decision-making by combining multiple data modalities.
The most active innovation in genomic medicine involves simplifying data analysis for efficient clinical decision-making and combining the various types of genomic data, such as primary nucleic acid sequence data, epigenomic data, structural genomic information, and imaging information of native nucleic acids. Emerging AI models such as GMAI are expected to provide efficient and accurate analysis of combinations of different genomic data modalities and other medically relevant information, which will support accurate diagnostic and therapeutic decision-making.
III. Interpretation of Data from Liquid Biopsy
Liquid biopsy and, in particular, analysis of circulating cell-free DNA (cfDNA) have enormous potential for clinical treatment and diagnostics. There are currently numerous possibilities for non-invasive screening of disease and monitoring of treatment responses. Recently, cfDNA analysis has also gone beyond detecting variations in the primary DNA sequence to include methylation levels and structural information such as fragmentation patterns. The complexity of the data now obtained from cfDNA has rendered traditional data analysis insufficient. AI models are increasingly used to interpret genomic data from cfDNA to make therapeutic and diagnostic decisions, as explained in this research paper.
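As a simple illustration of what a "fragmentation pattern" feature might look like before it is passed to an AI model, the sketch below summarizes a list of cfDNA fragment lengths as the share of short fragments. The function name `fragment_features` and the length ranges (100 to 150 base pairs for "short", 151 to 220 for "long") are assumptions chosen for illustration, not values taken from the paper referenced above. In practice, fragment lengths would come from sequencing data, for example from the insert sizes of paired-end alignments.

```python
from typing import Dict, Iterable, Optional, Tuple


def fragment_features(
    lengths: Iterable[int],
    short_range: Tuple[int, int] = (100, 150),  # assumed "short" window, in base pairs
    long_range: Tuple[int, int] = (151, 220),   # assumed "long" window, in base pairs
) -> Dict[str, Optional[float]]:
    """Summarize cfDNA fragment lengths as short and long counts and a short fraction.

    Fragments outside both ranges are ignored. The short fraction is None
    when no fragment falls in either range.
    """
    short = 0
    long_ = 0
    for n in lengths:
        if short_range[0] <= n <= short_range[1]:
            short += 1
        elif long_range[0] <= n <= long_range[1]:
            long_ += 1
    total = short + long_
    return {
        "short": short,
        "long": long_,
        "short_fraction": short / total if total else None,
    }


# Example with made-up fragment lengths (90 and 300 fall outside both ranges).
features = fragment_features([120, 140, 167, 170, 200, 90, 300])
```

A real pipeline would typically compute features like this per genomic region and combine them with sequence and methylation signals, which is where the data quickly outgrows traditional analysis.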
Future Opportunities and Challenges of Using AI in Genomic Medicine
AI systems have the potential to revolutionize the development of new treatment and diagnostic options based on human genomic data and spur innovation and growth in the genomic medicine industry. While the new AI systems and their use for interpreting genomic data are exciting, the success of using AI in genomic medicine will require that the AI systems are fully trusted and accepted by the scientific community and society. Moreover, the data analysis from an AI system is only as good as the data provided, so great care must be taken to ensure the quality and accuracy of the data used for the analysis. Access to sufficiently comprehensive quality data may involve data sharing between various businesses, clinics, and government entities. Accordingly, private industry and the government will need to collaborate to ensure the careful use of AI and medical information to successfully develop AI-driven genomic medicine that is trusted and accepted by the scientific community and society.