Interpretable AI for inference of causal molecular relationships from omics data.
Authors: Payam Dibaeinia, Abhishek Ojha, Saurabh Sinha
Journal: Science Advances 11(7), eadk0837. Published 2025-02-14.
DOI: 10.1126/sciadv.adk0837 (https://doi.org/10.1126/sciadv.adk0837)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11827637/pdf/
The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
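The general idea of comparing attribution scores between two conditions can be illustrated with a toy sketch. This is not CIMLA's implementation, and the abstract does not specify its models or attribution method. The sketch assumes a plain linear model in place of a machine learning model, uses the attribution w_j * (x_j - mean(x_j)) that applies to a linear model with independent features, and simulates its own data. All function and variable names are hypothetical, and the causal assumptions and confounder handling discussed in the abstract are not modeled.

```python
import random

def simulate(n, w_tf1, w_tf2, rng):
    """Toy data: target gene expression as a noisy linear function of two regulators."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [w_tf1 * a + w_tf2 * b + rng.gauss(0, 0.1) for a, b in X]
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with intercept, solved by Gauss-Jordan elimination."""
    p = len(X[0]) + 1
    rows = [[1.0] + list(r) for r in X]
    # Augmented normal equations [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return beta[0], beta[1:]  # intercept, weights

def mean_abs_attribution(X, weights):
    """Mean |w_j * (x_j - mean(x_j))| per feature (linear-model attribution)."""
    n = len(X)
    means = [sum(r[j] for r in X) / n for j in range(len(weights))]
    return [sum(abs(w * (r[j] - means[j])) for r in X) / n
            for j, w in enumerate(weights)]

def differential_scores(X_a, y_a, X_b, y_b):
    """Absolute change in mean attribution of each regulator between two conditions."""
    _, w_a = fit_linear(X_a, y_a)
    _, w_b = fit_linear(X_b, y_b)
    attr_a = mean_abs_attribution(X_a, w_a)
    attr_b = mean_abs_attribution(X_b, w_b)
    return [abs(a - b) for a, b in zip(attr_a, attr_b)]

rng = random.Random(0)
# Assumed scenario: regulator 1 loses its effect in the "case" condition,
# while regulator 2 keeps the same effect in both conditions.
X_ctrl, y_ctrl = simulate(500, 1.0, 0.5, rng)
X_case, y_case = simulate(500, 0.0, 0.5, rng)
scores = differential_scores(X_ctrl, y_ctrl, X_case, y_case)
print("differential scores (regulator 1, regulator 2):", scores)
```

In this simulated setup, the regulator whose effect differs between conditions should receive a clearly larger score than the one whose effect is unchanged. A real analysis would need a nonlinear model, many candidate regulators, and the assumptions under which attributions carry a causal reading.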
About the journal:
Science Advances, an open-access journal by AAAS, publishes impactful research in diverse scientific areas. It aims for fair, fast, and expert peer review, providing freely accessible research to readers. Led by distinguished scientists, the journal supports AAAS's mission by extending Science magazine's capacity to identify and promote significant advances. Evolving digital publishing technologies play a crucial role in advancing AAAS's global mission for science communication and benefitting humankind.