From correlation to causation using directed topological overlap matrix: Applications in genomics
Authors: Borzou Alipourfard, Jean Gao
Journal: Methods, volume 219, pages 58-67 (journal article, published 2023-11-01)
DOI: 10.1016/j.ymeth.2023.09.005
URL: https://www.sciencedirect.com/science/article/pii/S1046202323001597
Impact factor: 4.2; JCR: Q1 (Biochemical Research Methods); category: Biology
Citations: 0
Abstract
Most causal discovery tools assume the local causal Markov condition. However, the theoretical assumptions that underlie this condition are often not met in practice. This is especially marked in genomics, where measurement errors, averaging effects, and feedback loops significantly undermine the legitimacy of the local causal Markov condition. Furthermore, these causal discovery algorithms require very large samples, orders of magnitude above what is often available. In this paper, relaxing the local causal Markov condition and using Reichenbach's common cause principle instead, we present a more flexible approach to causal discovery, the directed topological overlap matrix (DTOM). DTOM is robust with respect to measurement errors, averaging effects, and feedback loops, and is significantly more sample efficient. We study the utility of DTOM for discovering causal relations in biological data using three real gene expression datasets. We first examine whether DTOM can help identify the Myostatin mutation in Piedmontese cattle by contrasting the muscle transcriptomes of Piedmontese and Wagyu crosses; the Myostatin mutation is the cause of the double-muscling for which Piedmontese cattle are famous. We then consider a large-scale gene deletion study in yeast and show that DTOM allows us to identify the deleted gene in a sample knowing only the set of differentially expressed genes in that sample. Finally, we examine the progression of Alzheimer's disease (AD) through the lens of DTOM. The genes that our DTOM analysis implicated as having a causal role in the progression of AD were significantly enriched in cellular components that have been repeatedly implicated in the progression of AD.
About the journal:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches, with emphasis on clear, detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods with their advantages and disadvantages, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.