Rudolf Schill, Maren Klever, Andreas Lösch, Y Linda Hu, Stefan Vocht, Kevin Rupp, Lars Grasedyck, Rainer Spang, Niko Beerenwinkel
{"title":"Correcting for Observation Bias in Cancer Progression Modeling.","authors":"Rudolf Schill, Maren Klever, Andreas Lösch, Y Linda Hu, Stefan Vocht, Kevin Rupp, Lars Grasedyck, Rainer Spang, Niko Beerenwinkel","doi":"10.1089/cmb.2024.0666","DOIUrl":null,"url":null,"abstract":"<p><p>Tumor progression is driven by the accumulation of genetic alterations, including both point mutations and copy number changes. Understanding the temporal sequence of these events is crucial for comprehending the disease but is not directly discernible from cross-sectional genomic data. Cancer progression models, including Mutual Hazard Networks (MHNs), aim to reconstruct the dynamics of tumor progression by learning the causal interactions between genetic events based on their co-occurrence patterns in cross-sectional data. Here, we highlight a commonly overlooked bias in cross-sectional datasets that can distort progression modeling. Tumors become clinically detectable when they cause symptoms or are identified through imaging or tests. Detection factors, such as size, inflammation (fever, fatigue), and elevated biochemical markers, are influenced by genomic alterations. Ignoring these effects leads to \"conditioning on a collider\" bias, where events making the tumor more observable appear anticorrelated, creating false suppressive effects or masking promoting effects among genetic events. We enhance MHNs by incorporating the effects of genetic progression events on the inclusion of a tumor in a dataset, thus correcting for collider bias. We derive an efficient tensor formula for the likelihood function and apply it to two datasets from the MSK-IMPACT study. In colon adenocarcinoma, we observe a significantly higher rate of clinical detection for TP53-positive tumors, while in lung adenocarcinoma, the same is true for EGFR-positive tumors. Compared to classical MHNs, this approach eliminates several spurious suppressive interactions and uncovers multiple promoting effects.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":"31 10","pages":"927-945"},"PeriodicalIF":1.4000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1089/cmb.2024.0666","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Tumor progression is driven by the accumulation of genetic alterations, including both point mutations and copy number changes. Understanding the temporal sequence of these events is crucial for comprehending the disease but is not directly discernible from cross-sectional genomic data. Cancer progression models, including Mutual Hazard Networks (MHNs), aim to reconstruct the dynamics of tumor progression by learning the causal interactions between genetic events based on their co-occurrence patterns in cross-sectional data. Here, we highlight a commonly overlooked bias in cross-sectional datasets that can distort progression modeling. Tumors become clinically detectable when they cause symptoms or are identified through imaging or tests. Detection factors, such as size, inflammation (fever, fatigue), and elevated biochemical markers, are influenced by genomic alterations. Ignoring these effects leads to "conditioning on a collider" bias, where events making the tumor more observable appear anticorrelated, creating false suppressive effects or masking promoting effects among genetic events. We enhance MHNs by incorporating the effects of genetic progression events on the inclusion of a tumor in a dataset, thus correcting for collider bias. We derive an efficient tensor formula for the likelihood function and apply it to two datasets from the MSK-IMPACT study. In colon adenocarcinoma, we observe a significantly higher rate of clinical detection for TP53-positive tumors, while in lung adenocarcinoma, the same is true for EGFR-positive tumors. Compared to classical MHNs, this approach eliminates several spurious suppressive interactions and uncovers multiple promoting effects.
期刊介绍:
Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics.
Journal of Computational Biology coverage includes:
-Genomics
-Mathematical modeling and simulation
-Distributed and parallel biological computing
-Designing biological databases
-Pattern matching and pattern detection
-Linking disparate databases and data
-New tools for computational biology
-Relational and object-oriented database technology for bioinformatics
-Biological expert system design and use
-Reasoning by analogy, hypothesis formation, and testing by machine
-Management of biological databases