{"title":"Perfect collinearity not created equal: measuring and visualizing the severity of multi-collinearity of modern omics data.","authors":"Wei Q Deng, Radu V Craiu, Lei Sun","doi":"10.1515/sagmb-2025-0043","DOIUrl":null,"url":null,"abstract":"<p><p>Multi-collinearity frequently occurs in modern statistical applications and when ignored, can negatively impact model selection and statistical inference. Though perfect collinearity is always present in \"<i>n</i> < <i>p</i>\" data, we demonstrate that perfect collinearity arises differently, from diverse data redundancy patterns and/or data dimensions. Classic tools and measures that were developed for \"<i>n</i> > <i>p</i>\" data cannot be used to distinguish or visualize these patterns in the high-dimensional regime. Here we propose 1) new individualized measures that can be used to visualize patterns of perfect collinearity, and subsequently 2) global measures to assess the overall burden of multi-collinearity irrespective of data dimensions. We applied these measures to the human X chromosome data to understand similarity and differences in linkage disequilibrium structure due to sex and genetic features. The measures can highlight gene regions of excessive multi-collinearity and contrast the severity of perfect collinearity between different sexes. Utility of these measures to high-dimensional statistical application were also discussed.</p>","PeriodicalId":49477,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":"24 1","pages":""},"PeriodicalIF":0.4000,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12909097/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/sagmb-2025-0043","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 0
Abstract
Multi-collinearity frequently occurs in modern statistical applications and when ignored, can negatively impact model selection and statistical inference. Though perfect collinearity is always present in "n < p" data, we demonstrate that perfect collinearity arises differently, from diverse data redundancy patterns and/or data dimensions. Classic tools and measures that were developed for "n > p" data cannot be used to distinguish or visualize these patterns in the high-dimensional regime. Here we propose 1) new individualized measures that can be used to visualize patterns of perfect collinearity, and subsequently 2) global measures to assess the overall burden of multi-collinearity irrespective of data dimensions. We applied these measures to the human X chromosome data to understand similarity and differences in linkage disequilibrium structure due to sex and genetic features. The measures can highlight gene regions of excessive multi-collinearity and contrast the severity of perfect collinearity between different sexes. Utility of these measures to high-dimensional statistical application were also discussed.
期刊介绍:
Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.