Cross-modal predictive modeling of multi-omic data in 3D airway organ tissue equivalents during viral infection
Mostafa Rezapour, Patrick M McNutt, David A Ornelles, Stephen J Walker, Sean V Murphy, Anthony Atala, Metin Nafi Gurcan
Frontiers in Genetics, 16, 1658577 (journal article, published 2025-09-25). DOI: 10.3389/fgene.2025.1658577. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12507369/pdf/
Abstract
Introduction: Developing robust predictive models from multi-omics data is challenging because sample sizes are typically small (often fewer than 100) while the feature space is vast (over 20,000 molecular features such as genes, transcripts, and proteins). This combination increases the risk of overfitting and limits generalizability. To address this challenge, this study introduces Magnitude-Altitude Score Analysis for Tracking Infection and Time-Dependent Genes (MASIT), a novel feature selection method that filters out irrelevant genes and retains informative ones.
Methods: We applied MASIT to a 3D airway organ tissue equivalent model that mimics human airway physiology, using both RNA-Seq and NanoString data. RNA-Seq provided a transcriptome-wide view of 19,671 protein-coding genes, whereas NanoString targeted a panel of 773 genes. We used MASIT to analyze gene expression changes in the airway tissue equivalents after exposure to influenza A virus, human metapneumovirus, and parainfluenza virus type 3 at 24 and 72 hours post-infection. MASIT was trained and validated on NanoString data, tested on a held-out RNA-Seq test set, and benchmarked against widely used feature selection approaches, including Fisher score, minimum Redundancy Maximum Relevance (mRMR), embedded Lasso regression, and Boruta feature importance.
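Fisher score, the simplest of the baselines listed above, ranks each gene by the ratio of its between-class variance to its within-class variance. The sketch below is an illustrative NumPy implementation of that standard definition. It is not MASIT and not the authors' benchmarking code. The function names `fisher_scores` and `top_k_features` are hypothetical, and the small epsilon that guards against zero within-class variance is an assumption.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score of each feature (column) of X for class labels y.

    F_j = sum_k n_k * (mu_kj - mu_j)**2 / sum_k n_k * var_kj,
    where n_k is the size of class k, mu_kj and var_kj are the class-wise mean
    and population variance of feature j, and mu_j is its overall mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        X_k = X[y == label]
        n_k = X_k.shape[0]
        between += n_k * (X_k.mean(axis=0) - overall_mean) ** 2
        within += n_k * X_k.var(axis=0)
    # Assumed guard: zero within-class variance gets a tiny denominator instead
    # of a division by zero (a feature constant overall still scores 0).
    return between / np.maximum(within, 1e-12)

def top_k_features(X, y, k):
    """Column indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

In a typical workflow, the top-k genes are kept, with k tuned by cross-validation, and a classifier is trained on those columns only. Because Fisher score rates each gene independently, it ignores redundancy between co-regulated genes. mRMR is designed to address that gap.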
Results: MASIT achieved 92% accuracy in distinguishing eight groups of infected samples. Models using MASIT-selected genes outperformed models using the full gene set, notably with Random Forest, XGBoost, and AdaBoost. Selected genes such as IFIT1, IFIT2, IFIT3, OASL, IFI44, and OAS3 were particularly effective at classifying samples by virus type and infection stage. Benchmarking further showed that MASIT not only outperformed existing feature selection methods on NanoString data but was also the only method that maintained high accuracy and stability when applied to held-out RNA-Seq data.
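Evaluating a NanoString-trained model on RNA-Seq data requires the two matrices to share a feature space. One minimal approach is to keep only selected genes measured on both platforms and to standardize each gene within its own platform before applying a classifier trained on the other platform. The sketch below illustrates that idea under stated assumptions. It is not the authors' pipeline. Per-platform z-scoring, the nearest-centroid classifier (used in place of Random Forest, XGBoost, or AdaBoost so the example needs only NumPy), the function names, and the toy data are all hypothetical.

```python
import numpy as np

def align_to_genes(genes, X, selected):
    """Return the columns of X (samples x genes) for `selected`, in that order."""
    index = {g: i for i, g in enumerate(genes)}
    return X[:, [index[g] for g in selected]]

def zscore_per_gene(X):
    """Standardize each gene (column) within one platform; constant genes map to 0."""
    sd = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)

def nearest_centroid_fit(X, y):
    """Hypothetical stand-in classifier: one mean profile per class."""
    labels = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def nearest_centroid_predict(model, X):
    labels, centroids = model
    # Squared Euclidean distance from every sample to every class centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]

def cross_platform_accuracy(train_genes, train_X, train_y,
                            test_genes, test_X, test_y, panel):
    """Train on one platform and test on another, using panel genes present on both."""
    train_set, test_set = set(train_genes), set(test_genes)
    shared = [g for g in panel if g in train_set and g in test_set]
    X_train = zscore_per_gene(align_to_genes(train_genes, np.asarray(train_X, dtype=float), shared))
    X_test = zscore_per_gene(align_to_genes(test_genes, np.asarray(test_X, dtype=float), shared))
    model = nearest_centroid_fit(X_train, np.asarray(train_y))
    predictions = nearest_centroid_predict(model, X_test)
    return float(np.mean(predictions == np.asarray(test_y))), shared

# Invented toy data: the "RNA-Seq" values sit on a ~100x larger scale and use a different gene order.
acc, shared = cross_platform_accuracy(
    ["IFIT1", "IFIT2", "GENE_A"], [[1, 2, 7], [1, 2, 7], [5, 6, 7], [5, 6, 7]], ["A", "A", "B", "B"],
    ["GENE_A", "IFIT2", "IFIT1"], [[50, 600, 500], [50, 200, 100]], ["B", "A"],
    panel=["IFIT1", "IFIT2", "OASL"])
print(shared, acc)  # ['IFIT1', 'IFIT2'] 1.0
```

Z-scoring the RNA-Seq matrix with its own per-gene statistics uses the test set's marginal distribution, though not its labels. With few test samples, those statistics are noisy, so a real analysis would need a more careful cross-platform normalization.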
Discussion: These results provide insights into the host's molecular response to viral infections and highlight MASIT as a robust tool for analyzing high-dimensional, small-sample multi-omics datasets.