Cross-modal predictive modeling of multi-omic data in 3D airway organ tissue equivalents during viral infection.

IF 2.8 | CAS Zone 3 (Biology) | JCR Q2 GENETICS & HEREDITY

Frontiers in Genetics Pub Date : 2025-09-25 eCollection Date: 2025-01-01 DOI:10.3389/fgene.2025.1658577

Mostafa Rezapour, Patrick M McNutt, David A Ornelles, Stephen J Walker, Sean V Murphy, Anthony Atala, Metin Nafi Gurcan



Abstract

Introduction: Developing robust predictive models from multi-omics data is challenging because sample sizes are typically small (often fewer than 100) while the feature space is vast (over 20,000 molecular features such as genes, transcripts, and proteins). This imbalance increases the risk of overfitting and limits generalizability. To address this challenge, this study introduces the Magnitude-Altitude Score Analysis for Tracking Infection and Time-Dependent Genes (MASIT), a novel feature selection method that filters out irrelevant genes and retains informative ones.
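The overfitting risk described above can be illustrated with synthetic data. This sketch is not from the study: it assumes a minimum-norm least-squares classifier (a simple stand-in, not one of the models used here) and random Gaussian features whose labels carry no signal. With 60 samples and 20,000 features, the fit reproduces every training label exactly even though there is nothing to learn.

```python
import numpy as np


def training_accuracy_on_noise(n_samples=60, n_features=20_000, seed=0):
    """Fit a minimum-norm least-squares classifier to random +/-1 labels and
    return its accuracy on the same training samples."""
    rng = np.random.default_rng(seed)
    # Pure noise: the features contain no information about the labels.
    X = rng.standard_normal((n_samples, n_features))
    y = rng.choice([-1.0, 1.0], size=n_samples)
    # When n_features > n_samples, X has full row rank (almost surely), so the
    # minimum-norm solution satisfies X @ w == y up to rounding error.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    predictions = np.sign(X @ w)
    return float(np.mean(predictions == y))


print(training_accuracy_on_noise())  # prints 1.0: a perfect fit to random labels
```

Accuracy on freshly generated noise samples would hover near chance (about 50%). That gap between training and held-out performance is the generalization problem that discarding irrelevant features is meant to reduce.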

Methods: We applied MASIT to a 3D airway organ tissue equivalent model that mimics human airway physiology, analyzing data from both RNA-Seq and NanoString platforms. RNA-Seq provided a transcriptome-wide view of 19,671 protein-coding genes, whereas NanoString targeted a panel of 773 genes. We used MASIT to analyze gene expression changes in the airway tissue equivalents at 24 and 72 hours after exposure to Influenza A virus, Human metapneumovirus, and Parainfluenza virus type 3. MASIT was trained and validated on NanoString data, tested on a held-out RNA-Seq test set, and benchmarked against widely used feature selection approaches, including Fisher score, minimum Redundancy Maximum Relevance, embedded Lasso regression, and Boruta feature importance.
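Fisher score, one of the baselines named above, ranks each feature by the ratio of between-class to within-class scatter: F_j = Σ_k n_k (μ_kj − μ_j)² / Σ_k n_k σ²_kj, where n_k is the size of class k, μ_kj and σ²_kj are the mean and variance of feature j in class k, and μ_j is the overall mean of feature j. The NumPy sketch below implements that textbook definition. It is not the authors' benchmarking code; the function names and the epsilon guard for constant features are illustrative assumptions.

```python
import numpy as np


def fisher_score(X, y):
    """Fisher score of each column of X for class labels y.

    X: (n_samples, n_features) expression matrix; y: (n_samples,) labels.
    Score = between-class scatter / within-class scatter, per feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        X_k = X[y == label]
        n_k = X_k.shape[0]
        between += n_k * (X_k.mean(axis=0) - overall_mean) ** 2
        within += n_k * X_k.var(axis=0)  # population variance within class k
    # Assumed guard: zero within-class variance yields a large, finite score.
    return between / np.maximum(within, np.finfo(float).eps)


def top_k_features(X, y, k):
    """Column indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_score(X, y))[::-1][:k]
```

Keeping the top k indices gives the usual filter-style selection. Unlike minimum Redundancy Maximum Relevance, it scores each gene independently and ignores redundancy between genes.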

Results: MASIT achieved 92% accuracy in differentiating eight groups of infected samples. Models built on MASIT-selected genes outperformed models using the full gene set, notably with Random Forest, XGBoost, and AdaBoost. Selected genes such as IFIT1, IFIT2, IFIT3, OASL, IFI44, and OAS3 were particularly effective in classifying samples by virus type and infection stage. Benchmarking further showed that MASIT not only outperformed existing feature selection methods on the NanoString data but was also the only method to maintain high accuracy and stability on the held-out RNA-Seq data.
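Scoring a NanoString-trained model on RNA-Seq requires the two platforms to share a feature space. The abstract does not describe how this was done, so the sketch below rests on assumptions: it keeps only genes present in both platforms, orders the RNA-Seq columns to match the training panel, and z-scores each gene within each platform so that scale differences between the technologies do not dominate. The helper names are hypothetical.

```python
import numpy as np


def align_to_panel(source_genes, X_source, panel_genes):
    """Keep only panel genes present in the source platform, in panel order.

    source_genes: gene names labeling the columns of X_source.
    X_source: (n_samples, n_genes) expression matrix from one platform.
    panel_genes: gene names used by the trained model.
    Returns (shared_genes, X_aligned). Panel genes missing from the source are
    dropped, so the model must be trained on the same shared_genes.
    """
    column_of = {gene: i for i, gene in enumerate(source_genes)}
    shared_genes = [gene for gene in panel_genes if gene in column_of]
    columns = [column_of[gene] for gene in shared_genes]
    return shared_genes, X_source[:, columns]


def zscore_columns(X):
    """Standardize each gene (column) to mean 0, unit variance within one platform."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant genes get std 1 so they map to 0 instead of dividing by zero.
    return (X - mean) / np.where(std > 0, std, 1.0)
```

For example, restricting RNA-Seq data to a selected panel such as IFIT1, IFIT2, IFIT3, OASL, IFI44, and OAS3 with `align_to_panel`, then standardizing it with `zscore_columns`, yields a matrix that a classifier trained on identically processed NanoString data can score directly. Standardizing the test set with its own statistics removes platform-wide scale shifts, but it assumes the test set is large and balanced enough for stable means and variances.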

Discussion: These results provide insights into the host's molecular response to viral infections and highlight MASIT as a robust tool for analyzing high-dimensional, small-sample multi-omics datasets.


Source journal

Frontiers in Genetics (Biochemistry, Genetics and Molecular Biology - Molecular Medicine)

CiteScore: 5.50

Self-citation rate: 8.10%

Articles published: 3491

Review time: 14 weeks
