Cross-validation for training and testing co-occurrence network inference algorithms.

IF 2.9 · Zone 3 (Biology) · Q2 (Biochemical Research Methods)

BMC Bioinformatics Pub Date : 2025-03-06 DOI:10.1186/s12859-025-06083-7

Daniel Agyapong, Jeffrey Ryan Propster, Jane Marks, Toby Dylan Hocking

BMC Bioinformatics 26(1): 74. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11883995/pdf/


Abstract

Background: Microorganisms are found in almost every environment, including soil, water, air, and inside other organisms such as animals and plants. While some microorganisms cause disease, most contribute to biological processes such as decomposition, fermentation, and nutrient cycling. Much research has examined microbial communities in various environments and how their interactions and relationships can provide insight into various diseases. Co-occurrence network inference algorithms help us understand the complex associations among microorganisms, especially bacteria. Existing network inference algorithms employ techniques such as correlation, regularized linear regression, and conditional dependence, each with different hyper-parameters that determine the sparsity of the network. These complex microbial communities form intricate ecological networks that are fundamental to ecosystem functioning and host health. Understanding these networks is crucial for developing targeted interventions in both environmental and clinical settings. The emergence of high-throughput sequencing technologies has generated unprecedented amounts of microbiome data, necessitating robust computational methods for network inference and validation.
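To make the role of a sparsity hyper-parameter concrete, here is a minimal Python sketch of the simplest case named above, a correlation-based network with a cutoff. It is an illustration under stated assumptions, not the authors' implementation: the function names `correlation_network` and `edge_count`, the synthetic data, and the candidate thresholds are all hypothetical.

```python
import numpy as np

def correlation_network(X, threshold):
    """Boolean adjacency matrix: edge i-j when |Pearson r| >= threshold.

    X is a (samples x taxa) matrix. The threshold is the sparsity
    hyper-parameter in this illustrative setup.
    """
    corr = np.corrcoef(X, rowvar=False)
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)  # drop self-loops
    return adj

def edge_count(adj):
    # A symmetric adjacency matrix stores each undirected edge twice.
    return int(adj.sum() // 2)

rng = np.random.default_rng(0)
# Synthetic stand-in for a transformed abundance table: 60 samples x 8 taxa.
X = rng.normal(size=(60, 8))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=60)  # taxa 0 and 1 co-vary

# Raising the threshold can only remove edges, so the network gets sparser.
counts = {t: edge_count(correlation_network(X, t)) for t in (0.1, 0.3, 0.5, 0.9)}
print(counts)
```

A low threshold keeps many weak, possibly spurious edges, while a high one keeps only the strongest associations. Choosing between them is the hyper-parameter selection problem the abstract refers to.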

Results: Previous methods for evaluating the quality of an inferred network include comparison against external data and measuring network consistency across sub-samples, both of which have drawbacks that limit their applicability to real microbiome composition data sets. We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data. Our method demonstrates superior performance in handling compositional data and addressing the challenges of high dimensionality and sparsity inherent in real microbiome data sets. The proposed framework also provides robust estimates of network stability.

Conclusions: Our empirical study shows that the proposed cross-validation method is useful both for hyper-parameter selection (training) and for comparing the quality of inferred networks between different algorithms (testing). This advancement represents a significant step forward in microbiome network analysis, providing researchers with a reliable tool for understanding complex microbial interactions. The method's applicability extends beyond microbiome studies to other fields where network inference from high-dimensional compositional data is crucial, such as gene regulatory networks and ecological food webs. Our framework establishes a new standard for validation in network inference, potentially accelerating discoveries in microbial ecology and human health.
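The general idea of using cross-validation for hyper-parameter selection can be sketched as follows. The abstract does not spell out how a network is turned into test-set predictions, so this example assumes one simple scheme: fit a thresholded-correlation network on the training fold, predict each taxon in the held-out fold from its network neighbors by least squares, and score by mean squared error. The functions `fit_predict` and `cv_error`, the synthetic data, and the threshold grid are hypothetical and should not be read as the authors' method.

```python
import numpy as np

def fit_predict(X_train, X_test, threshold):
    """Fit a thresholded-correlation network on X_train, then predict each
    taxon in X_test from its network neighbors (assumed prediction rule)."""
    p = X_train.shape[1]
    corr = np.corrcoef(X_train, rowvar=False)
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)
    pred = np.empty_like(X_test, dtype=float)
    for j in range(p):
        nbrs = np.flatnonzero(adj[j])
        # Intercept plus neighbor columns; with no neighbors this reduces
        # to predicting the training mean of taxon j.
        A_train = np.column_stack([np.ones(len(X_train)), X_train[:, nbrs]])
        A_test = np.column_stack([np.ones(len(X_test)), X_test[:, nbrs]])
        coef, *_ = np.linalg.lstsq(A_train, X_train[:, j], rcond=None)
        pred[:, j] = A_test @ coef
    return pred

def cv_error(X, threshold, n_folds=5, seed=0):
    """Mean held-out squared error over K folds for one threshold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        pred = fit_predict(X[train_mask], X[test_idx], threshold)
        errors.append(np.mean((X[test_idx] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
# Synthetic data: 100 samples x 6 taxa with two truly associated pairs.
X = rng.normal(size=(100, 6))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=100)
X[:, 3] = -X[:, 2] + 0.3 * rng.normal(size=100)

thresholds = [0.1, 0.3, 0.5, 0.7, 0.99]
errors = {t: cv_error(X, t) for t in thresholds}
best = min(errors, key=errors.get)  # threshold with lowest held-out error
print(errors, best)
```

Because `cv_error` uses the same seed for every threshold, all candidates are scored on identical folds, which keeps the comparison fair. The same held-out error could in principle be used to compare different inference algorithms, provided each one is given its own way of predicting on test data.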




Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.