多代谢网络的共形新颖性检测

IF 2.9 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2024-11-16 DOI:10.1186/s12859-024-05971-8

Ariane Marandon, Tabea Rebafka, Nataliya Sokolovska, Hédi Soula

{"title":"多代谢网络的共形新颖性检测","authors":"Ariane Marandon, Tabea Rebafka, Nataliya Sokolovska, Hédi Soula","doi":"10.1186/s12859-024-05971-8","DOIUrl":null,"url":null,"abstract":"Background: Graphical representations are useful to model complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive accurate diagnostic tools. However, comparison and classification of graphs is still extremely challenging, although a number of highly efficient methods such as graph neural networks were developed in the recent decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or false discovery rate control.Results: In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered to be a novelty when its topology is significantly different from those in the reference class. It is noteworthy that the procedure is a conformal prediction approach, which does not make any distributional assumptions on the data and that can be seen as a wrapper around traditional machine learning models, so that it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph is a representation of all metabolic reactions of a bacterium and to real task from a cancer data repository.Conclusions: Our approach efficiently controls - in highly complex data - the false discovery rate, while maximizing the true discovery rate to get the most reasonable predictive performance. This contribution is focused on confident classification of complex data, what can be further used to explore complex human pathologies and their mechanisms.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"358"},"PeriodicalIF":2.9000,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569617/pdf/","citationCount":"0","resultStr":"{\"title\":\"Conformal novelty detection for multiple metabolic networks.\",\"authors\":\"Ariane Marandon, Tabea Rebafka, Nataliya Sokolovska, Hédi Soula\",\"doi\":\"10.1186/s12859-024-05971-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Graphical representations are useful to model complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive accurate diagnostic tools. However, comparison and classification of graphs is still extremely challenging, although a number of highly efficient methods such as graph neural networks were developed in the recent decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or false discovery rate control.Results: In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered to be a novelty when its topology is significantly different from those in the reference class. It is noteworthy that the procedure is a conformal prediction approach, which does not make any distributional assumptions on the data and that can be seen as a wrapper around traditional machine learning models, so that it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph is a representation of all metabolic reactions of a bacterium and to real task from a cancer data repository.Conclusions: Our approach efficiently controls - in highly complex data - the false discovery rate, while maximizing the true discovery rate to get the most reasonable predictive performance. This contribution is focused on confident classification of complex data, what can be further used to explore complex human pathologies and their mechanisms.\",\"PeriodicalId\":8958,\"journal\":{\"name\":\"BMC Bioinformatics\",\"volume\":\"25 1\",\"pages\":\"358\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569617/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12859-024-05971-8\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05971-8","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

背景：图形表示法对于复杂数据建模，特别是生物相互作用建模非常有用。我们的主要动机是在开发非侵入性精确诊断工具的大背景下对代谢网络进行比较。然而，尽管近十年来开发出了许多高效方法（如图神经网络），但图的比较和分类仍然极具挑战性。图分类仍缺乏重要方面：可解释性和分类质量保证，即风险水平控制或误诊率控制：在我们的贡献中，我们介绍了一种在半监督环境下控制图分类任务中错误发现率的合理统计方法。我们的程序可以识别数据集中的新奇事物，当一个图的拓扑结构与参考类中的图有显著不同时，该图就会被视为新奇事物。值得注意的是，该程序是一种保形预测方法，它不对数据做任何分布假设，可被视为传统机器学习模型的包装，从而充分利用现有方法的优势。我们在几个标准基准上对所提出方法的性能进行了评估。该方法还适用于代谢网络分类这一艰巨任务，其中每个图都代表了细菌的所有代谢反应，并适用于癌症数据存储库中的实际任务：我们的方法在高度复杂的数据中有效地控制了错误发现率，同时最大限度地提高了真实发现率，从而获得最合理的预测性能。这一贡献的重点是对复杂数据进行有把握的分类，可进一步用于探索复杂的人类病理及其机制。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Conformal novelty detection for multiple metabolic networks.

Background: Graphical representations are useful to model complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive accurate diagnostic tools. However, comparison and classification of graphs is still extremely challenging, although a number of highly efficient methods such as graph neural networks were developed in the recent decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or false discovery rate control.

Results: In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered to be a novelty when its topology is significantly different from those in the reference class. It is noteworthy that the procedure is a conformal prediction approach, which does not make any distributional assumptions on the data and that can be seen as a wrapper around traditional machine learning models, so that it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph is a representation of all metabolic reactions of a bacterium and to real task from a cancer data repository.

Conclusions: Our approach efficiently controls - in highly complex data - the false discovery rate, while maximizing the true discovery rate to get the most reasonable predictive performance. This contribution is focused on confident classification of complex data, what can be further used to explore complex human pathologies and their mechanisms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

BMC Bioinformatics 生物-生化研究方法

CiteScore

5.70

自引率

3.30%

发文量

506

审稿时长

4.3 months

期刊介绍： BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.