多代谢网络的共形新颖性检测

IF 2.9 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS
Ariane Marandon, Tabea Rebafka, Nataliya Sokolovska, Hédi Soula
{"title":"多代谢网络的共形新颖性检测","authors":"Ariane Marandon, Tabea Rebafka, Nataliya Sokolovska, Hédi Soula","doi":"10.1186/s12859-024-05971-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Graphical representations are useful to model complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive accurate diagnostic tools. However, comparison and classification of graphs is still extremely challenging, although a number of highly efficient methods such as graph neural networks were developed in the recent decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or false discovery rate control.</p><p><strong>Results: </strong>In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered to be a novelty when its topology is significantly different from those in the reference class. It is noteworthy that the procedure is a conformal prediction approach, which does not make any distributional assumptions on the data and that can be seen as a wrapper around traditional machine learning models, so that it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph is a representation of all metabolic reactions of a bacterium and to real task from a cancer data repository.</p><p><strong>Conclusions: </strong>Our approach efficiently controls - in highly complex data - the false discovery rate, while maximizing the true discovery rate to get the most reasonable predictive performance. This contribution is focused on confident classification of complex data, what can be further used to explore complex human pathologies and their mechanisms.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"358"},"PeriodicalIF":2.9000,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569617/pdf/","citationCount":"0","resultStr":"{\"title\":\"Conformal novelty detection for multiple metabolic networks.\",\"authors\":\"Ariane Marandon, Tabea Rebafka, Nataliya Sokolovska, Hédi Soula\",\"doi\":\"10.1186/s12859-024-05971-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Graphical representations are useful to model complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive accurate diagnostic tools. However, comparison and classification of graphs is still extremely challenging, although a number of highly efficient methods such as graph neural networks were developed in the recent decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or false discovery rate control.</p><p><strong>Results: </strong>In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered to be a novelty when its topology is significantly different from those in the reference class. It is noteworthy that the procedure is a conformal prediction approach, which does not make any distributional assumptions on the data and that can be seen as a wrapper around traditional machine learning models, so that it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph is a representation of all metabolic reactions of a bacterium and to real task from a cancer data repository.</p><p><strong>Conclusions: </strong>Our approach efficiently controls - in highly complex data - the false discovery rate, while maximizing the true discovery rate to get the most reasonable predictive performance. This contribution is focused on confident classification of complex data, what can be further used to explore complex human pathologies and their mechanisms.</p>\",\"PeriodicalId\":8958,\"journal\":{\"name\":\"BMC Bioinformatics\",\"volume\":\"25 1\",\"pages\":\"358\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569617/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12859-024-05971-8\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05971-8","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

背景:图形表示法对于复杂数据建模,特别是生物相互作用建模非常有用。我们的主要动机是在开发非侵入性精确诊断工具的大背景下对代谢网络进行比较。然而,尽管近十年来开发出了许多高效方法(如图神经网络),但图的比较和分类仍然极具挑战性。图分类仍缺乏重要方面:可解释性和分类质量保证,即风险水平控制或误诊率控制:在我们的贡献中,我们介绍了一种在半监督环境下控制图分类任务中错误发现率的合理统计方法。我们的程序可以识别数据集中的新奇事物,当一个图的拓扑结构与参考类中的图有显著不同时,该图就会被视为新奇事物。值得注意的是,该程序是一种保形预测方法,它不对数据做任何分布假设,可被视为传统机器学习模型的包装,从而充分利用现有方法的优势。我们在几个标准基准上对所提出方法的性能进行了评估。该方法还适用于代谢网络分类这一艰巨任务,其中每个图都代表了细菌的所有代谢反应,并适用于癌症数据存储库中的实际任务:我们的方法在高度复杂的数据中有效地控制了错误发现率,同时最大限度地提高了真实发现率,从而获得最合理的预测性能。这一贡献的重点是对复杂数据进行有把握的分类,可进一步用于探索复杂的人类病理及其机制。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Conformal novelty detection for multiple metabolic networks.

Background: Graphical representations are useful to model complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive accurate diagnostic tools. However, comparison and classification of graphs is still extremely challenging, although a number of highly efficient methods such as graph neural networks were developed in the recent decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or false discovery rate control.

Results: In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered to be a novelty when its topology is significantly different from those in the reference class. It is noteworthy that the procedure is a conformal prediction approach, which does not make any distributional assumptions on the data and that can be seen as a wrapper around traditional machine learning models, so that it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph is a representation of all metabolic reactions of a bacterium and to real task from a cancer data repository.

Conclusions: Our approach efficiently controls - in highly complex data - the false discovery rate, while maximizing the true discovery rate to get the most reasonable predictive performance. This contribution is focused on confident classification of complex data, what can be further used to explore complex human pathologies and their mechanisms.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
BMC Bioinformatics
BMC Bioinformatics 生物-生化研究方法
CiteScore
5.70
自引率
3.30%
发文量
506
审稿时长
4.3 months
期刊介绍: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信