Towards the genome-scale discovery of bivariate monotonic classifiers.

IF 3.3 · Zone 3 (Biology) · JCR Q2 (Biochemical Research Methods)

BMC Bioinformatics Pub Date : 2025-09-02 DOI:10.1186/s12859-025-06253-7

Océane Fourquet, Martin S Krejca, Carola Doerr, Benno Schwikowski

BMC Bioinformatics 26(1), 228. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12403431/pdf/

Abstract

Background: Bivariate monotonic classifiers (BMCs) are based on pairs of input features. Like many other models used for machine learning, they can capture nonlinear patterns in high-dimensional data. At the same time, they are simple and easy to interpret. Until now, the use of BMCs on a genome scale has been hampered by the high computational complexity of the search for pairs of features with a high leave-one-out performance estimate.
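To make the cost of the pair search concrete, here is a minimal sketch of an exhaustive search over feature pairs, each scored by leave-one-out errors. The abstract does not describe how a BMC is fitted, so the sketch uses a deliberately simple monotone rule as a stand-in (predict positive when a point is at least as large, in both features, as some positive training sample). The function names `predict_monotone`, `loo_errors`, and `best_pair` and the toy data are hypothetical and are not taken from the fastBMC code.

```python
from itertools import combinations


def predict_monotone(train_pts, train_labels, p):
    """Stand-in monotone rule (an assumption, not the paper's model):
    predict 1 if p is >= some positive training point in both coordinates."""
    return int(any(
        lab == 1 and q[0] <= p[0] and q[1] <= p[1]
        for q, lab in zip(train_pts, train_labels)
    ))


def loo_errors(pts, labels):
    """Count leave-one-out errors: hold out each sample, predict it from the rest."""
    errors = 0
    for i in range(len(pts)):
        rest_pts = pts[:i] + pts[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if predict_monotone(rest_pts, rest_labels, pts[i]) != labels[i]:
            errors += 1
    return errors


def best_pair(X, y):
    """Exhaustively score every feature pair; return (errors, (a, b)) for the best one."""
    best = None
    for a, b in combinations(range(len(X[0])), 2):
        pts = [(row[a], row[b]) for row in X]
        err = loo_errors(pts, y)
        if best is None or err < best[0]:
            best = (err, (a, b))
    return best


# Toy data: features 0 and 1 jointly separate the classes, feature 2 does not.
X = [[0, 0, 5], [0, 2, 4], [2, 0, 3], [1, 1, 2], [2, 2, 1], [3, 3, 0]]
y = [0, 0, 0, 1, 1, 1]
print(best_pair(X, y))  # -> (1, (0, 1))
```

The number of pairs grows quadratically with the number of features, and each pair requires one model fit per held-out sample, which is why the unaccelerated search becomes expensive at genome scale.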

Results: We introduce the fastBMC algorithm, which drastically speeds up the identification of BMCs. The algorithm relies on a mathematical bound for the BMC performance estimate and maintains optimality. We show empirically that, compared to the traditional approach, fastBMC speeds up the computation by a factor of at least 15, even for a small number of features. For two of the three smaller biomedical datasets that we consider here, the resulting ability to consider much larger sets of features translates into significantly improved classification performance. As an example of the high degree of interpretability of BMCs, we discuss a straightforward interpretation of a BMC glioblastoma survival predictor, an immediate novel biomedical hypothesis, options for biomedical validation, and treatment implications. In addition, we study the performance of fastBMC on a larger, well-known breast cancer dataset, validating the benefits of BMCs for biomarker identification and biomedical hypothesis generation.
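The abstract does not state the bound itself or how fastBMC applies it. As a generic illustration only, the sketch below shows one common way a valid lower bound on a cost can skip expensive evaluations without losing optimality: candidates are visited in order of increasing bound, and the search stops once the bound can no longer beat the best exact cost found so far. The name `pruned_search` and the toy numbers are hypothetical.

```python
def pruned_search(candidates, lower_bound, exact_cost):
    """Return (best_item, best_cost, n_exact), assuming
    lower_bound(c) <= exact_cost(c) for every candidate c."""
    best_item, best_cost = None, None
    n_exact = 0  # how many expensive evaluations were actually needed
    for item in sorted(candidates, key=lower_bound):
        # All remaining candidates have a bound at least this large,
        # so none of them can strictly improve on best_cost.
        if best_cost is not None and lower_bound(item) >= best_cost:
            break
        cost = exact_cost(item)
        n_exact += 1
        if best_cost is None or cost < best_cost:
            best_item, best_cost = item, cost
    return best_item, best_cost, n_exact


# Toy table: candidate -> (cheap lower bound, expensive exact cost).
table = {"a": (1, 3), "b": (2, 2), "c": (4, 5), "d": (6, 6)}
result = pruned_search(
    table,
    lower_bound=lambda k: table[k][0],
    exact_cost=lambda k: table[k][1],
)
print(result)  # -> ('b', 2, 2)
```

In this toy run only two of the four candidates are evaluated exactly, yet the returned cost is the same as an exhaustive search would find, because the bound never exceeds the exact cost.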

Conclusion: fastBMC enables the rapid construction of robust and interpretable ensemble models using BMCs, facilitating the discovery of gene pairs predictive of relevant phenotypes and their interaction in that context.

Availability: We provide the first open-source implementation for learning BMCs, a Python implementation of fastBMC in particular, and Python code to reproduce the fastBMC results on real and simulated data in this paper, at https://github.com/oceanefrqt/fastBMC.

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.