BC-predict: mining of signal biomarkers and production of models for early-stage breast cancer subtyping and prognosis.

Impact factor: 3.9 | JCR: Q2 | Category: Mathematical & Computational Biology
Frontiers in Bioinformatics | Pub Date: 2025-09-18 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1644695
Sangeetha Muthamilselvan, Natarajan Vaithilingam, Ashok Palaniappan
Citations: 0

Abstract


BC-predict: mining of signal biomarkers and production of models for early-stage breast cancer subtyping and prognosis.

Introduction: Disease heterogeneity is the hallmark of breast cancer, which is the most common female malignancy. With a disturbing increase in mortality and disease burden, there remains a need for effective early-stage theragnostic and prognostic biomarkers. In this work, we improved on BrcaDx (https://apalania.shinyapps.io/brcadx/) for cancer vs control screening and examined a cluster of adjoining learning problems in breast cancer heterogeneity: (i) identification of metastatic cancers; (ii) molecular subtyping (TNBC, HER2, or luminal); and (iii) histological subtyping (invasive ductal or invasive lobular).

Methods: We analyzed the transcriptomic profiles of breast cancer patients from public-domain databases such as the TCGA using stage-encoded problem-specific statistical models of gene expression and unveiled stage-salient and progression-significant genes. Using a consensus approach, we identified potential machine learning features, and considered six model classes for each learning problem, with hyperparameter optimization on a training dataset and evaluation on a holdout test dataset. A nested approach enabled us to identify the best model class for each learning problem.
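The nested idea described above can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the two toy "model classes" (`fit_threshold`, `fit_majority`), the one-dimensional data, and the fold counts are hypothetical and are not the six model classes, features, or splits used by the authors. The point is only the structure: an inner cross-validation picks the model class on training data, and an outer held-out fold scores that choice.

```python
from statistics import mean

def k_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def fit_threshold(xs, ys):
    """Toy model class A (hypothetical): cut at the midpoint of the class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    cut = (m0 + m1) / 2
    sign = 1 if m1 >= m0 else -1
    return lambda x: int(sign * (x - cut) > 0)

def fit_majority(xs, ys):
    """Toy model class B (hypothetical): always predict the majority label."""
    label = int(sum(ys) * 2 >= len(ys))
    return lambda x: label

def accuracy(model, xs, ys):
    return mean(int(model(x) == y) for x, y in zip(xs, ys))

def split(xs, ys, fold):
    """Return (train_x, train_y, test_x, test_y) for one held-out fold."""
    held = set(fold)
    train = [i for i in range(len(xs)) if i not in held]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in fold], [ys[i] for i in fold])

def cv_score(fit, xs, ys, k):
    """Inner loop: mean held-out accuracy of one model class."""
    scores = []
    for fold in k_folds(len(xs), k):
        tr_x, tr_y, te_x, te_y = split(xs, ys, fold)
        scores.append(accuracy(fit(tr_x, tr_y), te_x, te_y))
    return mean(scores)

def nested_select(model_classes, xs, ys, outer_k=3, inner_k=2):
    """Outer loop: pick a model class by inner CV, then score it on the outer fold."""
    picks, outer_scores = [], []
    for fold in k_folds(len(xs), outer_k):
        tr_x, tr_y, te_x, te_y = split(xs, ys, fold)
        best = max(model_classes,
                   key=lambda name: cv_score(model_classes[name], tr_x, tr_y, inner_k))
        picks.append(best)
        outer_scores.append(accuracy(model_classes[best](tr_x, tr_y), te_x, te_y))
    return picks, mean(outer_scores)

# Hypothetical, well-separated one-feature data: six samples per class.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6]
ys = [0] * 6 + [1] * 6
picks, score = nested_select({"threshold": fit_threshold, "majority": fit_majority}, xs, ys)
print(picks, score)
```

Because the outer folds never take part in the inner selection, the outer score is not biased by the choice of model class, which is the usual motivation for nesting.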

Results: External validation of the best models yielded balanced accuracies of 97.42% for cancer vs normal; 88.22% for metastatic vs non-metastatic; 88.79% for ternary molecular subtyping; and an ensemble accuracy of 94.23% for histological subtyping. The model for molecular subtyping was validated on a 26-sample TNBC-only out-of-distribution cohort, yielding 25 correct predictions. We performed a late integration of multi-omics datasets by validating the feature space used in each problem with miRNA profiles, methylation profiles, and commercial breast cancer panels.
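For readers unfamiliar with the metric, balanced accuracy is the mean of the per-class recalls, so a rare class counts as much as a common one. The sketch below is a plain-Python illustration under stated assumptions: the toy labels are invented, and in the 26-sample example the label assigned to the single miss (`"luminal"`) is hypothetical, since the abstract reports only that 25 of 26 predictions were correct.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Invented ternary example: per-class recalls are 1/2, 2/2 and 3/4.
y_true = ["TNBC", "TNBC", "HER2", "HER2", "luminal", "luminal", "luminal", "luminal"]
y_pred = ["TNBC", "HER2", "HER2", "HER2", "luminal", "luminal", "luminal", "TNBC"]
toy = balanced_accuracy(y_true, y_pred)

# Single-class cohort: the metric reduces to plain recall, here 25/26.
# The wrong label used for the one miss is an assumption.
tnbc_only = balanced_accuracy(["TNBC"] * 26, ["TNBC"] * 25 + ["luminal"])
print(round(toy, 4), round(tnbc_only, 4))
```

On a cohort containing a single true class, as in the TNBC-only validation, there is only one recall to average, so 25 correct out of 26 corresponds to roughly 96.2%.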

Discussion: Pending prospective studies, we have translated the models into BC-Predict, which forks the best models developed for each problem in a unified interface and provides a complete readout for input instances of expression data, including uncertainty estimates. BC-Predict is freely available for non-commercial purposes at: https://apalania.shinyapps.io/BC-Predict.
