Improving platelet-RNA-based diagnostics: a comparative analysis of machine learning models for cancer detection and multiclass classification.

IF 6.6 | Zone 2, Medicine | JCR Q1, Biochemistry, Genetics and Molecular Biology
Molecular Oncology. Pub Date: 2024-11-01. Epub Date: 2024-06-17. DOI: 10.1002/1878-0261.13689
Maksym A Jopek, Krzysztof Pastuszak, Michał Sieczczyński, Sebastian Cygert, Anna J Żaczek, Matthew T Rondina, Anna Supernat
Pages: 2743-2754. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11547247/pdf/
Citations: 0

Abstract

Liquid biopsy shows strong potential in patient management because it offers a minimally invasive and cost-effective way to detect and monitor cancer, even at early stages. Because liquid biopsy data are complex, machine-learning techniques are attracting growing attention in sample analysis, especially for high-dimensional data such as RNA expression profiles. Yet the community has not agreed on which methods are most effective or on how the data should be processed. To address this, we performed a large-scale study using a range of machine-learning techniques. First, we re-examined existing datasets and excluded some patients to ensure the quality of the data collection. The final data collection included platelet RNA samples from 1397 cancer patients (17 types of cancer) and 354 asymptomatic, presumed healthy donors. We then assessed an array of machine-learning models and techniques (e.g., feature selection of RNA transcripts) in pan-cancer detection and multiclass classification. Our results show that simple logistic regression performs best, reaching a 68% cancer detection rate at a 99% specificity level and a multiclass classification accuracy of 79.38% when distinguishing between five cancer types. In summary, by revisiting classical machine-learning models, we exceeded the previously used method by 5% in cancer detection and by 9.65% in multiclass classification. To ease further research, we open-source our code and data processing pipelines (https://gitlab.com/jopekmaksym/improving-platelet-rna-based-diagnostics), which we hope will serve the community as a strong baseline.
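
The headline metric in the abstract, detection rate at a fixed 99% specificity, can be illustrated with a minimal sketch. This is not the authors' pipeline (their code is in the GitLab repository linked above). The function name, the 0/1 label encoding, and the toy scores are assumptions made for illustration. The idea is to choose the decision threshold from the control scores so that the required fraction of controls falls at or below it, then count how many cancer samples score above it.

```python
import math


def sensitivity_at_specificity(labels, scores, target_specificity=0.99):
    """Report the detection rate (sensitivity) at a fixed specificity.

    labels: 1 = cancer, 0 = asymptomatic control (hypothetical encoding).
    scores: model outputs where a higher value means "more likely cancer".
    A sample is called positive when its score is strictly above the
    threshold. The threshold is the smallest control score that keeps
    specificity at or above the target.
    """
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one sample of each class")
    # Number of controls that must score at or below the threshold.
    # The small epsilon guards against floating-point overshoot in the product.
    need = math.ceil(target_specificity * len(neg) - 1e-9)
    need = min(len(neg), max(1, need))
    threshold = neg[need - 1]
    specificity = sum(s <= threshold for s in neg) / len(neg)
    sensitivity = sum(s > threshold for s in pos) / len(pos)
    return sensitivity, specificity, threshold


if __name__ == "__main__":
    # Toy scores, invented for illustration only (not data from the study).
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    scores = [0.1, 0.2, 0.3, 0.9, 0.8, 0.95, 0.4, 0.99]
    for target in (0.75, 1.0):
        sens, spec, thr = sensitivity_at_specificity(labels, scores, target)
        print(f"target={target}: sensitivity={sens}, specificity={spec}, threshold={thr}")
```

In this toy example, raising the specificity target from 0.75 to 1.0 moves the threshold past the highest-scoring control, and sensitivity drops as a result. This trade-off is why a detection rate is only meaningful when reported together with its specificity level, as the abstract does.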

Source journal
Molecular Oncology (Biochemistry, Genetics and Molecular Biology: Molecular Medicine)
CiteScore: 11.80
Self-citation rate: 1.50%
Articles published: 203
Review time: 10 weeks
About the journal: Molecular Oncology highlights new discoveries, approaches, and technical developments in basic, clinical and discovery-driven translational cancer research. It publishes research articles, reviews (by invitation only), and timely science policy articles. The journal is now fully Open Access, with all articles published over the past 10 years freely available.