Improving platelet-RNA-based diagnostics: a comparative analysis of machine learning models for cancer detection and multiclass classification
Maksym A Jopek, Krzysztof Pastuszak, Michał Sieczczyński, Sebastian Cygert, Anna J Żaczek, Matthew T Rondina, Anna Supernat
Molecular Oncology, pp. 2743-2754. Published 2024-11-01 (Epub 2024-06-17). DOI: 10.1002/1878-0261.13689 (https://doi.org/10.1002/1878-0261.13689)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11547247/pdf/
Abstract
Liquid biopsy shows strong potential in patient management by providing a minimally invasive and cost-effective approach to detecting and monitoring cancer, even at its early stages. Because liquid biopsy data are complex, machine-learning techniques are gaining attention in sample analysis, especially for multidimensional data such as RNA expression profiles. Yet there is no agreement in the community on which methods are the most effective or how to process the data. To address this, we performed a large-scale study using various machine-learning techniques. First, we took a closer look at existing datasets and filtered out some patients to ensure the quality of the data collection. The final data collection included platelet RNA samples acquired from 1397 cancer patients (17 types of cancer) and 354 asymptomatic, presumed healthy, donors. Then, we assessed an array of machine-learning models and techniques (e.g., feature selection of RNA transcripts) in pan-cancer detection and multiclass classification. Our results show that simple logistic regression performs best, reaching a 68% cancer detection rate at a 99% specificity level and a multiclass classification accuracy of 79.38% when distinguishing between five cancer types. In summary, by revisiting classical machine-learning models, we have exceeded the previously used method by 5% and 9.65% in cancer detection and multiclass classification, respectively. To ease further research, we open-source our code and data processing pipelines (https://gitlab.com/jopekmaksym/improving-platelet-rna-based-diagnostics), which we hope will serve the community as a strong baseline.
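The abstract reports the detection rate (sensitivity) at a fixed 99% specificity. The sketch below illustrates one common way to compute such a figure from classifier scores. It is not taken from the authors' repository: the function name, the thresholding rule (predict cancer when the score is strictly above the threshold), and the toy numbers are all assumptions made for illustration.

```python
import math


def sensitivity_at_specificity(scores, labels, target_specificity=0.99):
    """Pick the lowest threshold that keeps specificity >= target.

    scores: classifier outputs, higher means "more likely cancer".
    labels: 1 for cancer, 0 for asymptomatic control.
    A sample is called positive when its score is strictly above
    the threshold (an assumed convention, not the authors').
    Returns (threshold, sensitivity, specificity).
    """
    if not 0.0 < target_specificity <= 1.0:
        raise ValueError("target_specificity must be in (0, 1]")
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one control and one cancer sample")

    # Number of controls that must be classified as negative.
    needed = math.ceil(target_specificity * len(controls))
    # The needed-th smallest control score: every control at or below
    # it is called negative, so specificity meets the target.
    threshold = controls[needed - 1]

    specificity = sum(s <= threshold for s in controls) / len(controls)
    sensitivity = sum(s > threshold for s in cases) / len(cases)
    return threshold, sensitivity, specificity


# Toy example with made-up scores (not data from the study).
toy_scores = [0.1, 0.2, 0.3, 0.9, 0.25, 0.8, 0.95, 0.4]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(sensitivity_at_specificity(toy_scores, toy_labels, 0.75))
```

In practice the threshold would be chosen on a validation split and then applied unchanged to held-out test samples, so that the reported specificity is not tuned on the data used to measure it.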
Journal: Molecular Oncology
Subject area: Biochemistry, Genetics and Molecular Biology; Molecular Medicine
CiteScore: 11.80
Self-citation rate: 1.50%
Articles published: 203
Review time: 10 weeks
About the journal:
Molecular Oncology highlights new discoveries, approaches, and technical developments, in basic, clinical and discovery-driven translational cancer research. It publishes research articles, reviews (by invitation only), and timely science policy articles.
The journal is now fully Open Access with all articles published over the past 10 years freely available.