Amy R Zhao, Valentina L Kouznetsova, Santosh Kesari, Igor F Tsigelny
Biomarkers, pp. 167-177. Published 2025-03-01 (Epub 2025-03-04). DOI: 10.1080/1354750X.2025.2461067 (https://doi.org/10.1080/1354750X.2025.2461067)
Machine-learning diagnostics of breast cancer using piRNA biomarkers.
Background and objectives: Prior studies have shown that small non-coding RNAs (sncRNAs) are associated with cancer occurrence or development. Recently, a newly discovered class of small ncRNAs known as PIWI-interacting RNAs (piRNAs) has been found to play a vital role in physiological processes and cancer initiation. This study aims to use piRNAs as innovative, noninvasive diagnostic biomarkers for breast cancer. Our objective is to develop computational methods that leverage piRNA attributes for breast cancer prediction and to apply them in diagnostics.
Methods: We created a set of piRNA sequence descriptors using information extracted from the piRNA sequences. To ensure accuracy, we established a way to convert non-standard piRNA names to standard ones, enabling precise identification of these sequences. Using these descriptors, we applied machine-learning (ML) techniques in WEKA (Waikato Environment for Knowledge Analysis) to a piRNA dataset to assess the predictive accuracy of the following classifiers: Logistic Regression, Sequential Minimal Optimization (SMO), Random Forest, and Logistic Model Tree (LMT). Furthermore, we performed SHapley Additive exPlanations (SHAP) analysis to understand which descriptors were most relevant to the prediction accuracy. The ML models were then validated on an independent dataset to evaluate their effectiveness in predicting breast cancer.
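The abstract does not list the descriptors that were used, so the following is only a minimal sketch of what sequence-derived descriptors could look like. The function name `sequence_descriptors` and the chosen features (length, nucleotide frequencies, GC content, dinucleotide frequencies) are assumptions made for illustration, not the authors' descriptor set.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"


def sequence_descriptors(seq: str) -> dict[str, float]:
    """Compute simple composition descriptors for one RNA sequence.

    Hypothetical feature set: the study's actual descriptors are not
    given in the abstract.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    if not seq or set(seq) - set(NUCLEOTIDES):
        raise ValueError(f"unexpected characters in sequence: {seq!r}")

    n = len(seq)
    counts = Counter(seq)

    desc = {"length": float(n)}
    # Single-nucleotide frequencies
    for base in NUCLEOTIDES:
        desc[f"freq_{base}"] = counts[base] / n
    desc["gc_content"] = (counts["G"] + counts["C"]) / n

    # Dinucleotide frequencies over the n - 1 overlapping pairs
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    for a, b in product(NUCLEOTIDES, repeat=2):
        desc[f"freq_{a}{b}"] = pairs[a + b] / (n - 1) if n > 1 else 0.0
    return desc
```

Each sequence yields a fixed-length feature vector of 22 values, which is the shape a classifier such as logistic regression expects. One such row per piRNA, plus a class label, could then be exported to a format WEKA reads.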
Results: The three best-performing classifiers in WEKA were Logistic Regression, SMO, and LMT. The Logistic Regression model achieved an accuracy of 90.7% in predicting breast cancer, while SMO and LMT attained 89.7% and 85.65%, respectively.
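For reference, accuracy is the fraction of samples classified correctly. The sketch below shows how it relates to sensitivity and specificity, two measures commonly reported alongside it for diagnostic tests. The function name `diagnostic_metrics` is an assumption, and the counts in the usage note are hypothetical, not taken from the study.

```python
def diagnostic_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Derive basic diagnostic measures from binary confusion-matrix counts.

    tp/fn: cancer samples predicted as cancer / as control
    tn/fp: control samples predicted as control / as cancer
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # Share of all samples that were classified correctly
        "accuracy": (tp + tn) / total,
        # Share of true cancer samples that were detected
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of true controls that were correctly ruled out
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

For example, with hypothetical counts, `diagnostic_metrics(tp=40, tn=50, fp=5, fn=5)` gives an accuracy of 0.9. Sensitivity and specificity can still differ when the classes are unbalanced, which is why accuracy alone is rarely enough to judge a diagnostic classifier.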
Conclusions: Our study demonstrates the effectiveness of using ML-based piRNA classifiers in diagnosing breast cancer and contributes to the growing body of evidence supporting piRNAs as biomarkers in cancer diagnosis. However, additional research is needed to validate these findings and further assess the clinical applicability of this approach.
About the journal:
The journal Biomarkers brings together all aspects of the rapidly growing field of biomarker research, encompassing their various uses and applications in one essential source.
Biomarkers provides a vital forum for the exchange of ideas and concepts in all areas of biomarker research. High-quality papers in four main areas are accepted, and manuscripts describing novel biomarkers and their subsequent validation are especially encouraged:
• Biomarkers of disease
• Biomarkers of exposure
• Biomarkers of response
• Biomarkers of susceptibility
Manuscripts can describe biomarkers measured in humans or other animals in vivo or in vitro. Biomarkers will consider publishing negative data from studies of biomarkers of susceptibility in human populations.