SVM-DO: identification of tumor-discriminating mRNA signatures via support vector machines supported by Disease Ontology.

Turkish journal of biology = Turk biyoloji dergisi. Pub Date: 2023-12-14; eCollection Date: 2023-01-01; DOI: 10.55730/1300-0152.2670

Mustafa Erhan Özer, Pemra Özbek Sarica, Kazım Yalçın Arğa

Volume 47, Issue 6, pages 349-365.


Abstract

Background/aim: The complex nature of tumor formation makes it difficult to identify discriminatory genes. Transcriptome-based supervised classification methods using support vector machines (SVMs) have recently become popular in this field. However, including less significant variables when constructing classification models can lead to misclassification. To improve model performance, feature selection methods such as enrichment analysis can be used to extract informative variable sets. Information remains limited on which genes discriminate between normal and tumor samples in the context of cancer-disease associations. We therefore aimed to discover novel and practical sets of discriminatory biomarkers by exploiting the association between cancer and disease.
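Enrichment analysis used for feature selection is typically an over-representation test. Given a list of selected genes (for example, differentially expressed genes) and the genes annotated to one ontology term (for example, a Disease Ontology term), the test asks whether their overlap is larger than chance would predict. The sketch below shows the one-sided hypergeometric test commonly used for this. It is a generic illustration that assumes enrichment is scored this way, not code from the SVMDO package. The function names and gene identifiers are hypothetical, and the multiple-testing correction that a real analysis would apply across many terms is omitted.

```python
from math import comb


def hypergeom_enrichment_p(universe_size: int, term_size: int,
                           selected_size: int, overlap: int) -> float:
    """One-sided P(X >= overlap) for X ~ Hypergeometric(N, K, n).

    universe_size (N): all genes that could have been selected
    term_size (K): genes in the universe annotated to the ontology term
    selected_size (n): selected genes, e.g. differentially expressed genes
    overlap (k): selected genes that are also annotated to the term
    """
    upper = min(term_size, selected_size)
    # Sum the probabilities of observing an overlap of k or more.
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(universe_size, selected_size)


def enrichment_p(universe: set, term_genes: set, selected: set) -> float:
    """Score one ontology term against a selected gene list (no multiple-testing correction)."""
    term = term_genes & universe
    chosen = selected & universe
    return hypergeom_enrichment_p(len(universe), len(term), len(chosen),
                                  len(term & chosen))


# Hypothetical toy example: 20 genes, a term annotating G0-G4, five selected genes.
universe = {f"G{i}" for i in range(20)}
term_genes = {f"G{i}" for i in range(5)}
degs = {"G0", "G1", "G2", "G3", "G10"}
print(enrichment_p(universe, term_genes, degs))  # about 0.0049 (= 76 / 15504)
```

Terms with small p-values would be kept, and their genes passed on as candidate features.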

Materials and methods: We applied SVM classification to differentially expressed genes enriched in Disease Ontology terms. Before classification, nondiscriminatory features were filtered out using the Wilks' lambda criterion. Our approach treats the discovery of disease-associated genes as a viable strategy for identifying gene sets that discriminate between tumor and normal states. We evaluated the algorithm on comprehensive RNA-Seq data for colon adenocarcinoma, lung squamous cell carcinoma, and lung adenocarcinoma. The classification performance of the resulting gene sets was assessed on different expression datasets and compared with previous studies that used the same datasets.
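The filter-then-classify steps can be sketched in Python under several assumptions:

- Wilks' lambda is computed per gene in its univariate form, as the ratio of the within-group sum of squares to the total sum of squares. Values near 0 separate tumor from normal well; values near 1 do not.
- The cut-off of 0.5 is a hypothetical choice.
- A linear SVM trained by Pegasos-style stochastic subgradient descent stands in for whatever SVM implementation and kernel the SVMDO R package actually uses.

The toy expression values and all function names are illustrative only.

```python
import random


def wilks_lambda(values, labels):
    """Univariate Wilks' lambda for one feature: within-group SS / total SS.

    Values near 0 mean the groups are well separated; values near 1 mean
    the feature carries little discriminatory information.
    """
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 1.0  # constant feature: treat as nondiscriminatory
    ss_within = 0.0
    for group_label in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == group_label]
        group_mean = sum(group) / len(group)
        ss_within += sum((v - group_mean) ** 2 for v in group)
    return ss_within / ss_total


def filter_by_wilks(X, labels, threshold=0.5):
    """Keep feature columns whose lambda is below a (hypothetical) threshold."""
    lambdas = [wilks_lambda([row[j] for row in X], labels)
               for j in range(len(X[0]))]
    keep = [j for j, lam in enumerate(lambdas) if lam < threshold]
    return keep, lambdas


def train_linear_svm(X, y, reg=0.1, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Minimizes reg/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)) with labels
    in {-1, +1}. A constant 1.0 is appended to every sample so the bias is the
    last weight (it is therefore regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(row) + [1.0], label) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (reg * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - eta * reg) for wi in w]  # shrink (L2 penalty)
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, X):
    """Return -1/+1 predictions from the sign of the linear decision function."""
    return [1 if sum(wi * xi for wi, xi in zip(w, list(row) + [1.0])) >= 0 else -1
            for row in X]


# Hypothetical toy data: rows are samples, columns are genes.
X = [[-2.0, 0.3], [-1.5, -0.2], [-1.8, 0.1],
     [1.6, 0.2], [2.1, -0.1], [1.9, -0.3]]
y = [-1, -1, -1, 1, 1, 1]  # -1 = normal, +1 = tumor

keep, lambdas = filter_by_wilks(X, y)
X_sel = [[row[j] for j in keep] for row in X]
w = train_linear_svm(X_sel, y)
print(keep, predict(w, X_sel))  # expected: [0] [-1, -1, -1, 1, 1, 1]
```

The univariate form keeps the filter cheap when thousands of genes are screened. A multivariate Wilks' lambda, the ratio of the determinants of the within-group and total scatter matrices, would instead score a whole gene set at once.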

Results: Our algorithm extracted small, stable gene sets that predicted cancer status with high accuracy. In addition, the gene sets generated by our method performed well in survival analyses, indicating their prognostic potential.

Conclusion: By combining gene sets for both diagnosis and prognosis, our method can improve clinical applications in cancer research. Our algorithm is available as an R package with a graphical user interface in Bioconductor (https://doi.org/10.18129/B9.bioc.SVMDO) and GitHub (https://github.com/robogeno/SVMDO).



