从突变和网络分析中预测卵巢癌驱动基因的机器学习方法

IF 1.7 4区 计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Rucha Wadapurkar, S. Bapat, Rupali A. Mahajan, R. Vyas
{"title":"从突变和网络分析中预测卵巢癌驱动基因的机器学习方法","authors":"Rucha Wadapurkar, S. Bapat, Rupali A. Mahajan, R. Vyas","doi":"10.1108/dta-03-2022-0096","DOIUrl":null,"url":null,"abstract":"PurposeOvarian cancer (OC) is the most common type of gynecologic cancer in the world with a high rate of mortality. Due to manifestation of generic symptoms and absence of specific biomarkers, OC is usually diagnosed at a late stage. Machine learning models can be employed to predict driver genes implicated in causative mutations.Design/methodology/approachIn the present study, a comprehensive next generation sequencing (NGS) analysis of whole exome sequences of 47 OC patients was carried out to identify clinically significant mutations. Nine functional features of 708 mutations identified were input into a machine learning classification model by employing the eXtreme Gradient Boosting (XGBoost) classifier method for prediction of OC driver genes.FindingsThe XGBoost classifier model yielded a classification accuracy of 0.946, which was superior to that obtained by other classifiers such as decision tree, Naive Bayes, random forest and support vector machine. Further, an interaction network was generated to identify and establish correlations with cancer-associated pathways and gene ontology data.Originality/valueThe final results revealed 12 putative candidate cancer driver genes, namely LAMA3, LAMC3, COL6A1, COL5A1, COL2A1, UGT1A1, BDNF, ANK1, WNT10A, FZD4, PLEKHG5 and CYP2C9, that may have implications in clinical diagnosis.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Machine learning approaches for prediction of ovarian cancer driver genes from mutational and network analysis\",\"authors\":\"Rucha Wadapurkar, S. Bapat, Rupali A. Mahajan, R. Vyas\",\"doi\":\"10.1108/dta-03-2022-0096\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"PurposeOvarian cancer (OC) is the most common type of gynecologic cancer in the world with a high rate of mortality. Due to manifestation of generic symptoms and absence of specific biomarkers, OC is usually diagnosed at a late stage. Machine learning models can be employed to predict driver genes implicated in causative mutations.Design/methodology/approachIn the present study, a comprehensive next generation sequencing (NGS) analysis of whole exome sequences of 47 OC patients was carried out to identify clinically significant mutations. Nine functional features of 708 mutations identified were input into a machine learning classification model by employing the eXtreme Gradient Boosting (XGBoost) classifier method for prediction of OC driver genes.FindingsThe XGBoost classifier model yielded a classification accuracy of 0.946, which was superior to that obtained by other classifiers such as decision tree, Naive Bayes, random forest and support vector machine. Further, an interaction network was generated to identify and establish correlations with cancer-associated pathways and gene ontology data.Originality/valueThe final results revealed 12 putative candidate cancer driver genes, namely LAMA3, LAMC3, COL6A1, COL5A1, COL2A1, UGT1A1, BDNF, ANK1, WNT10A, FZD4, PLEKHG5 and CYP2C9, that may have implications in clinical diagnosis.\",\"PeriodicalId\":56156,\"journal\":{\"name\":\"Data Technologies and Applications\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Technologies and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1108/dta-03-2022-0096\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-03-2022-0096","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

目的癌症是世界上最常见的妇科癌症,死亡率高。由于一般症状的表现和缺乏特定的生物标志物,OC通常在晚期被诊断出来。机器学习模型可以用来预测与致病突变有关的驱动基因。设计/方法/方法在本研究中,对47名OC患者的全外显子组序列进行了全面的下一代测序(NGS)分析,以确定具有临床意义的突变。通过使用用于预测OC驱动基因的极限梯度增强(XGBoost)分类器方法,将所识别的708个突变的9个功能特征输入到机器学习分类模型中。结果XGBoost分类器模型的分类精度为0.946,优于决策树、朴素贝叶斯、随机森林和支持向量机等其他分类器。此外,生成了一个交互网络,以识别和建立与癌症相关途径和基因本体数据的相关性。原创性/价值最终结果揭示了12个推定的癌症驱动基因候选,即LAMA3、LAMC3、COL6A1、COL5A1、COL2A1、UGT1A1、BDNF、ANK1、WNT10A、FZD4、PLEKHG5和CYP2C9,这些基因可能对临床诊断有影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Machine learning approaches for prediction of ovarian cancer driver genes from mutational and network analysis
PurposeOvarian cancer (OC) is the most common type of gynecologic cancer in the world with a high rate of mortality. Due to manifestation of generic symptoms and absence of specific biomarkers, OC is usually diagnosed at a late stage. Machine learning models can be employed to predict driver genes implicated in causative mutations.Design/methodology/approachIn the present study, a comprehensive next generation sequencing (NGS) analysis of whole exome sequences of 47 OC patients was carried out to identify clinically significant mutations. Nine functional features of 708 mutations identified were input into a machine learning classification model by employing the eXtreme Gradient Boosting (XGBoost) classifier method for prediction of OC driver genes.FindingsThe XGBoost classifier model yielded a classification accuracy of 0.946, which was superior to that obtained by other classifiers such as decision tree, Naive Bayes, random forest and support vector machine. Further, an interaction network was generated to identify and establish correlations with cancer-associated pathways and gene ontology data.Originality/valueThe final results revealed 12 putative candidate cancer driver genes, namely LAMA3, LAMC3, COL6A1, COL5A1, COL2A1, UGT1A1, BDNF, ANK1, WNT10A, FZD4, PLEKHG5 and CYP2C9, that may have implications in clinical diagnosis.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Data Technologies and Applications
Data Technologies and Applications Social Sciences-Library and Information Sciences
CiteScore
3.80
自引率
6.20%
发文量
29
期刊介绍: Previously published as: Program Online from: 2018 Subject Area: Information & Knowledge Management, Library Studies
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信