Functional Annotations of Novel Cancer-Associated lncRNAs Identified Using Machine Learning Algorithms

Luis Diego Mora-Jimenez, Oscar Azofeifa-Segura, J. Guevara-Coto
2019 International Conference on Computational Science and Computational Intelligence (CSCI), December 2019
DOI: 10.1109/CSCI49370.2019.00274
Citations: 1

Abstract

Cancer comprises a group of diseases that result from deregulated cell growth and invasion of adjacent tissues. As research has grown, more information has become available about the potential causes of cancer, including non-coding elements such as long non-coding RNAs (lncRNAs). Machine learning methods can extract this kind of knowledge from data such as gene expression profiles and identify new cancer-associated genes. In this work we used two machine learning algorithms: random forests (RF) and support vector machines (SVM). We trained the models and tested tuning methods, including class balancing and feature selection. The best-performing predictors were the balanced RF with Boruta feature selection (AUC-ROC: 0.9696) and the balanced SVM with recursive feature elimination (AUC-ROC: 0.9710). These models were used to identify new potential lncRNA driver-like genes from protein-coding expression data. The predicted candidates were then functionally annotated using disease ontologies and molecular function ontologies to determine their enrichment in cancer-related processes. These processes included prostate cancer and glycosaminoglycan binding, a potential tumor therapeutic target.
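The abstract names class balancing and AUC-ROC as central to choosing the best models. It does not say which balancing method was used, so the sketch below assumes random undersampling of the majority class (one common option) and computes AUC-ROC with the rank-based Mann-Whitney formulation. The function names `undersample_indices` and `roc_auc` are hypothetical illustrations, not taken from the authors' code; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
import random
from typing import Sequence


def undersample_indices(labels: Sequence[int], seed: int = 0) -> list[int]:
    """Hypothetical helper: randomly drop majority-class examples until both
    classes have equal counts. Labels must be 0 or 1.

    Returns the sorted indices of the retained examples.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Identify the smaller (minority) and larger (majority) class.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority example plus an equal-sized random subset of the majority.
    kept = minority + rng.sample(majority, len(minority))
    return sorted(kept)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive scores higher than a
    random negative. Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    # Compare every positive/negative pair (quadratic, fine for a sketch).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive/negative pairs are ranked correctly, so `roc_auc` returns 0.75. The pairwise loop is chosen for readability; a sort-based rank computation gives the same value and scales better to genome-wide gene sets.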