基于机器学习算法的遗传变异集成叠加分类

Y. Jahnavi, Poongothai Elango, S. Raja, P. Kumar
{"title":"基于机器学习算法的遗传变异集成叠加分类","authors":"Y. Jahnavi, Poongothai Elango, S. Raja, P. Kumar","doi":"10.1142/s0219467823500158","DOIUrl":null,"url":null,"abstract":"Genetics is the clinical review of congenital mutation, where the principal advantage of analyzing genetic mutation of humans is the exploration, analysis, interpretation and description of the genetic transmitted and inherited effect of several diseases such as cancer, diabetes and heart diseases. Cancer is the most troublesome and disordered affliction as the proportion of cancer sufferers is growing massively. Identification and discrimination of the mutations that impart to the enlargement of tumor from the unbiased mutations is difficult, as majority tumors of cancer are able to exercise genetic mutations. The genetic mutations are systematized and categorized to sort the cancer by way of medical observations and considering clinical studies. At the present time, genetic mutations are being annotated and these interpretations are being accomplished either manually or using the existing primary algorithms. Evaluation and classification of each and every individual genetic mutation was basically predicated on evidence from documented content built on medical literature. Consequently, as a means to build genetic mutations, basically, depending on the clinical evidences persists a challenging task. There exist various algorithms such as one hot encoding technique is used to derive features from genes and their variations, TF-IDF is used to extract features from the clinical text data. In order to increase the accuracy of the classification, machine learning algorithms such as support vector machine, logistic regression, Naive Bayes, etc., are experimented. A stacking model classifier has been developed to increase the accuracy. The proposed stacking model classifier has obtained the log loss 0.8436 and 0.8572 for cross-validation data set and test data set, respectively. By the experimentation, it has been proved that the proposed stacking model classifier outperforms the existing algorithms in terms of log loss. Basically, minimum log loss refers to the efficient model. Here the log loss has been reduced to less than 1 by using the proposed stacking model classifier. The performance of these algorithms can be gauged on the basis of the various measures like multi-class log loss.","PeriodicalId":177479,"journal":{"name":"Int. J. Image Graph.","volume":"79 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Novel Ensemble Stacking Classification of Genetic Variations Using Machine Learning Algorithms\",\"authors\":\"Y. Jahnavi, Poongothai Elango, S. Raja, P. Kumar\",\"doi\":\"10.1142/s0219467823500158\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Genetics is the clinical review of congenital mutation, where the principal advantage of analyzing genetic mutation of humans is the exploration, analysis, interpretation and description of the genetic transmitted and inherited effect of several diseases such as cancer, diabetes and heart diseases. Cancer is the most troublesome and disordered affliction as the proportion of cancer sufferers is growing massively. Identification and discrimination of the mutations that impart to the enlargement of tumor from the unbiased mutations is difficult, as majority tumors of cancer are able to exercise genetic mutations. The genetic mutations are systematized and categorized to sort the cancer by way of medical observations and considering clinical studies. At the present time, genetic mutations are being annotated and these interpretations are being accomplished either manually or using the existing primary algorithms. Evaluation and classification of each and every individual genetic mutation was basically predicated on evidence from documented content built on medical literature. Consequently, as a means to build genetic mutations, basically, depending on the clinical evidences persists a challenging task. There exist various algorithms such as one hot encoding technique is used to derive features from genes and their variations, TF-IDF is used to extract features from the clinical text data. In order to increase the accuracy of the classification, machine learning algorithms such as support vector machine, logistic regression, Naive Bayes, etc., are experimented. A stacking model classifier has been developed to increase the accuracy. The proposed stacking model classifier has obtained the log loss 0.8436 and 0.8572 for cross-validation data set and test data set, respectively. By the experimentation, it has been proved that the proposed stacking model classifier outperforms the existing algorithms in terms of log loss. Basically, minimum log loss refers to the efficient model. Here the log loss has been reduced to less than 1 by using the proposed stacking model classifier. The performance of these algorithms can be gauged on the basis of the various measures like multi-class log loss.\",\"PeriodicalId\":177479,\"journal\":{\"name\":\"Int. J. Image Graph.\",\"volume\":\"79 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Image Graph.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s0219467823500158\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Image Graph.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s0219467823500158","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

遗传学是先天性突变的临床综述,对人类基因突变进行分析的主要优点是对癌症、糖尿病、心脏病等几种疾病的遗传传递和遗传作用进行探索、分析、解释和描述。随着癌症患者比例的大幅增长,癌症是最麻烦、最混乱的疾病。从无偏突变中识别和区分导致肿瘤扩大的突变是困难的,因为大多数癌症肿瘤都能够进行基因突变。结合医学观察和临床研究,对基因突变进行系统化分类,对癌症进行分类。目前,基因突变正在被注释,这些解释要么是手动完成的,要么是使用现有的主要算法完成的。每个个体基因突变的评估和分类基本上都是基于医学文献中记录的证据。因此,作为构建基因突变的手段,基本上依靠临床证据仍然是一项具有挑战性的任务。目前存在多种算法,如一种热编码技术用于从基因及其变异中提取特征,TF-IDF用于从临床文本数据中提取特征。为了提高分类的准确率,实验了支持向量机、逻辑回归、朴素贝叶斯等机器学习算法。为了提高分类精度,开发了一种堆叠模型分类器。本文提出的叠加模型分类器在交叉验证数据集和测试数据集上的对数损失分别为0.8436和0.8572。通过实验证明,本文提出的叠加模型分类器在对数损失方面优于现有算法。基本上,最小对数损失指的是高效模型。在这里,通过使用所提出的堆叠模型分类器,日志损失减少到小于1。这些算法的性能可以根据多类日志损失等各种度量来衡量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Novel Ensemble Stacking Classification of Genetic Variations Using Machine Learning Algorithms
Genetics is the clinical review of congenital mutation, where the principal advantage of analyzing genetic mutation of humans is the exploration, analysis, interpretation and description of the genetic transmitted and inherited effect of several diseases such as cancer, diabetes and heart diseases. Cancer is the most troublesome and disordered affliction as the proportion of cancer sufferers is growing massively. Identification and discrimination of the mutations that impart to the enlargement of tumor from the unbiased mutations is difficult, as majority tumors of cancer are able to exercise genetic mutations. The genetic mutations are systematized and categorized to sort the cancer by way of medical observations and considering clinical studies. At the present time, genetic mutations are being annotated and these interpretations are being accomplished either manually or using the existing primary algorithms. Evaluation and classification of each and every individual genetic mutation was basically predicated on evidence from documented content built on medical literature. Consequently, as a means to build genetic mutations, basically, depending on the clinical evidences persists a challenging task. There exist various algorithms such as one hot encoding technique is used to derive features from genes and their variations, TF-IDF is used to extract features from the clinical text data. In order to increase the accuracy of the classification, machine learning algorithms such as support vector machine, logistic regression, Naive Bayes, etc., are experimented. A stacking model classifier has been developed to increase the accuracy. The proposed stacking model classifier has obtained the log loss 0.8436 and 0.8572 for cross-validation data set and test data set, respectively. By the experimentation, it has been proved that the proposed stacking model classifier outperforms the existing algorithms in terms of log loss. Basically, minimum log loss refers to the efficient model. Here the log loss has been reduced to less than 1 by using the proposed stacking model classifier. The performance of these algorithms can be gauged on the basis of the various measures like multi-class log loss.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信