Implementation of Ensemble Method on DNA Data Using Various Cross Validation Techniques

B. Bawankar, Kotadi Chinnaiah
Journal: 3C Tecnología_Glosas de innovación aplicadas a la pyme
Publication type: Journal Article
Published: 2022-12-29
DOI: 10.17993/3ctecno.2022.v11n2e42.59-69 (https://doi.org/10.17993/3ctecno.2022.v11n2e42.59-69)
Citations: 4

Abstract

Datasets increasingly contain hundreds or thousands of features, and feature selection has drawn growing research interest in recent years. Typically, not all columns carry useful information, and noisy or irrelevant columns can confound learning algorithms and degrade model performance. Various feature selection methods have been developed to evaluate high-dimensional datasets and identify subsets of relevant features. However, the results of any single feature selection algorithm are often skewed by the data. Ensemble approaches have therefore emerged as an alternative that combines the strengths of individual feature selection algorithms and compensates for their weaknesses. To handle feature selection on high-dimensional datasets, this research aims to understand the key ideas and relationships involved in aggregating feature selection methods. The proposed idea is tested through a cross-validation implementation that combines several Python packages providing feature selection functionality. The performance of the implementation was demonstrated by identifying relevant features in human, chimpanzee, and dog DNA datasets.
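The abstract does not spell out the implementation, so the following is only a minimal sketch of the general idea it describes: run several feature selectors inside each cross-validation fold and aggregate their rankings. Everything here is an assumption made for illustration, not the authors' code: the three scoring functions, the rank-sum aggregation, the function names, and the toy numeric matrix (real DNA sequences would first need to be turned into numeric features, which the abstract does not describe). It uses only the Python standard library rather than the packages the paper mentions.

```python
import random
from statistics import mean, pvariance

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def variance_score(X, y):
    """Unsupervised selector: population variance of each column."""
    return [pvariance(col) for col in zip(*X)]

def mean_diff_score(X, y):
    """Supervised selector: absolute gap between class means (labels 0/1)."""
    scores = []
    for col in zip(*X):
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        scores.append(abs(mean(pos) - mean(neg)) if pos and neg else 0.0)
    return scores

def correlation_score(X, y):
    """Supervised selector: absolute Pearson correlation with the label."""
    my = mean(y)
    sy = sum((t - my) ** 2 for t in y)
    scores = []
    for col in zip(*X):
        mx = mean(col)
        cov = sum((v - mx) * (t - my) for v, t in zip(col, y))
        norm = (sum((v - mx) ** 2 for v in col) * sy) ** 0.5
        scores.append(abs(cov) / norm if norm else 0.0)
    return scores

def to_ranks(scores):
    """Rank 0 is the highest score; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for r, j in enumerate(order):
        ranks[j] = r
    return ranks

def ensemble_select(X, y, selectors, k=5, top=2, seed=0):
    """Sum each feature's rank over all folds and selectors; keep the best."""
    rank_sum = [0] * len(X[0])
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for select in selectors:
            for j, r in enumerate(to_ranks(select(X_tr, y_tr))):
                rank_sum[j] += r
    return sorted(range(len(rank_sum)), key=lambda j: rank_sum[j])[:top]

# Toy data (hypothetical, not DNA): column 0 tracks the label, column 2 is
# constant, columns 1 and 3 are unrelated to the label.
y = [i % 2 for i in range(20)]
X = [[y[i] * 10 + i % 3, i % 3, 5, (i * 7) % 5] for i in range(20)]

selected = ensemble_select(
    X, y, [variance_score, mean_diff_score, correlation_score]
)
print(selected)
```

Summing ranks rather than raw scores keeps selectors with different scales comparable, and scoring only on the training part of each fold keeps held-out samples from influencing the selection. Other aggregation rules (voting, weighted means) would fit the same skeleton.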