Kurtosis-Based Feature Selection Method using Symmetric Uncertainty to Predict the Air Quality Index

Comput. Sci. J. Moldova. Pub Date: 2022-12-01. DOI: 10.56415/csjm.v30.19

Usharani Bhimavarapu, M. Sreedevi

Citations: 0

Abstract

Feature selection is a vital pre-processing step in machine learning, particularly for datasets with many features. It identifies the relevant, irrelevant, and redundant features in a dataset and removes the irrelevant ones, which improves both accuracy and prediction performance. Reducing the number of features shortens training time, reduces overfitting, mitigates the curse of dimensionality, and simplifies the prediction model. Filter feature selection techniques can handle datasets with a large number of features, and this paper uses the symmetric uncertainty coefficient to verify the relevance of the independent features. The paper proposes a new method, kurtosis-based feature selection, to select the relevant features that affect air pollution. The proposed method is compared with seven filter feature selection techniques on an air pollution dataset to validate its performance. Kurtosis-based feature selection extracts only PM2.5 as the key feature, and its accuracy is compared with that of five existing methods. The experimental results show that the kurtosis-based algorithm reduces the original feature set by up to 91.66%, whereas the existing filter feature selection techniques reduce it by only 50%.
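The abstract names two quantities, the symmetric uncertainty coefficient and kurtosis, without giving their formulas or saying how the method combines them. As a rough illustration only, the sketch below computes both under the standard textbook definitions: SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) for discrete (or pre-discretized) variables, and population excess kurtosis m4 / m2^2 - 3. The function names are hypothetical, and nothing here should be read as the authors' algorithm.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Standard definition: SU = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1].

    Assumes x and y are equal-length sequences of discrete values;
    continuous features would need to be binned first.
    """
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing is shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution).

    Assumes the values are not all identical (m2 > 0).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0
```

A symmetric uncertainty of 1 means that one variable fully determines the other, and 0 means that they share no information, which is why the coefficient is commonly used as a relevance score between a feature and the target.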
