Analysis of Feature Selection Techniques for Classification Problems
A. Adamov
2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), published 2021-10-13
DOI: 10.1109/AICT52784.2021.9620226 (https://doi.org/10.1109/AICT52784.2021.9620226)
Feature selection is increasingly recognized as a key area of academic research that employs data-driven or computationally intensive approaches. The large collections of data available to the research community, together with data aggregation techniques that make it easy to gather vast amounts of data, bring a number of other problems. More data is not always better. Data usually comes with noise that distracts from what matters most. Data cleaning and pre-processing is a long and resource-consuming process. In the widely used supervised machine learning approach, more data requires more time and more computational power for training. This is why the right data, meaning data that represents the entire population of cases, is better than simply more data. Feature selection helps address two key issues associated with data-driven systems: reducing the dimensionality of the data to improve performance, and selecting the features that produce the most accurate model. The main purpose of this study is to review feature selection methods applied to real data.
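The abstract does not name the specific methods reviewed, so the following is only a minimal illustrative sketch of one common family, a filter-style selector, and not the study's own procedure. It scores each feature by the absolute Pearson correlation with a binary class label and keeps the top `k`, which shows the dimensionality-reduction goal in miniature. The function names (`pearson`, `select_top_k`, `project`) and the toy data are assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / math.sqrt(var_x * var_y)


def select_top_k(rows, labels, k):
    """Return the sorted indices of the k features most correlated with the labels."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return sorted(j for _, j in scores[:k])


def project(rows, keep):
    """Keep only the selected feature columns, reducing the dimensionality."""
    return [[row[j] for j in keep] for row in rows]


# Toy data (hypothetical): feature 0 tracks the label, feature 1 is constant,
# feature 2 is mostly noise.
rows = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.1],
    [4.0, 5.0, 0.8],
]
labels = [0, 0, 1, 1]

keep = select_top_k(rows, labels, k=1)
reduced = project(rows, keep)
print(keep, reduced)
```

A filter like this is cheap because it never trains a model. Wrapper or embedded methods, which evaluate feature subsets against a classifier, address the second goal in the abstract (choosing the features that yield the most accurate model) at a higher computational cost.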