Luthfiah Azizah, P. Khotimah, Andria Arisal, A. Rozie, D. Munandar, D. Riswantini, Ekasari Nugraheni, W. Suwarningsih, D. Kurniasari
Published in: 2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5G/6G-based Interconnected Digital Worlds (NISS), 2022-03-30. DOI: 10.1109/NISS55057.2022.10085611 (https://doi.org/10.1109/NISS55057.2022.10085611)
The Investigation into Deep Learning Classifiers Towards Imbalanced Text Data
Class imbalance is an important classification problem: failing to identify rare events can be hazardous, because it prevents timely preparation and handling. In such cases the minority class is usually the more consequential one, so it is necessary to identify a classifier that remains reliable on imbalanced classes. This study compares the performance of several conventional machine learning and deep learning methods on a dataset with imbalanced classes. We use COVID-19 online news titles to simulate different class imbalance ratios. On a dataset of 16,844 news titles, a CNN with an embedding layer outperformed the other methods at imbalance ratios of 37%, 30%, 20%, 10%, and 1%. However, its performance degraded noticeably at an imbalance ratio of 1%.
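The abstract does not say how the different imbalance ratios were produced. One common way to simulate them is to keep every majority-class sample and randomly drop minority-class samples until the minority share reaches a target. The sketch below illustrates that idea only and is not the authors' procedure. It assumes that "imbalance ratio" means the minority class's share of all samples, and the function name `downsample_minority`, its parameters, and the toy labels are hypothetical.

```python
import random


def downsample_minority(texts, labels, minority_label, ratio, seed=0):
    """Return (texts, labels) in which `minority_label` makes up about `ratio`
    of all samples. Assumption: ratio = minority count / total count.

    Every majority sample is kept; minority samples are dropped at random.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Solve k / (k + len(majority)) = ratio for k, the minority count to keep.
    k = round(ratio * len(majority) / (1 - ratio))
    k = min(k, len(minority))  # cannot keep more than are available
    kept = sorted(majority + rng.sample(minority, k))  # preserve original order
    return [texts[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Toy stand-in data, not the COVID-19 news titles used in the paper.
    texts = [f"title {i}" for i in range(1000)]
    labels = [1] * 370 + [0] * 630
    for target in (0.37, 0.30, 0.20, 0.10, 0.01):
        _, sub_labels = downsample_minority(texts, labels, 1, target)
        share = sub_labels.count(1) / len(sub_labels)
        print(f"target={target:.2f} kept={len(sub_labels)} minority share={share:.3f}")
```

Because only the minority class is reduced, the total number of samples shrinks as the target ratio drops. At very small ratios few minority examples remain for training and evaluation, which may be one reason a setting such as 1% is hard.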