Class-Decomposition and Augmentation for Imbalanced Data Sentiment Analysis

2021 International Joint Conference on Neural Networks (IJCNN) | Pub Date: 2021-07-18 | DOI: 10.1109/IJCNN52387.2021.9533603

C. Moreno-García, Chrisina Jayne, Eyad Elyan


Citations: 2

Abstract

Text classification and natural language processing have advanced significantly. However, like datasets in many other domains, text datasets may suffer from class imbalance, which biases models toward the majority class. In this paper, we present a new approach to handling class imbalance in text data by means of unsupervised learning algorithms. We perform class decomposition using two unsupervised methods, k-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and apply it to two sentiment analysis datasets. The experimental results show that using clustering to find within-class similarities can significantly improve the performance of learning algorithms and reduce the dominance of majority-class instances without causing information loss.
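The idea of class decomposition can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that only the majority class is clustered, uses toy 2-D vectors in place of real text features, and swaps in a small standard-library k-means with deterministic farthest-point initialisation so the result is reproducible. The names `kmeans`, `decompose_majority`, `to_parent`, and the `_c` sub-label suffix are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means; returns one cluster index per point."""
    # Farthest-point initialisation (an illustrative, deterministic choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def decompose_majority(X, y, k=2):
    """Split the majority class into k sub-classes; other labels are kept."""
    majority = Counter(y).most_common(1)[0][0]
    idx = [i for i, label in enumerate(y) if label == majority]
    clusters = kmeans([X[i] for i in idx], k)
    new_y = list(y)
    for i, c in zip(idx, clusters):
        new_y[i] = f"{majority}_c{c}"
    return new_y


def to_parent(sublabel):
    """Map a sub-class label back to its original class."""
    return sublabel.split("_c")[0]


# Toy data: six "neg" instances in two groups, two "pos" instances.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
     [5.0, 5.1], [5.1, 5.0], [5.2, 5.1],
     [9.0, 0.0], [9.1, 0.1]]
y = ["neg"] * 6 + ["pos"] * 2

sub = decompose_majority(X, y, k=2)
print(Counter(sub))                  # sub-class sizes after decomposition
print([to_parent(s) for s in sub])   # original labels are recoverable
```

On this toy input the 6:2 split becomes three classes of sizes 3, 3, and 2, and no instance is discarded. A classifier would then be trained on the sub-labels, with predictions mapped back through `to_parent`.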
