The Improvement of Stress Level Detection in Twitter: Imbalance Classification Using SMOTE

2022 IEEE International Conference on Computing (ICOCO) Pub Date : 2022-11-14 DOI:10.1109/ICOCO56118.2022.10031684

Mohd Shahrul Nizam Mohd Danuri, Rohizah Abd Rahman, I. Mohamed, Azzan Amin


Citations: 1

Abstract



This study developed a model to improve stress level detection on imbalanced data using the Synthetic Minority Oversampling Technique (SMOTE). SMOTE addresses imbalanced datasets by oversampling the minority class. Data collected from Twitter may seem vague, mainly because of its massive volume. The research used a framework of Data, Experts Data Annotation, Text Pre-processing, and Text Representation and Classification. Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Lemma were used for text representation. The data were collected only from Twitter under certain circumstances. Subject Matter Experts (SMEs) on mental health problems annotated the tweet text with four levels: Normal, Mild, Moderate, and Severe. The Normal stress level group was relatively large compared to the other groups. Because of this imbalance, SMOTE was used for data augmentation. The results showed that classification with a Support Vector Machine and SMOTE improved on the baseline, with significant gains in Macro Avg Recall and Macro Avg F1-Score, by increasing the cardinality of the minority class labels.
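The two ideas the abstract leans on, SMOTE oversampling and macro-averaged scoring, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `macro_recall_f1`, the parameters `k` and `seed`, and the toy 2-D points are assumptions made for illustration, whereas the study's feature vectors would come from BoW or TF-IDF representations of tweets. The sketch uses only the Python standard library.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def macro_recall_f1(y_true, y_pred):
    """Return (macro recall, macro F1): unweighted means of per-class scores.

    Classes are taken from y_true only (a simplification for this sketch).
    """
    labels = sorted(set(y_true))
    recalls, f1s = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


# Hypothetical toy data: three "Severe" samples oversampled to eight.
severe = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced_severe = severe + smote(severe, n_new=5, k=2)
print(len(balanced_severe))

# A classifier that always predicts the majority class looks acceptable on
# accuracy (4 of 6 correct) but poor on macro-averaged scores.
y_true = ["Normal"] * 4 + ["Severe"] * 2
y_pred = ["Normal"] * 6
print(macro_recall_f1(y_true, y_pred))
```

Macro averaging gives every class equal weight regardless of its size, which is why it is a natural yardstick when the Normal class dominates: a model that ignores the minority classes is penalised even if its overall accuracy is high.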

