The Improvement of Stress Level Detection in Twitter: Imbalance Classification Using SMOTE

Mohd Shahrul Nizam Mohd Danuri, Rohizah Abd Rahman, I. Mohamed, Azzan Amin
{"title":"The Improvement of Stress Level Detection in Twitter: Imbalance Classification Using SMOTE","authors":"Mohd Shahrul Nizam Mohd Danuri, Rohizah Abd Rahman, I. Mohamed, Azzan Amin","doi":"10.1109/ICOCO56118.2022.10031684","DOIUrl":null,"url":null,"abstract":"This study developed a model to improve stress level detection using Synthetic Minority Oversampling Technique (SMOTE) imbalanced data classification. SMOTE is a method to address imbalanced datasets to oversample the minority class. The data collected from Twitter may seem vague mainly due to the massive amount of data. This research used the framework model of Data, Experts Data Annotation, Text Pre-processing, and Text Representation and Classification. The Bag of Word (BoW), Term Frequency-Inverse Document Frequency (TFIDF), and Lemma were used for the text representation. The data were collected only from Twitter under certain circumstances. The Subject Matter Experts (SMEs) on mental health problems have annotated the text from the tweets based on four levels: Normal, Mild, Moderate, and Severe. The data group for the Normal stress level was relatively large compared to the other groups. Due to the imbalanced data group, the SMOTE technique was used for data argumentation. The result showed that the model classification using Support Vector Machine with SMOTE increased by improving the cardinality of the minority class label through the significant Macro Avg Recall and Macro Avg F1-Score analysis results compared to the baseline.","PeriodicalId":319652,"journal":{"name":"2022 IEEE International Conference on Computing (ICOCO)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Computing (ICOCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICOCO56118.2022.10031684","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

This study developed a model to improve stress level detection using Synthetic Minority Oversampling Technique (SMOTE) imbalanced data classification. SMOTE is a method to address imbalanced datasets to oversample the minority class. The data collected from Twitter may seem vague mainly due to the massive amount of data. This research used the framework model of Data, Experts Data Annotation, Text Pre-processing, and Text Representation and Classification. The Bag of Word (BoW), Term Frequency-Inverse Document Frequency (TFIDF), and Lemma were used for the text representation. The data were collected only from Twitter under certain circumstances. The Subject Matter Experts (SMEs) on mental health problems have annotated the text from the tweets based on four levels: Normal, Mild, Moderate, and Severe. The data group for the Normal stress level was relatively large compared to the other groups. Due to the imbalanced data group, the SMOTE technique was used for data argumentation. The result showed that the model classification using Support Vector Machine with SMOTE increased by improving the cardinality of the minority class label through the significant Macro Avg Recall and Macro Avg F1-Score analysis results compared to the baseline.
Twitter中应力水平检测的改进:基于SMOTE的不平衡分类
本文提出了一种基于合成少数派过采样技术(SMOTE)的不平衡数据分类改进应力水平检测的模型。SMOTE是一种解决不平衡数据集的方法,用于对少数类进行过采样。从Twitter上收集的数据可能看起来很模糊,主要是因为数据量很大。本研究采用数据、专家数据标注、文本预处理和文本表示与分类的框架模型。使用词包(BoW)、词频-逆文档频率(TFIDF)和引理进行文本表示。这些数据仅在特定情况下从Twitter收集。心理健康问题主题专家(sme)根据正常、轻度、中度和严重四个级别对推文中的文本进行了注释。与其他组相比,正常压力水平的数据组相对较大。由于数据组不平衡,采用SMOTE技术进行数据论证。结果表明,与基线相比,通过显著的Macro Avg Recall和Macro Avg F1-Score分析结果,使用SMOTE的支持向量机(Support Vector Machine)提高了少数类标签的基数,从而提高了模型分类的效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信