Sentiment Analysis and Text Classification for Depression Detection

Iffah Nadhirah Joharee, Nik Nur Wahidah Nik Hashim, Nur Syahirah Mohd Shah
Journal of Integrated and Advanced Engineering (JIAE), journal article, published 2023-03-09
DOI: 10.51662/jiae.v3i1.86

Abstract

Depression is an illness that can seriously harm a person's life. Many people do not realize that they have depression and tend to express their feelings through text or on social media, so text-based depression detection could help identify the illness early. This research therefore aims to build a depression detection system that identifies possible depression cues in Bahasa Malaysia text. Text data were collected from depressed and healthy respondents through a Google Form with three questions: “Apakah kenangan manis yang anda ingat?” (What sweet memory do you remember?), “Apakah rutin harian anda?” (What is your daily routine?) and “Apakah keadaan yang membuatkan anda stress?” (What situation makes you stressed?). These received 172, 169 and 170 responses respectively. All responses were stored in a CSV file. Using Python, TF-IDF features were extracted and passed through a pipeline into several classifiers, including Random Forest, Multinomial Naïve Bayes and Logistic Regression. Results were reported using the confusion matrix, accuracy and F1-score. In a second approach, the sentiment analysis tools VADER and TextBlob were applied to the datasets to determine whether depressive text falls under negative sentiment and non-depressive text does not. The percentage differences between the actual sentiment and the sentiment assigned by VADER and TextBlob were then determined. In the experiments, the highest score was achieved by the AdaBoost classifier, with an F1-score of 0.66. The best-performing model was then used in a graphical user interface (GUI).
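The TF-IDF feature step can be illustrated with a minimal Python sketch that uses only the standard library. This is not the authors' code. It assumes the smoothed-IDF, L2-normalized formulation that scikit-learn's `TfidfVectorizer` uses by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, along with that class's default token pattern. The helper names `tokenize` and `tfidf` are hypothetical. The paper does not state its tokenization, stop-word or n-gram settings.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn's TfidfVectorizer: words of 2+ characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return TOKEN_RE.findall(text.lower())


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (sorted vocabulary, L2-normalized TF-IDF matrix) for a list of documents.

    Assumes smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    matrix = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency within this document
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # Scale each row to unit length; an empty document stays all zeros.
        matrix.append([v / norm for v in row] if norm else row)
    return vocab, matrix


if __name__ == "__main__":
    # Two invented Bahasa Malaysia responses, for illustration only (not from the dataset).
    vocab, X = tfidf(["saya rasa penat setiap hari", "saya suka bersenam setiap pagi"])
    for doc_row in X:
        print({t: round(w, 3) for t, w in zip(vocab, doc_row) if w})
```

Terms that appear in every response (here `saya` and `setiap`) get the lowest IDF, so words that distinguish one response from another carry more weight. The resulting rows would then be fed to a classifier. In scikit-learn this is commonly done by chaining `TfidfVectorizer` with an estimator such as `MultinomialNB` or `AdaBoostClassifier` in a `Pipeline`, although the abstract does not confirm which library the authors used.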
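The evaluation steps in the abstract can be sketched in the same way. The functions below have hypothetical names. They compute a binary confusion matrix, accuracy and F1-score, and the percentage of responses whose VADER or TextBlob label differs from the actual sentiment label. Two choices here are assumptions: reading "percentage difference" as a mismatch rate, and using a ±0.05 threshold to turn a polarity score into a label. The sketch takes scores as plain numbers. Obtaining them would require the real libraries, for example `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from `vaderSentiment` or `TextBlob(text).sentiment.polarity` from `textblob`, which are not called here.

```python
def binary_confusion(y_true: list[int], y_pred: list[int], positive: int = 1) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` (e.g. 1 = depressed) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = binary_confusion(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn, _ = binary_confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sentiment_label(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] (e.g. VADER compound or TextBlob polarity) to a label.

    The +/-0.05 cut-off follows the convention in the VADER README; the paper's
    own mapping is not stated, so this threshold is an assumption.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def percent_mismatch(actual: list[str], predicted: list[str]) -> float:
    """Percentage of responses whose tool-assigned sentiment differs from the actual sentiment."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = sum(a != p for a, p in zip(actual, predicted))
    return 100.0 * mismatches / len(actual)
```

For example, true labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]` give 2 true positives, 1 false positive, 1 false negative and 1 true negative. Accuracy is therefore 0.6 and F1 is 4/6 ≈ 0.667. Computing F1 directly from the confusion-matrix counts avoids a separate division by zero when precision or recall is undefined.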