Sentiment Analysis and Text Classification for Depression Detection
Iffah Nadhirah Joharee, Nik Nur Wahidah Nik Hashim, Nur Syahirah Mohd Shah
Journal of Integrated and Advanced Engineering (JIAE), published 2023-03-09
DOI: 10.51662/jiae.v3i1.86 (https://doi.org/10.51662/jiae.v3i1.86)
Abstract
Depression is an illness that can seriously harm a person's life. However, many people do not realize that they are depressed, and they tend to express their feelings through text or social media. Text-based depression detection could therefore support early identification of the illness. This research aims to build a depression detection system that identifies possible depression cues in Bahasa Malaysia text. Text data were collected from depressed and healthy people via a Google Form. Three questions were asked: “Apakah kenangan manis yang anda ingat?” (What sweet memories do you remember?), “Apakah rutin harian anda?” (What is your daily routine?) and “Apakah keadaan yang membuatkan anda stress?” (What situations make you stressed?), which obtained 172, 169 and 170 responses, respectively. All datasets are stored in a CSV file. Using Python, TF-IDF features were extracted and passed through a pipeline into several classifier models, such as Random Forest, Multinomial Naïve Bayes, and Logistic Regression. The results are reported using a confusion matrix, accuracy, and F1-score. In a second approach, the text sentiment tools VADER and TextBlob were applied to the datasets to determine whether depressive text falls under negative sentiment or vice versa. The percentage differences between the actual sentiment and the VADER and TextBlob sentiment were then determined. In the experiments, the highest score was achieved by the AdaBoost classifier, with an F1-score of 0.66. The best model was chosen for use in the Graphical User Interface (GUI).
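To make the two central quantities in the abstract concrete, the following is a minimal from-scratch sketch of TF-IDF weighting and the F1-score. It is not the authors' code: the function names `tfidf` and `f1_score`, the whitespace tokenizer, the smoothed IDF variant, the three toy Malay sentences, and the toy labels are all assumptions made for illustration, and no value here comes from the study's data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count times the smoothed inverse
    document frequency log((1 + N) / (1 + df)) + 1.
    """
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy responses (invented for illustration only).
docs = ["saya penat dan sedih", "saya gembira hari ini", "saya sedih"]
vectors = tfidf(docs)

# Hypothetical labels: 1 = depressed, 0 = healthy.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)

print(vectors[0])
print(round(score, 2))
```

In this toy corpus, a word that appears in every response ("saya") receives the lowest weight, while a word unique to one response ("penat") receives the highest, which is the property that lets a downstream classifier focus on distinctive vocabulary. In practice a library vectorizer and classifier would replace these hand-written functions.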