Natural Language Processing for Depression Prediction on Sina Weibo: Method Study and Analysis.

IF 4.8, Zone 2 (Medicine), Q1 Psychiatry
JMIR Mental Health Pub Date: 2024-09-04 DOI: 10.2196/58259
Zhenwen Zhang, Jianghong Zhu, Zhihua Guo, Yu Zhang, Zepeng Li, Bin Hu
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11391090/pdf/

Abstract

Background: Depression represents a pressing global public health concern, impacting the physical and mental well-being of hundreds of millions worldwide. Notwithstanding advances in clinical practice, an alarming number of individuals at risk for depression continue to face significant barriers to timely diagnosis and effective treatment, thereby exacerbating a burgeoning social health crisis.

Objective: This study seeks to develop a novel online depression risk detection method using natural language processing technology to identify individuals at risk of depression on the Chinese social media platform Sina Weibo.

Methods: First, we collected approximately 527,333 posts publicly shared over 1 year from 1600 individuals with depression and 1600 individuals without depression on the Sina Weibo platform. We then developed a hierarchical transformer network for learning user-level semantic representations, which consists of 3 primary components: a word-level encoder, a post-level encoder, and a semantic aggregation encoder. The word-level encoder learns semantic embeddings from individual posts, while the post-level encoder explores features in user post sequences. The semantic aggregation encoder aggregates post sequence semantics to generate a user-level semantic representation that can be classified as depressed or nondepressed. Next, a classifier is employed to predict the risk of depression. Finally, we conducted statistical and linguistic analyses of the post content from individuals with and without depression using the Chinese Linguistic Inquiry and Word Count.
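
As a rough illustration of how the three components described in the Methods fit together, the following pure-Python sketch passes toy word vectors through a word-level step, a post-level step, and an aggregation step before a logistic classifier. Everything in it is an assumption made for illustration: the function names, the 2-dimensional embeddings, the classifier weights, and the use of unparameterized scaled dot-product self-attention with mean pooling in place of trained transformer encoders. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Assumption: a trained transformer layer would use learned query, key,
    and value projections, multiple heads, and feed-forward sublayers.
    """
    scale = math.sqrt(len(vectors[0]))
    return [
        weighted_sum(softmax([dot(q, k) / scale for k in vectors]), vectors)
        for q in vectors
    ]


def mean_pool(vectors):
    n = len(vectors)
    return weighted_sum([1.0 / n] * n, vectors)


def word_level_encoder(tokens, embeddings):
    """Turn the tokens of one post into a single post vector."""
    return mean_pool(self_attention([embeddings[t] for t in tokens]))


def post_level_encoder(post_vectors):
    """Let each post vector attend to the other posts of the same user."""
    return self_attention(post_vectors)


def semantic_aggregation_encoder(post_vectors):
    """Collapse the post sequence into one user-level vector."""
    return mean_pool(post_vectors)


def predict_user(posts, embeddings, weights, bias=0.0):
    """Return a depression-risk score in (0, 1) from a logistic classifier."""
    post_vectors = [word_level_encoder(p, embeddings) for p in posts]
    user_vector = semantic_aggregation_encoder(post_level_encoder(post_vectors))
    return 1.0 / (1.0 + math.exp(-(dot(user_vector, weights) + bias)))


# Hypothetical toy input: two posts from one user.
toy_embeddings = {"sad": [1.0, 0.0], "happy": [0.0, 1.0]}
toy_posts = [["sad", "happy"], ["sad"]]
risk = predict_user(toy_posts, toy_embeddings, weights=[1.0, -1.0])
```

The point of the sketch is the data flow: words are encoded per post, posts are then encoded per user, and only the final user-level vector reaches the classifier.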

Results: We divided the original data set into training, validation, and test sets. The training set consisted of 1000 individuals with depression and 1000 individuals without depression. The validation and test sets each comprised 600 users, 300 from each cohort (depression and nondepression). Without sampling techniques, our method achieved an accuracy of 84.62%, precision of 84.43%, recall of 84.50%, and F1-score of 84.32% on the test set. With our proposed retrieval-based sampling strategy, performance improved substantially: an accuracy of 95.46%, precision of 95.30%, recall of 95.70%, and F1-score of 95.43%. These results support the effectiveness of our proposed depression risk detection model and retrieval-based sampling technique and provide new insights for large-scale depression detection through social media. Through language behavior analysis, we found that individuals with depression are more likely to use negation words (the value of "swear" is 0.001253). This may indicate the presence of negative emotions, rejection, doubt, disagreement, or aversion in individuals with depression. Additionally, our analysis revealed that individuals with depression tend to use negative emotional vocabulary ("NegEmo": 0.022306; "Anx": 0.003829; "Anger": 0.004327; "Sad": 0.005740), which may reflect their internal negative emotions and psychological state. This frequent use of negative vocabulary could be a way for individuals with depression to express negative feelings toward life, themselves, or their surrounding environment.
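
The four reported metrics can be computed from true and predicted labels as sketched below. The abstract does not state how precision, recall, and F1-score were averaged; this sketch assumes macro-averaging over the two classes, and the function name `macro_metrics` and the 0/1 label encoding are illustrative choices rather than details from the study.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Assumption: macro-averaging (unweighted mean of per-class scores).
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(other, c)] for other in labels if other != c)
        fn = sum(pairs[(c, other)] for other in labels if other != c)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return {
        "accuracy": sum(pairs[(c, c)] for c in labels) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels: 1 = depression, 0 = nondepression.
scores = macro_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

Under macro-averaging, the F1-score is the mean of the per-class F1 values rather than the harmonic mean of the averaged precision and recall, so it can fall below both of them.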

Conclusions: The research results indicate the feasibility and effectiveness of using deep learning methods to detect the risk of depression. These findings provide insights into the potential for large-scale, automated, and noninvasive prediction of depression among online social media users.

Source journal: JMIR Mental Health (Medicine: Psychiatry and Mental Health)
CiteScore: 10.80
Self-citation rate: 3.80%
Articles published: 104
Review time: 16 weeks
About the journal: JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175). JMIR Mental Health focusses on digital health and Internet interventions, technologies and electronic innovations (software and hardware) for mental health, addictions, online counselling and behaviour change. This includes formative evaluation and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.