Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation.

JMIR AI. Pub Date: 2023-03-24. DOI: 10.2196/41205
David Owen, Dimosthenis Antypas, Athanasios Hassoulas, Antonio F Pardiñas, Luis Espinosa-Anke, Jose Camacho Collados
JMIR AI, volume 2, e41205. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614849/pdf/
Citations: 2

Abstract

Background: Major depressive disorder is a common mental disorder affecting 5% of adults worldwide. Early contact with health care services is critical for achieving accurate diagnosis and improving patient outcomes. Key symptoms of major depressive disorder (depression hereafter), such as cognitive distortions, are observed in verbal communication and can also manifest in the structure of written language. Thus, the automatic analysis of text outputs may provide opportunities for early intervention in settings where written communication is rich and regular, such as social media and web-based forums.

Objective: The objective of this study was 2-fold. We sought to gauge the effectiveness of different machine learning approaches in identifying users of the mass web-based forum Reddit who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date was a relevant factor in performing this detection.

Methods: A total of 2 Reddit data sets containing posts belonging to users with and without a history of depression diagnosis were obtained. The intersection of these data sets yielded a set of users for whom a date of depression diagnosis could be estimated. This derived data set was used as an input for several machine learning classifiers, including transformer-based language models (LMs).
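
The intersection step described in the Methods can be pictured with a minimal sketch. This is not the authors' pipeline: the dictionary layout, the user names, the dates, and the helper `intersect_users` are all hypothetical, and the abstract does not state how users were matched across the 2 data sets. The sketch assumes matching on a shared user identifier.

```python
from datetime import date

# Hypothetical toy data; field layout and values are illustrative assumptions.
posts_by_user = {
    "user_a": [(date(2019, 1, 5), "post text 1"), (date(2019, 3, 1), "post text 2")],
    "user_b": [(date(2018, 7, 9), "post text 3")],
}
diagnosis_date_by_user = {
    "user_a": date(2019, 4, 1),
    "user_c": date(2020, 2, 2),
}


def intersect_users(posts, diagnosis_dates):
    """Keep only users present in both sources and pair their posts
    with the estimated diagnosis date."""
    shared = posts.keys() & diagnosis_dates.keys()
    return {
        user: {"diagnosis_date": diagnosis_dates[user], "posts": posts[user]}
        for user in sorted(shared)
    }


derived = intersect_users(posts_by_user, diagnosis_date_by_user)
```

In this toy example only `user_a` appears in both sources, so the derived data set contains that single user together with the estimated diagnosis date.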

Results: Bidirectional Encoder Representations from Transformers (BERT) and MentalBERT transformer-based LMs proved the most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1-score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3-4 months) of posts before a depressed user's estimated diagnosis date are the most indicative of their illness; data from before that period did not help the models detect depression more accurately. Furthermore, in the 4- to 8-week period before the user's estimated diagnosis date, their posts exhibited more negative sentiment than in any other 4-week period in their post history.
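
The time windows mentioned in the Results can be made concrete with a small sketch that assigns each post to a 4-week period counted backward from the estimated diagnosis date. The function names, the toy posts, and the boundary convention (day 1 to day 28 before diagnosis forms period 0) are assumptions for illustration; the abstract does not specify how the authors delimited the periods.

```python
from datetime import date

WINDOW_DAYS = 28  # one 4-week period


def window_index(post_date, diagnosis_date):
    """Return 0 for posts 1-28 days before the diagnosis date,
    1 for 29-56 days before, and so on.

    Returns None for posts made on or after the estimated diagnosis date.
    """
    days_before = (diagnosis_date - post_date).days
    if days_before <= 0:
        return None
    return (days_before - 1) // WINDOW_DAYS


def posts_within(posts, diagnosis_date, max_weeks):
    """Keep (date, text) posts made in the final `max_weeks` weeks
    before the estimated diagnosis date."""
    kept = []
    for post_date, text in posts:
        days_before = (diagnosis_date - post_date).days
        if 0 < days_before <= max_weeks * 7:
            kept.append((post_date, text))
    return kept


# Hypothetical example data.
diagnosis = date(2019, 4, 1)
posts = [
    (date(2018, 10, 1), "older post"),
    (date(2019, 2, 20), "post in the 4- to 8-week period"),
    (date(2019, 3, 25), "post in the final 4 weeks"),
]
recent = posts_within(posts, diagnosis, max_weeks=16)
```

Under these assumptions, restricting the input to `max_weeks=16` mirrors the finding that earlier posts did not improve detection, and `window_index(...) == 1` picks out the 4- to 8-week period in which sentiment was reported to be most negative.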

Conclusions: Transformer-based LMs may be used on data from web-based social media forums to identify users at risk for psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental health care interventions to support those at risk for this condition.
