How does depression talk on social media? Modeling depression language with relevance-based statistical language models

Journal metrics: IF 2.9; JCR Q1 (Social Sciences)

Online Social Networks and Media, vol. 50, Article 100339. Pub Date: 2025-12-01; Epub Date: 2025-10-22. DOI: 10.1016/j.osnem.2025.100339

Eliseo Bao, Anxo Perez, David Otero, Javier Parapar

Abstract

Many individuals with mental health problems turn to the internet and social media for information and support. The text generated on these platforms serves as a valuable resource for identifying mental health risks, driving interdisciplinary research to develop models for mental health analysis and prediction. In this paper, we model depression-related language using relevance-based statistical language models to create lexicons that characterize linguistic patterns associated with depression. We also propose a ranking method that leverages these lexicons to prioritize users exhibiting stronger signs of depressive language on social media. Our models integrate clinical markers from established depression questionnaires, particularly the Beck Depression Inventory-II (BDI-II), enhancing explainability, generalization, and performance. Experiments across multiple social media datasets show that incorporating clinical knowledge improves user ranking and generalizes effectively across platforms. Additionally, we refine existing depression lexicons by applying weights estimated from our models, achieving better performance in generating depression-related queries. A comparative analysis of our models highlights differences in language use between control users and those with depression, aligning with prior psycholinguistic findings. This work advances the understanding of depression-related language through statistical modeling, paving the way for scalable social media interventions to identify at-risk individuals.
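The abstract names relevance-based statistical language models as the core technique but gives no formulas. As a rough illustration only, the sketch below implements the classic RM1 relevance model of Lavrenko and Croft, where P(w|R) is proportional to the sum over user documents D of P(w|D) times the seed-query likelihood P(Q|D), and then ranks users by negative cross-entropy against the resulting weighted lexicon. This is not the authors' estimator. The seed terms (standing in for BDI-II-style clinical markers), the Dirichlet parameter `mu`, `top_k`, the toy data, and all function names are hypothetical assumptions made for illustration.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood background model P(w|C) over all users' tokens."""
    counts = Counter(t for toks in docs.values() for t in toks)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def dirichlet_lm(tokens, coll, mu=100.0):
    """Return w -> P(w|D), Dirichlet-smoothed against the background model."""
    counts = Counter(tokens)
    length = len(tokens)

    def prob(w):
        # Tiny floor for words absent from the collection (e.g. unseen seed terms).
        return (counts[w] + mu * coll.get(w, 1e-9)) / (length + mu)

    return prob


def relevance_model(seed_terms, docs, mu=100.0, top_k=10):
    """RM1-style estimate: P(w|R) is proportional to sum_D P(w|D) * P(Q|D), uniform P(D)."""
    coll = collection_model(docs)
    lms = {uid: dirichlet_lm(toks, coll, mu) for uid, toks in docs.items()}
    # Query likelihood log P(Q|D) of the seed terms under each user's model.
    log_ql = {uid: sum(math.log(lm(q)) for q in seed_terms) for uid, lm in lms.items()}
    # Turn query likelihoods into posterior weights P(D|Q) (shift by max for stability).
    peak = max(log_ql.values())
    weights = {uid: math.exp(v - peak) for uid, v in log_ql.items()}
    norm = sum(weights.values())
    rm = Counter()
    for uid, lm in lms.items():
        for w in coll:  # mix each user's model over the full vocabulary
            rm[w] += (weights[uid] / norm) * lm(w)
    # Keep the top-k terms as a weighted lexicon and renormalize.
    top = rm.most_common(top_k)
    total = sum(p for _, p in top)
    return {w: p / total for w, p in top}


def rank_users(lexicon, docs, mu=100.0):
    """Rank users by negative cross-entropy: sum_w P(w|R) * log P(w|U)."""
    coll = collection_model(docs)
    scores = {}
    for uid, toks in docs.items():
        lm = dirichlet_lm(toks, coll, mu)
        scores[uid] = sum(p * math.log(lm(w)) for w, p in lexicon.items())
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up data: one "document" per user (all posts concatenated).
docs = {
    "u1": "i feel sad and tired i cannot sleep i feel worthless".split(),
    "u2": "great game tonight the team played well".split(),
    "u3": "so tired lately cannot sleep and feel sad".split(),
}
seed_terms = ["sad", "sleep"]  # hypothetical symptom-inspired seed query

lexicon = relevance_model(seed_terms, docs)
ranking = rank_users(lexicon, docs)
print(sorted(lexicon.items(), key=lambda kv: -kv[1])[:5])
print(ranking)
```

On this toy corpus the two users whose posts mention sadness and sleep problems should rank above the sports-only user. Cross-entropy ranking is used here because it is the standard way to score documents against a relevance model; with real data, `mu`, `top_k` and the seed query would all need tuning.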
