How does depression talk on social media? Modeling depression language with relevance-based statistical language models
Eliseo Bao, Anxo Perez, David Otero, Javier Parapar
Online Social Networks and Media, Volume 50, Article 100339. Published 2025-12-01 (Epub 2025-10-22). DOI: 10.1016/j.osnem.2025.100339
https://www.sciencedirect.com/science/article/pii/S2468696425000400
Abstract
Many individuals with mental health problems turn to the internet and social media for information and support. The text generated on these platforms serves as a valuable resource for identifying mental health risks, driving interdisciplinary research to develop models for mental health analysis and prediction. In this paper, we model depression-related language using relevance-based statistical language models to create lexicons that characterize linguistic patterns associated with depression. We also propose a ranking method that leverages these lexicons to prioritize users exhibiting stronger signs of depressive language on social media. Our models integrate clinical markers from established depression questionnaires, particularly the Beck Depression Inventory-II (BDI-II), enhancing explainability, generalization, and performance. Experiments across multiple social media datasets show that incorporating clinical knowledge improves user ranking and generalizes effectively across platforms. Additionally, we refine existing depression lexicons by applying weights estimated from our models, achieving better performance in generating depression-related queries. A comparative analysis of our models highlights differences in language use between control users and those with depression, aligning with prior psycholinguistic findings. This work advances the understanding of depression-related language through statistical modeling, paving the way for scalable social media interventions to identify at-risk individuals.
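The abstract does not specify how lexicon weights are turned into a user ranking, so the following is only a minimal sketch of the general idea of lexicon-based ranking, not the authors' method. It assumes a lexicon is a mapping from terms to weights and scores each user by the length-normalized sum of the weights of lexicon terms in their posts. The function names (`score_user`, `rank_users`), the whitespace tokenization, the weights, and the example posts are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def score_user(posts, lexicon):
    """Length-normalized sum of lexicon weights over a user's tokens.

    Assumption: naive lowercase whitespace tokenization; a real system
    would use a proper tokenizer and weights estimated from data.
    """
    tokens = [tok for post in posts for tok in post.lower().split()]
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = sum(weight * counts[term] for term, weight in lexicon.items())
    return total / len(tokens)


def rank_users(user_posts, lexicon):
    """Return user ids sorted from highest to lowest lexicon score."""
    scores = {user: score_user(posts, lexicon) for user, posts in user_posts.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights and posts, for illustration only.
lexicon = {"tired": 0.6, "hopeless": 0.9, "alone": 0.5}
user_posts = {
    "user_a": ["went hiking today", "great weather"],
    "user_b": ["i feel hopeless and alone", "so tired again"],
}
ranking = rank_users(user_posts, lexicon)
```

In this toy setup, `ranking` places users whose posts contain more heavily weighted lexicon terms first. Normalizing by token count is one possible design choice to avoid favoring users simply because they post more; other normalizations would be equally plausible.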