Using Natural Language Processing Methods to Build the Hypersexuality in Bipolar Reddit Corpus: Infodemiology Study of Reddit.

Impact factor: 3.5 (JCR Q1, Health Care Sciences & Services)
JMIR Infodemiology. Publication date: 2025-03-06. DOI: 10.2196/65632
Daisy Harvey, Paul Rayson, Fiona Lobban, Jasper Palmier-Claus, Clare Dolman, Anne Chataigné, Steven Jones
JMIR Infodemiology, volume 5, e65632. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11926447/pdf/

Abstract

Background: Bipolar is a severe mental health condition affecting at least 2% of the global population, with clinical observations suggesting that individuals experiencing elevated mood states, such as mania or hypomania, may have an increased propensity for engaging in risk-taking behaviors, including hypersexuality. Hypersexuality has historically been stigmatized in society and in health care provision, which makes it more difficult for service users to talk about their behaviors. There is a need for greater understanding of hypersexuality to develop better evidence-based treatment, support, and training for health professionals.

Objective: This study aimed to develop and assess effective methodologies for identifying posts on Reddit related to hypersexuality posted by people with a self-reported bipolar diagnosis. Using natural language processing techniques, this research presents a specialized dataset, the Talking About Bipolar on Reddit Corpus (TABoRC). We used various computational tools to filter and categorize posts that mentioned hypersexuality, forming the Hypersexuality in Bipolar Reddit Corpus (HiB-RC). This paper introduces a novel methodology for detecting hypersexuality-related conversations on Reddit and offers both methodological insights and preliminary findings, laying the groundwork for further research in this emerging field.
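The two-stage selection described above (first Redditors with a self-reported diagnosis, then their posts that mention hypersexuality) can be illustrated with a minimal sketch. The abstract does not specify the tools or lexicons used, so the regular expressions, the function names, and the post structure below are all assumptions made for illustration, not the authors' pipeline.

```python
import re

# Hypothetical patterns chosen for illustration only; the study's actual
# self-report and hypersexuality criteria are not given in the abstract.
DIAGNOSIS_RE = re.compile(
    r"\bi\s+(?:was|am|have\s+been|got)\s+diagnosed\s+with\s+bipolar\b",
    re.IGNORECASE,
)
HYPERSEX_RE = re.compile(r"\bhyper[\s-]?sexual(?:ity)?\b", re.IGNORECASE)


def self_reported_authors(posts):
    """Return authors with at least one post matching the self-report pattern."""
    return {p["author"] for p in posts if DIAGNOSIS_RE.search(p["text"])}


def hypersexuality_posts(posts):
    """Keep posts by self-reporting authors that mention hypersexuality."""
    authors = self_reported_authors(posts)
    return [
        p for p in posts
        if p["author"] in authors and HYPERSEX_RE.search(p["text"])
    ]


# Invented example posts, not taken from either corpus.
posts = [
    {"author": "user_a", "text": "I was diagnosed with bipolar II last year."},
    {"author": "user_a", "text": "Does anyone else get hypersexual when manic?"},
    {"author": "user_b", "text": "Hypersexuality is rarely talked about."},
    {"author": "user_a", "text": "Sleep has been rough this week."},
]
selected = hypersexuality_posts(posts)
```

A keyword match of this kind would likely over- and under-select in practice, which is presumably why the study combined several computational tools rather than relying on a single filter.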

Methods: A toolbox of computational linguistic methods was used to create the corpora and infer demographic variables for the Redditors in the dataset. The key psychological domains in the corpus were measured using Linguistic Inquiry and Word Count, and a topic model was built using BERTopic to identify salient language clusters. This paper also discusses ethical considerations associated with this type of analysis.

Results: The TABoRC is a corpus of 6,679,485 posts from 5177 Redditors, and the HiB-RC is a corpus totaling 2146 posts from 816 Redditors. The results demonstrate that, between 2012 and 2021, there was a 91.65% average yearly increase in posts in the HiB-RC (SD 119.6%) compared to 48.14% in the TABoRC (SD 51.2%) and an 86.97% average yearly increase in users (SD 93.8%) compared to 27.17% in the TABoRC (SD 38.7%). These statistics suggest that there was an increase in posting activity related to hypersexuality that exceeded the increase in general Reddit use over the same period. Several key psychological domains were identified as significant in the HiB-RC (P<.001), including more negative tone, more discussion of sex, and less discussion of wellness compared to the TABoRC. Finally, BERTopic was used to identify 9 key topics from the dataset.
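An "average yearly increase" with an SD, as reported above, can be computed from per-year counts roughly as follows. This is a sketch under stated assumptions: the abstract does not say whether a sample or population SD was used (the sample SD is assumed here), and the counts are invented for illustration rather than taken from the study.

```python
from statistics import mean, stdev


def yearly_pct_changes(counts_by_year):
    """Return year-over-year percentage changes for a {year: count} mapping."""
    years = sorted(counts_by_year)
    changes = []
    for prev, curr in zip(years, years[1:]):
        base = counts_by_year[prev]
        if base == 0:
            continue  # percentage change is undefined from a zero base
        changes.append(100.0 * (counts_by_year[curr] - base) / base)
    return changes


# Hypothetical post counts per year, NOT the study's data.
example_counts = {2018: 100, 2019: 150, 2020: 300, 2021: 330}

changes = yearly_pct_changes(example_counts)
avg_increase = mean(changes)   # mean of the yearly percentage changes
sd_increase = stdev(changes)   # sample SD (assumption; see lead-in)
```

With few yearly data points and uneven growth, the SD can exceed the mean, as it does for the HiB-RC figures reported above.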

Conclusions: Hypersexuality is an important symptom that is discussed by people with bipolar on Reddit and needs to be systematically recognized as a symptom of this illness. This research demonstrates the utility of a computational linguistic framework and offers a high-level overview of hypersexuality in bipolar, providing empirical evidence that paves the way for a deeper understanding of hypersexuality from a lived experience perspective.
