The Use of Natural Language Processing Methods in Reddit to Investigate Opioid Use: Scoping Review.

IF 3.5 Q1 HEALTH CARE SCIENCES & SERVICES
JMIR infodemiology Pub Date : 2024-09-13 DOI:10.2196/51156
Alexandra Almeida, Thomas Patton, Mike Conway, Amarnath Gupta, Steffanie A Strathdee, Annick Bórquez

Abstract

Background: The growing availability of big data spontaneously generated by social media platforms allows us to leverage natural language processing (NLP) methods as valuable tools to understand the opioid crisis.

Objective: We aimed to understand how NLP has been applied to Reddit (Reddit Inc) data to study opioid use.

Methods: We systematically searched for peer-reviewed studies and conference abstracts in PubMed, Scopus, PsycINFO, ACL Anthology, IEEE Xplore, and Association for Computing Machinery data repositories up to July 19, 2022. Inclusion criteria were studies investigating opioid use, using NLP techniques to analyze the textual corpora, and using Reddit as the social media data source. We were specifically interested in mapping studies' overarching goals and findings, methodologies and software used, and main limitations.
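The three inclusion criteria above can be read as a conjunction applied to each retrieved record. The following is a minimal sketch of that screening logic, not the authors' actual workflow: the `Record` class, its field names, and the example titles are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical screening record; the field names are assumptions."""
    title: str
    investigates_opioid_use: bool
    uses_nlp_on_text: bool
    uses_reddit_data: bool


def meets_inclusion_criteria(record: Record) -> bool:
    # A record is included only if all three criteria from the Methods hold.
    return (
        record.investigates_opioid_use
        and record.uses_nlp_on_text
        and record.uses_reddit_data
    )


# Made-up example records, not studies from the review.
candidates = [
    Record("Example A", True, True, True),
    Record("Example B", True, True, False),   # other platform only
    Record("Example C", False, True, True),   # not about opioid use
]
included = [r.title for r in candidates if meets_inclusion_criteria(r)]
```

In practice such judgments are made by human reviewers reading each paper; the flags here stand in for those decisions.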

Results: In total, 30 studies were included and classified into 4 non-mutually exclusive overarching goal categories: methodological (n=6, 20%), infodemiology (n=22, 73%), infoveillance (n=7, 23%), and pharmacovigilance (n=3, 10%). NLP methods were used to identify content relevant to opioid use among vast quantities of textual data, to establish potential relationships between opioid use patterns or profiles and contextual factors or comorbidities, and to anticipate individuals' transitions between different opioid-related subreddits, likely revealing progression through opioid use stages. The most common approaches were embedding techniques (12/30, 40%), prediction or classification (12/30, 40%), topic modeling (9/30, 30%), and sentiment analysis (6/30, 20%). The most frequently used programming languages were Python (20/30, 67%) and R (2/30, 7%). Among the studies that reported limitations (20/30, 67%), the most cited was uncertainty about whether redditors participating in these forums were representative of people who use opioids (8/20, 40%). Most papers were recent, with 28 of 30 (93%) published from 2019 to 2022, and their authors came from a range of disciplines.
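Because the goal categories are not mutually exclusive, their percentages need not sum to 100%. As a quick arithmetic check, the sketch below recomputes the reported shares from the counts given in the Results. The variable and function names are illustrative, and reading the surplus as "extra assignments" assumes every study has at least one category.

```python
# Counts as reported in the Results; a study may fall in several categories.
TOTAL_STUDIES = 30
goal_counts = {
    "methodological": 6,
    "infodemiology": 22,
    "infoveillance": 7,
    "pharmacovigilance": 3,
}


def share(count: int, total: int = TOTAL_STUDIES) -> int:
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


goal_shares = {goal: share(n) for goal, n in goal_counts.items()}

# Category assignments beyond one per study (assumes each study has >= 1).
extra_assignments = sum(goal_counts.values()) - TOTAL_STUDIES

# The limitation share uses a different denominator: the 20 studies that
# reported limitations, not all 30.
representativeness_share = share(8, total=20)
```

The 38 category assignments across 30 studies give 8 extra assignments, which is why the four shares add up to more than 100%.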

Conclusions: This scoping review identified a wide variety of NLP techniques and applications used to support surveillance and social media interventions addressing the opioid crisis. Despite the clear potential of these methods to enable the identification of opioid-relevant content in Reddit and its analysis, there are limits to the degree of interpretive meaning that they can provide. Moreover, we identified the need for standardized ethical guidelines to govern the use of Reddit data to safeguard the anonymity and privacy of people using these forums.
