A platform for connecting social media data to domain-specific topics using large language models: an application to student mental health.

Impact factor: 2.5 · JCR quartile: Q2 (Health Care Sciences & Services)
JAMIA Open · Publication date: 2024-01-18 · eCollection date: 2024-04-01 · DOI: 10.1093/jamiaopen/ooae001
Leonard Ruocco, Yuqian Zhuang, Raymond Ng, Richard J Munthali, Kristen L Hudec, Angel Y Wang, Melissa Vereschagin, Daniel V Vigo
JAMIA Open, vol. 7, no. 1, article ooae001 (Journal Article). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10799551/pdf/

Abstract

Objectives: To design a novel artificial intelligence-based software platform that allows users to analyze text data by identifying various coherent topics and parts of the data related to a specific research theme-of-interest (TOI).

Materials and methods: Our platform uses state-of-the-art unsupervised natural language processing methods, building on top of a large language model, to analyze social media text data. At the center of the platform's functionality is BERTopic, which clusters social media posts, forming collections of words representing distinct topics. A key feature of our platform is its ability to identify whole sentences corresponding to topic words, vastly improving the platform's ability to perform downstream similarity operations with respect to a user-defined TOI.
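
The abstract does not give implementation details, so the following is only a minimal sketch of the idea described above: collect the whole sentences that contain a topic's words, then score them against a user-defined TOI. It makes several assumptions. Topic words would come from BERTopic in the platform but are supplied by hand here. Bag-of-words cosine similarity stands in for the language-model embeddings the platform builds on. All function names (`representative_sentences`, `topic_toi_similarity`, and the helpers) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def representative_sentences(posts, topic_words):
    """Return every sentence in the posts that contains at least one topic word."""
    words = {w.lower() for w in topic_words}
    sentences = []
    for post in posts:
        # Naive sentence split on terminal punctuation (an assumption of this sketch).
        for sentence in re.split(r"(?<=[.!?])\s+", post.strip()):
            if sentence and words & set(tokenize(sentence)):
                sentences.append(sentence)
    return sentences


def topic_toi_similarity(posts, topic_words, toi):
    """Score a topic against the TOI by its best-matching whole sentence."""
    sentences = representative_sentences(posts, topic_words)
    if not sentences:
        return 0.0
    toi_vector = Counter(tokenize(toi))
    # Bag-of-words vectors stand in for language-model embeddings here.
    return max(cosine(Counter(tokenize(s)), toi_vector) for s in sentences)


if __name__ == "__main__":
    # Made-up example posts, not data from the study.
    posts = [
        "I feel hopeless and sad before every exam. The library closes at nine.",
        "Parking on campus is expensive.",
    ]
    toi = "feeling sad and hopeless"
    for topic in (["exam"], ["parking"]):
        print(topic, round(topic_toi_similarity(posts, topic, toi), 3))
```

Comparing whole sentences rather than isolated topic words gives the similarity step more context to work with, which is the motivation stated in the abstract. In practice, swapping the count vectors for sentence embeddings would change only the body of `topic_toi_similarity`.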

Results: Two case studies on mental health among university students are performed to demonstrate the utility of the platform, focusing on signals within social media (Reddit) data related to depression and their connection to various emergent themes within the data.

Discussion and conclusion: Our platform provides researchers with a readily available and inexpensive tool to parse large quantities of unstructured, noisy data into coherent themes and to identify portions of the data related to the research TOI. Although the platform was developed with a focus on mental health themes, we believe it generalizes to other research domains as well.

Source journal: JAMIA Open (Medicine, Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 102
Review time: 16 weeks