Waldo: Automated discovery of adverse events from unstructured self reports.

IF 7.7

PLOS digital health. Pub Date: 2025-09-30. eCollection Date: 2025-09-01. DOI: 10.1371/journal.pdig.0001011

Karan S Desai, Vijay M Tiyyala, Pranav Tiyyala, Atharva Yeola, Alejandra Gallegos-Rangel, Alejandro Montiel-Torres, Matthew R Allen, Mark Dredze, Ryan G Vandrey, Johannes Thrul, Eric C Leas, Mike Hogarth, Davey M Smith, John W Ayers


Citations: 0

Abstract



Adverse event (AE) detection is labor-intensive and costly because the task is to find rare events. Automated solutions are needed to enhance efficiency, reduce costs, and capture unnoticed safety signals. The objective was to develop and evaluate an automated machine learning tool, "Waldo," for AE detection from unstructured social media text, specifically targeting consumer health products that lack traditional post-market surveillance channels. To determine the best-performing AE detection method, we tested three models: (i) an N-gram model, (ii) BERT (Bidirectional Encoder Representations from Transformers), and (iii) RoBERTa (Robustly optimized BERT approach). Each was trained on 10,000 previously published unstructured reports on cannabis-derived products (CDPs) that humans had annotated for the presence of adverse events. The best method was then benchmarked against an AI chatbot (ChatGPT: gpt-3.5-turbo-0613) and applied to previously unstudied user narratives about CDPs from 20 subreddits. RoBERTa, hereafter referred to as Waldo, demonstrated the highest accuracy at 99.7%, with 22 false positives and 12 false negatives, yielding an F1-score of 95.1% for the positive class. In contrast, the chatbot had an accuracy of 94.4%, with 401 false positives (18.23 times as many as Waldo) and 163 false negatives (13.58 times as many as Waldo), yielding an F1-score of 38% for the positive class. Applying Waldo to 437,132 posts identified 28,832 potential AEs. The subreddit r/Marijuana had the highest AE rate (12.7%), followed by r/weed (10.5%) and r/AskTrees (10.0%). r/weedstocks (0.1%), r/macrogrowery (0.2%), and r/weedbiz (0.2%) had the lowest rates of potential AEs. Waldo addresses critical gaps in safety surveillance for unregulated consumer health products by automatically detecting adverse events from social media, a capability absent in traditional industry systems. Unlike existing approaches limited to structured databases or narrow domains, Waldo processes informal user narratives at scale with high precision. We have open-sourced Waldo for immediate application by the health community [https://waldo-ae-detection.github.io/WALDO/].
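
The ratios quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the Waldo code base: it recomputes the fold differences and the overall share of flagged posts from the reported counts, and it shows how a positive-class F1-score follows from confusion counts. The function name `f1_score` and the toy counts passed to it are assumptions, since the abstract does not report true-positive counts.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Positive-class F1, written as 2*TP / (2*TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Error counts reported in the abstract.
waldo_fp, waldo_fn = 22, 12
chatbot_fp, chatbot_fn = 401, 163

# How many times as many errors the chatbot made.
fp_fold = chatbot_fp / waldo_fp
fn_fold = chatbot_fn / waldo_fn
print(f"false positives: {fp_fold:.2f}x")  # 18.23x
print(f"false negatives: {fn_fold:.2f}x")  # 13.58x

# Overall share of posts flagged as potential AEs.
posts, flagged = 437_132, 28_832
flagged_rate = flagged / posts
print(f"flagged: {flagged_rate:.1%}")  # 6.6%

# Toy counts (NOT from the paper) to show the F1 formula in use.
toy_f1 = f1_score(tp=8, fp=2, fn=2)
print(toy_f1)  # 0.8
```

Because AEs are rare, accuracy alone is dominated by the many true negatives; the positive-class F1 depends only on true positives, false positives, and false negatives, which is why the two models differ far more on F1 than on accuracy.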

