Deduplicating the FDA adverse event reporting system with a novel application of network-based grouping

IF 4 · Zone 2 (Medicine) · Q2 Computer Science, Interdisciplinary Applications
Kory Kreimeyer, Jonathan Spiker, Oanh Dang, Suranjan De, Robert Ball, Taxiarchis Botsis
Journal of Biomedical Informatics, Volume 165, Article 104824. Published 2025-04-02.
DOI: 10.1016/j.jbi.2025.104824
https://www.sciencedirect.com/science/article/pii/S153204642500053X

Abstract

Objective

To improve the reliability of data mining for product safety concerns in the Food and Drug Administration’s (FDA) Adverse Event Reporting System (FAERS) by robustly identifying duplicate reports describing the same patient experience.

Materials and methods

A duplicate detection algorithm based on probabilistic record linkage, which includes features extracted from report narratives and was designed to support FAERS case safety review as part of the Information Visualization Platform (InfoViP), has been upgraded into a full deduplication pipeline for the entire FAERS database. The pipeline contains several new and updated components, including a network analysis-based community detection routine that breaks up sparsely connected groups of duplicates constructed from chains of pairwise comparisons. The pipeline was applied to all 29 million FAERS reports to assemble groups of duplicate cases.
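To make the chaining problem concrete, the following is a minimal sketch, not the authors' implementation. It groups reports by following chains of pairwise duplicate links, then splits any group whose link density is low. The abstract does not say which community detection method the pipeline uses, so this sketch substitutes a much simpler technique: dropping bridge links (links whose removal disconnects their endpoints). The function names, the `min_density` threshold, and the report IDs are all hypothetical.

```python
from collections import defaultdict


def connected_groups(pairs):
    """Group report IDs by following chains of pairwise duplicate links."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        groups.append(comp)
    return groups


def density(group, pairs):
    """Fraction of possible within-group pairs that are actually linked."""
    n = len(group)
    if n < 2:
        return 1.0
    links = {frozenset(p) for p in pairs if set(p) <= group}
    return len(links) / (n * (n - 1) / 2)


def split_sparse_groups(pairs, min_density=0.6):
    """Split sparse groups by dropping bridge links.

    Bridge removal is an illustrative stand-in for community detection;
    min_density is an assumed threshold, not a value from the paper.
    """
    result = []
    for group in connected_groups(pairs):
        inside = [p for p in pairs if set(p) <= group]
        if density(group, inside) >= min_density:
            result.append(group)
            continue
        kept = []
        for link in inside:
            # A link is a bridge if its endpoints fall apart without it.
            rest = [p for p in inside if p != link]
            if any(set(link) <= g for g in connected_groups(rest)):
                kept.append(link)
        parts = connected_groups(kept)
        covered = set().union(*parts) if parts else set()
        # Reports left with no links become single-report groups.
        parts.extend({r} for r in sorted(group - covered))
        result.extend(parts)
    return result


# Two tightly linked triples joined by a single weak link (C-D).
pairs = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
chained = connected_groups(pairs)
split = split_sparse_groups(pairs)
```

With this toy input, simple chaining merges all six reports into one group because of the single C-D link, while the split step separates them into two groups of three. A real community detection routine would also handle weak connections that span more than one link.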

Results

The pipeline was evaluated on 12 data sets adjudicated by human experts, comprising 2300 reports in total, and showed better overall performance than the current tool used at the FDA for labeling duplicates on 10 of them. F1 scores ranged from 0.36 to 0.93, with half above 0.75. Because minimizing false discovery increases human expert review efficiency, the improved deduplication pipeline was applied to all historic and daily incoming FAERS reports at FDA and identified about 5 million reports as duplicates.
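As a reminder of how the reported metric relates to false discovery, the sketch below computes precision, recall, F1, and the false discovery rate for a set of predicted duplicate groups against expert-adjudicated groups. It assumes scoring over unordered report pairs, which the abstract does not specify; the function names and the example data are hypothetical.

```python
from itertools import combinations


def duplicate_pairs(groups):
    """Expand groups of report IDs into the set of unordered duplicate pairs."""
    return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}


def pairwise_scores(predicted_groups, gold_groups):
    """Return precision, recall, F1 and false discovery rate over pairs."""
    pred = duplicate_pairs(predicted_groups)
    gold = duplicate_pairs(gold_groups)
    tp = len(pred & gold)  # pairs labeled duplicate by both
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # False discovery rate is the complement of precision.
    fdr = 1.0 - precision if pred else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fdr": fdr}


# The predicted groups wrongly pull report 3 into the first group.
scores = pairwise_scores(
    predicted_groups=[{1, 2, 3}, {4, 5}],
    gold_groups=[{1, 2}, {4, 5}],
)
```

In this example, one extra report in a predicted group adds two false pairs. Recall stays at 1.0 while precision falls to 0.5, which is why over-merged chains are costly for reviewers.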

Conclusions

The InfoViP deduplication pipeline is operating at FDA to identify duplicate case reports in FAERS and provide deduplicated input for improved efficiency and accuracy of safety review operations like adverse event data mining calculations.


Source journal: Journal of Biomedical Informatics (Medicine; Computer Science, Interdisciplinary Applications)
CiteScore: 8.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 32 days
Journal description: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.