A systematic approach to 'cleaning' of drug name records data in the FAERS database: a case report

Michael A. Veronin, Robert P. Schumaker, R. Dixit, Pooja Dhake, Morgan Ogwo
International Journal of Big Data Management, published 29 December 2020. DOI: 10.1504/ijbdm.2020.10034546
Citations: 2

Abstract

Data 'cleaning', also known as data 'cleansing' or data 'curation', is the process of identifying and rectifying errors in data. The objective of this report is to present a data cleaning and standardisation process for the drug name files in the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS) database. Drug name data were cleaned and standardised using a combination of data cleaning tools and manual correction techniques. Data files were organised into frequency intervals, and cleaning was carried out iteratively with programming scripts in MySQL Workbench. Downloading the FAERS quarterly reports from Q1 2004 through Q3 2016 yielded 32,736,657 DRUG file records. The records contained a variety of errors, such as misspellings, abbreviations, and nondescript or ambiguous names. On completion of the process, more than 95% of the drug name data in the FAERS database had been standardised. For large datasets such as FAERS, a cleaning process is necessary to rectify data that may be incomplete or inaccurate because of input errors, and thereby to improve the quality and validity of the information.
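The frequency-interval strategy can be illustrated with a small Python sketch. This is not the authors' implementation, which used scripts in MySQL Workbench. It assumes drug names arrive as plain strings, and the correction map `CORRECTIONS`, the interval boundaries in `FREQUENCY_BINS`, and the helper names `normalize`, `bin_by_frequency` and `clean` are all hypothetical. Distinct names are grouped by how often they occur so that manual review can start with the most frequent variants, and a curated map then rewrites misspellings and abbreviations to standard names.

```python
import re
from collections import Counter

# Hypothetical correction map standing in for manually curated fixes:
# misspellings and abbreviations mapped to a standard drug name.
CORRECTIONS = {
    "ASPIRN": "ASPIRIN",
    "ASA": "ASPIRIN",
    "TYLENOL": "ACETAMINOPHEN",
    "APAP": "ACETAMINOPHEN",
}

# Hypothetical frequency intervals: (lower bound inclusive, upper bound exclusive),
# listed from most to least frequent.
FREQUENCY_BINS = [(1000, float("inf")), (100, 1000), (10, 100), (1, 10)]


def normalize(raw: str) -> str:
    """Uppercase, trim, collapse internal whitespace, and drop trailing dots."""
    name = re.sub(r"\s+", " ", raw.strip().upper())
    return name.rstrip(".")


def bin_by_frequency(counts: Counter) -> dict:
    """Group distinct names into frequency intervals, most frequent names first."""
    bins = {b: [] for b in FREQUENCY_BINS}
    for name, n in counts.most_common():
        for lo, hi in FREQUENCY_BINS:
            if lo <= n < hi:
                bins[(lo, hi)].append(name)
                break
    return bins


def clean(records: list[str], standard_names: set[str]) -> tuple[list[str], float]:
    """Normalize and correct each name; return the cleaned names and the
    fraction of records that now match a known standard name."""
    cleaned = [CORRECTIONS.get(n, n) for n in map(normalize, records)]
    standardized = sum(1 for n in cleaned if n in standard_names)
    return cleaned, (standardized / len(cleaned) if cleaned else 0.0)


if __name__ == "__main__":
    raw = ["aspirn", "ASA ", "Tylenol.", "Ibuprofen", "unknown drug"]
    standard = {"ASPIRIN", "ACETAMINOPHEN", "IBUPROFEN"}
    cleaned, share = clean(raw, standard)
    print(cleaned)  # ['ASPIRIN', 'ASPIRIN', 'ACETAMINOPHEN', 'IBUPROFEN', 'UNKNOWN DRUG']
    print(share)    # 0.8
    # Names left unmatched ('UNKNOWN DRUG') are candidates for the next manual pass.
    print(bin_by_frequency(Counter(cleaned))[(1, 10)])
```

Reviewing the highest-frequency interval first means each manual correction fixes the largest number of records, which is one plausible reason to organise the files by frequency. The share returned by `clean` is analogous to the percentage of standardised drug name data reported in the abstract.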