Data Duplication and Errors in Large Medical Data Sets: A Case Study in the IRIS® Registry

Impact factor: 4.6 (JCR Q1, Ophthalmology)
Eric A. Goldberg MS, Connor J. Ross BS, Vivian Paraskevi Douglas MD, DVM, Alexander Ivanov MS, Tobias Elze PhD, Joan W. Miller MD, Alice C. Lorch MD, MPH
Ophthalmology Science, volume 6, issue 1, Article 100933. Published 2025-09-02. DOI: 10.1016/j.xops.2025.100933. Full text: https://www.sciencedirect.com/science/article/pii/S2666914525002313

Abstract

Purpose

To investigate entry errors and data duplication within the American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight) using records of cataract surgery (CS), neodymium-doped yttrium aluminum garnet (YAG) capsulotomy, age-related macular degeneration (AMD), and diabetic retinopathy (DR).

Design

Retrospective cohort study.

Participants

Patients in the IRIS Registry.

Methods

We collected records of CS and YAG capsulotomy with specified laterality within the IRIS Registry (years 2013–2023), identifying eyes having >1 record and eyes having ≥1 record on a date after the first entry (different date duplication, Dd). Additionally, among records of DR and AMD, we identified eyes with (1) an initial diagnosis at a more severe stage followed by reversion to a less severe stage or (2) a transition to a more severe stage followed by a later diagnosis of the less severe stage; both patterns were defined as transition errors. We investigated potential predictors of Dd and transition errors among patient and practice characteristics by evaluating the permutation feature importance (PFI) of classification models.
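The duplication categories described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout (an eye identified by a hypothetical `(patient_id, laterality)` pair, plus a procedure date) and the function name `classify_duplication` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date


def classify_duplication(records):
    """Flag each eye by duplication category.

    records: iterable of (eye_id, procedure_date) tuples, where eye_id is a
    hypothetical (patient_id, laterality) pair.
    """
    dates_by_eye = defaultdict(list)
    for eye_id, proc_date in records:
        dates_by_eye[eye_id].append(proc_date)

    flags = {}
    for eye_id, dates in dates_by_eye.items():
        first = min(dates)
        has_duplicate = len(dates) > 1          # >1 record for the eye
        has_dd = any(d > first for d in dates)  # >=1 record after the first date (Dd)
        flags[eye_id] = {
            "duplicate": has_duplicate,
            "same_date_only": has_duplicate and not has_dd,
            "dd": has_dd,
        }
    return flags


# Toy records, invented for illustration (not registry data).
records = [
    (("p1", "OD"), date(2015, 3, 1)),
    (("p1", "OD"), date(2015, 3, 1)),   # repeat on the initial date only
    (("p2", "OS"), date(2016, 5, 10)),
    (("p2", "OS"), date(2017, 1, 4)),   # record on a later date: Dd
    (("p3", "OD"), date(2018, 7, 7)),   # single record
]
flags = classify_duplication(records)
```

In this sketch an eye with Dd also counts as having a duplicate, which matches the abstract's framing of Dd as a subset of eyes with >1 record.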

Main Outcome Measures

For CS and YAG capsulotomy, we measured the proportion of eyes having >1 procedure record, having >1 record only on the initial procedure date, and having ≥1 procedure record on a date after the first entry. For DR and AMD, we measured the proportion of eyes reverting to an earlier stage after starting at a later stage and the proportion reverting to an earlier stage after transitioning to a later stage.
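The two DR/AMD outcome measures can be sketched as a single pass over an eye's chronologically ordered diagnoses. This is an illustrative reading of the definitions above, not the authors' implementation: stages are assumed to be encoded as integer severity ranks (higher means more severe), and the function name and return labels are hypothetical.

```python
def transition_error_type(stages):
    """Classify one eye's diagnosis sequence.

    stages: chronologically ordered severity ranks (higher = more severe).
    Returns "start_then_revert" if the eye starts at a later stage and then
    reverts, "progress_then_revert" if it first transitions to a later stage
    and then reverts, or None if severity never decreases.
    """
    if len(stages) < 2:
        return None
    first = stages[0]
    peak = first
    for stage in stages[1:]:
        if stage < peak:
            # Reversion from the starting stage vs. from a stage reached later.
            return "start_then_revert" if peak == first else "progress_then_revert"
        peak = max(peak, stage)
    return None


# Toy sequences, invented for illustration.
examples = {
    "a": [2, 1],        # starts severe, reverts
    "b": [1, 2, 1],     # progresses, then reverts
    "c": [1, 2, 3],     # monotone progression, no error
    "d": [1, 1, 2, 2],  # repeated stages, no error
}
labels = {eye: transition_error_type(seq) for eye, seq in examples.items()}
```

The proportion for each outcome measure would then be the count of eyes with the corresponding label divided by the total number of eyes in the cohort.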

Results

Of the 14 718 896 CS-treated eyes, 30.9% had duplicates, with 5.5% having Dd. For YAG capsulotomy, out of 5 113 679 eyes, 29.1% had duplicates, with 4.1% having Dd. For AMD and DR, 13.6% and 12.7% of eyes, respectively, exhibited transition errors. Models captured a relationship between the eye’s first practice on record and the data errors under study, indicated by a mean PFI (measured as F1 loss) of 0.230 for the Dd model and 0.062 for the transition error model.
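Permutation feature importance expressed as an F1 loss can be sketched as follows: shuffle one feature column, re-score the model, and record how much the F1 score drops relative to the unshuffled baseline, averaged over several shuffles. The sketch below uses only the Python standard library. The toy classifier, feature layout, and all names are assumptions; the abstract does not specify the authors' models or PFI settings.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean F1 loss per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        losses = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = f1_score(y, [predict(row) for row in X_perm])
            losses.append(baseline - permuted)
        importances.append(sum(losses) / n_repeats)
    return importances


# Toy data, invented for illustration: column 0 is a hypothetical practice
# code, column 1 is patient age. The toy model looks only at column 0.
X = [[1, 70], [1, 65], [1, 80], [1, 72], [0, 68], [0, 75], [0, 66], [0, 79]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_predict(row):
    return 1 if row[0] == 1 else 0


importances = permutation_importance(toy_predict, X, y)
```

Because the toy model ignores age, shuffling that column leaves its predictions unchanged and its importance is zero, whereas shuffling the practice code degrades F1. A larger mean F1 loss, as reported for the first practice on record, indicates that the model relies more heavily on that feature.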

Conclusions

Data duplication in large medical data sets calls for caution when analyzing repeated procedures or relapsing conditions. Addressing problematic errors requires transparency and communication among stakeholders across organizations. Within the IRIS Registry, the results indicated an association between the first record’s originating practice and data errors, providing an investigative entry point for upstream data stewards.

Financial Disclosure(s)

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.