Use of an ultrasound picture archiving and communication system to answer research questions: Description of data cleaning methods

Matthew K Moore, Gillian Whalley, Gregory T Jones, Sean Coffey
Australasian Journal of Ultrasound in Medicine, vol. 27, no. 1, pp. 49-55. Published 2024-01-13. Publication type: Journal Article. DOI: 10.1002/ajum.12374
https://onlinelibrary.wiley.com/doi/10.1002/ajum.12374

Abstract

Introduction/Purpose

Ultrasound picture archiving and communication system (PACS) databases are useful for quality improvement and clinical research but frequently contain free text that is not easily readable. Here, we present a method to extract and clean a semi-structured echocardiography (cardiac ultrasound) PACS database.

Methods

Echocardiography studies between 1 January 2010 and 31 December 2018 were extracted using a data mining tool. Numeric variables were recoded with extreme values excluded. Analysis of free text, including descriptions of the heart valves and right and left ventricular size and function, was performed using a rule-based system. Different levels of free text variables were initially identified using commonly used phrases and then iteratively developed. Randomly selected sets of 100 studies were compared to the electronic health record to validate the data cleaning process.
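The two cleaning steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the plausibility limits, the phrase patterns, the category labels, and the function names `clean_numeric` and `classify_lv_function` are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from typing import Optional

# Hypothetical plausibility limits for ejection fraction (%); the abstract
# does not state which cut-offs were used to exclude extreme values.
EF_LIMITS = (5.0, 90.0)


def clean_numeric(value: object, low: float, high: float) -> Optional[float]:
    """Parse a value as a number; return None if unparseable or outside [low, high]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if low <= number <= high else None


# Hypothetical rules for left ventricular function. Order matters: the first
# matching rule wins, so the most severe grade is checked first.
_IMPAIRMENT = r"\b(?:impair|reduc|dysfunction)"
LV_FUNCTION_RULES = [
    ("severe", re.compile(r"\bsever\w*\b.*" + _IMPAIRMENT, re.I)),
    ("moderate", re.compile(r"\bmoderate\w*\b.*" + _IMPAIRMENT, re.I)),
    ("mild", re.compile(r"\bmild\w*\b.*" + _IMPAIRMENT, re.I)),
    ("normal", re.compile(r"\b(?:normal|preserved)\b", re.I)),
]


def classify_lv_function(text: str) -> Optional[str]:
    """Return the first matching category, or None if no rule applies."""
    for label, pattern in LV_FUNCTION_RULES:
        if pattern.search(text):
            return label
    return None


if __name__ == "__main__":
    print(clean_numeric("55", *EF_LIMITS))
    print(clean_numeric("550", *EF_LIMITS))
    print(classify_lv_function("Mildly reduced LV systolic function"))
    print(classify_lv_function("LV not well seen"))
```

Returning `None` for unmatched text keeps unclassified reports visible, so new phrases can be added to the rule list on each iteration.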

Results

Data validation was performed three times. In the final round, Cohen's kappa ranged from 0.88 to 1.00 across all measures.
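For reference, Cohen's kappa compares observed agreement p_o with the agreement expected by chance p_e, as kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two label sequences, for example cleaned categories against categories read from the health record. The function name `cohens_kappa` and the sample labels are illustrative assumptions, not data or code from the study.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both sources.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both sources used one identical label throughout; kappa is undefined,
        # and this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    cleaned = ["normal", "normal", "mild", "mild"]    # made-up example labels
    record = ["normal", "normal", "mild", "normal"]   # made-up example labels
    print(cohens_kappa(cleaned, record))
```

In the made-up example, three of four labels agree (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5.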

Conclusion

Free text cleaning of semi-structured PACS databases is possible using freely available open-source software. The accuracy of this method is high, and the resulting dataset can be linked to administrative data to answer research questions. We present a method that could be used to answer clinical questions or to develop quality improvement initiatives.
