Use of an ultrasound picture archiving and communication system to answer research questions: Description of data cleaning methods


Australasian Journal of Ultrasound in Medicine. Pub Date: 2024-01-13. DOI: 10.1002/ajum.12374

Matthew K Moore, Gillian Whalley, Gregory T Jones, Sean Coffey

Volume 27, issue 1, pages 49-55. Full text: https://onlinelibrary.wiley.com/doi/10.1002/ajum.12374


Abstract

Introduction/Purpose

Ultrasound picture archiving and communication system (PACS) databases are useful for quality improvement and clinical research but frequently contain free text that is not easily readable. Here, we present a method to extract and clean a semi-structured echocardiography (cardiac ultrasound) PACS database.

Methods

Echocardiography studies between 1 January 2010 and 31 December 2018 were extracted using a data mining tool. Numeric variables were recoded with extreme values excluded. Analysis of free text, including descriptions of the heart valves and right and left ventricular size and function, was performed using a rule-based system. Different levels of free text variables were initially identified using commonly used phrases and then iteratively developed. Randomly selected sets of 100 studies were compared to the electronic health record to validate the data cleaning process.
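The two cleaning steps described above (range-checking numeric variables and mapping free text to levels with phrase rules) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the phrase patterns, the level names, the function names `classify_severity` and `clean_numeric`, and the plausibility limits are all assumptions made for the example. Negation ("no severe stenosis") is deliberately not handled; in an iterative workflow, unmatched or misclassified reports would prompt new rules.

```python
import re
from typing import Optional

# Hypothetical phrase rules, checked from most to least severe so that
# "mild to moderate" resolves to the highest level mentioned.
SEVERITY_RULES = [
    ("severe", re.compile(r"\bsevere(ly)?\b")),
    ("moderate", re.compile(r"\bmoderate(ly)?\b")),
    ("mild", re.compile(r"\bmild(ly)?\b")),
    ("normal", re.compile(r"\bnormal\b")),
]


def classify_severity(text: str) -> Optional[str]:
    """Return the first matching level, or None to flag the text for review."""
    lowered = text.lower()
    for level, pattern in SEVERITY_RULES:
        if pattern.search(lowered):
            return level
    return None


def clean_numeric(raw: str, low: float, high: float) -> Optional[float]:
    """Parse a numeric field; return None if unparseable or outside [low, high]."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    return value if low <= value <= high else None


if __name__ == "__main__":
    print(classify_severity("Mild to moderate mitral regurgitation"))
    print(classify_severity("See comment"))
    # Assumed plausibility limits for an ejection fraction in percent.
    print(clean_numeric("55", 5, 90), clean_numeric("550", 5, 90))
```

Returning `None` rather than guessing keeps unclassified text visible, which is what makes repeated rounds of rule refinement and validation possible.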

Results

The data validation step was performed three times in total, with Cohen's kappa ranging between 0.88 and 1.00 for the final set of data validation across all measures.
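For readers unfamiliar with the agreement statistic reported above, Cohen's kappa compares the observed agreement between two sets of labels with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for cleaned labels against reference labels. The function name `cohens_kappa` and the example labels are illustrative assumptions, not data from the study.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    cleaned = ["mild", "mild", "severe", "severe"]    # made-up cleaned labels
    reference = ["mild", "mild", "severe", "mild"]    # made-up reference labels
    print(cohens_kappa(cleaned, reference))
```

In the made-up example, three of four labels agree (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5.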

Conclusion

Free text cleaning of semi-structured PACS databases is possible using freely available open-source software. The accuracy of this method is high, and the resulting dataset can be linked to administrative data to answer research questions. We present a method that could be used to answer clinical questions or to develop quality improvement initiatives.

