Jenna M Schabdach, Remo M S Williams, Joseph Logan, Viveknarayanan Padmanabhan, Russell D'Aiello III, Johnny Mclaughlin, Alexander Gonzalez, Edward M Krause, Gregory E Tasian, Susan Sotardi, Aaron F Alexander-Bloch
From Scanner to Science: Reusing Clinically Acquired Medical Images for Research
AMIA Joint Summits on Translational Science Proceedings, vol. 2025, pp. 471-480. Published 2025-06-10 (eCollection).
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150695/pdf/
Abstract
Growth in the field of medical imaging research has revealed a need for greater volume and variety in available data. This need could be met using curated, clinically acquired data, but the process for getting these data from the scanners to the scientists is complex and lengthy. We present Locutus, a manifest-driven, modular Extract, Transform, and Load (ETL) process designed to handle the difficulties of reusing clinically acquired medical imaging data. The design of Locutus was based on four foundational assumptions about medical data, research data, and communication. All parts of a workflow must communicate with each other and be adaptable to unique data delivery requests. In addition, the workflow must be robust to possible errors and uncertainties in clinically acquired data, which may require human intervention to resolve. With these assumptions in mind, Locutus presents a five-phase workflow for downloading, deidentifying, and delivering unique requests for imaging data. The phases are initialization, data preparation, extraction of data from the research server to a pre-deidentification data warehouse, transformation into deidentified space, and loading into a post-deidentification data warehouse. To date, this workflow has been used to process 32,962 imaging accessions for research use. This number is expected to grow as technical challenges are addressed, and the role of humans is expected to shift from frequent intervention to regular monitoring.
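As a rough illustration of what a manifest-driven, five-phase workflow with a human-review fallback could look like, the following minimal Python sketch walks each manifest row through the phases named in the abstract and flags a row for manual review when a phase fails. This is not the Locutus implementation: the names `Phase`, `ManifestRow`, `run_phase`, and `process`, the accession values, and the single failure condition are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """The five phases named in the abstract, in order."""
    INITIALIZATION = 1
    PREPARATION = 2
    EXTRACT = 3
    TRANSFORM = 4
    LOAD = 5


@dataclass
class ManifestRow:
    """One requested imaging accession (hypothetical manifest layout)."""
    accession: str
    completed: list = field(default_factory=list)
    needs_review: bool = False
    note: str = ""


def run_phase(phase, row):
    """Placeholder phase body; a real phase would move or rewrite image data.

    The only simulated failure is a blank accession at extraction time.
    """
    if phase is Phase.EXTRACT and not row.accession:
        raise ValueError("missing accession number")


def process(manifest):
    """Run every row through the phases; stop a row at its first failure."""
    for row in manifest:
        for phase in Phase:
            try:
                run_phase(phase, row)
            except ValueError as err:
                # Record the problem on the manifest so a human can resolve it.
                row.needs_review = True
                row.note = f"{phase.name}: {err}"
                break
            row.completed.append(phase.name)
    return manifest


manifest = process([ManifestRow("ACC001"), ManifestRow("")])
for row in manifest:
    print(row.accession or "<blank>", row.completed, row.needs_review, row.note)
```

Keeping per-row status on the manifest itself is one way to let the phases communicate with each other and to make it visible which requests are waiting on human intervention. The design actually used by Locutus may differ.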