Multicenter Imaging Studies: Automated Approach to Evaluating Data Variability and the Role of Outliers

M. Bento, R. Souza, R. Frayne
DOI: 10.1109/SIBGRAPI.2018.00030
Published in: 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), October 2018
Citations: 2

Abstract

Magnetic resonance (MR) imaging, as well as other imaging modalities, has been used in a large number of clinical and research studies for the analysis and quantification of important structures and the detection of abnormalities. In this context, machine learning is playing an increasingly important role in the development of automated tools for aiding image quantification, patient diagnosis and follow-up. Normally, these techniques require large, heterogeneous datasets to provide accurate and generalizable results. Large, multi-center studies, for example, can provide such data. Images acquired at different centers, however, can present varying characteristics due to differences in acquisition parameters, site procedures and scanner configurations. While variability in the dataset is required to develop robust, generalizable studies (i.e., independent of the acquisition parameters or center), as in all studies there is also a need to ensure overall data quality by prospectively identifying and removing poor-quality data samples that should not be included, e.g., outliers. We wish to keep image samples that are representative of the underlying population (so-called inliers), while removing those samples that are not. We propose a framework to analyze data variability and identify samples that should be removed in order to obtain more representative, reliable and robust datasets. Our example case study is based on a public dataset containing T1-weighted volumetric head image data acquired at six different centers, using three different scanner vendors and two commonly used magnetic field strengths. We propose an algorithm for assessing data robustness and finding the optimal data for study inclusion, i.e., the data size that presents the lowest variability while maintaining generalizability (using samples from all sites).
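The paper's specific algorithm is not reproduced here, but the core idea of flagging image samples whose summary statistics deviate strongly from the cohort can be sketched with a standard robust criterion. The example below uses the modified z-score (based on the median absolute deviation) over hypothetical per-image features such as mean intensity and intensity spread; the feature choices, threshold, and toy data are illustrative assumptions, not the authors' method.

```python
import numpy as np

def find_outliers(features, threshold=3.5):
    """Flag samples whose features deviate strongly from the cohort median.

    Uses the modified z-score (median absolute deviation), a common robust
    outlier criterion; the paper's actual algorithm differs.
    """
    features = np.asarray(features, dtype=float)
    median = np.median(features, axis=0)
    mad = np.median(np.abs(features - median), axis=0)
    mad = np.where(mad == 0, 1e-9, mad)  # guard against zero spread
    modified_z = 0.6745 * (features - median) / mad
    # A sample is an outlier if any of its features exceeds the threshold.
    return np.any(np.abs(modified_z) > threshold, axis=1)

# Toy cohort: rows are hypothetical per-image summary features
# (e.g., mean intensity, intensity SD); the last row mimics a
# corrupted or atypical acquisition.
cohort = np.array([
    [100.0, 15.0],
    [102.0, 14.5],
    [ 99.0, 15.5],
    [101.0, 14.8],
    [250.0, 60.0],  # outlier: very different intensity statistics
])
mask = find_outliers(cohort)
print(mask)  # → [False False False False  True]
```

In a multicenter setting, such a criterion would typically be applied per feature across all sites, so that samples from a center with systematically different acquisition characteristics can be examined before being excluded.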