Multicenter Imaging Studies: Automated Approach to Evaluating Data Variability and the Role of Outliers
M. Bento, R. Souza, R. Frayne
2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), October 2018
DOI: 10.1109/SIBGRAPI.2018.00030
Magnetic resonance (MR) imaging, as well as other imaging modalities, has been used in a large number of clinical and research studies to analyze and quantify important structures and to detect abnormalities. In this context, machine learning plays an increasingly important role in the development of automated tools that aid image quantification, patient diagnosis and follow-up. These techniques normally require large, heterogeneous datasets to provide accurate and generalizable results, and large multi-center studies are one source of such data. Images acquired at different centers, however, can have varying characteristics due to differences in acquisition parameters, site procedures and scanner configurations. Although variability in the dataset is required to develop robust, generalizable studies (i.e., studies independent of the acquisition parameters or center), every study must also ensure overall data quality by prospectively identifying and removing poor-quality samples, such as outliers, that should not be included. We wish to keep image samples that are representative of the underlying population (so-called inliers) while removing those that are not. We propose a framework to analyze data variability and identify samples that should be removed in order to obtain more representative, reliable and robust datasets. Our example case study is based on a public dataset of T1-weighted volumetric head images acquired at six different centers, using scanners from three different vendors at two commonly used magnetic field strengths. We propose an algorithm for assessing data robustness and finding the optimal data for study inclusion, i.e., the data size that presents the lowest variability while maintaining generalizability (using samples from all sites).
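The abstract does not specify how variability is measured or how the optimal data size is chosen, so the Python sketch below is only one hypothetical way to organize such a framework, not the authors' method. It assumes that each scan has already been summarized by a feature vector (for example, intensity statistics), that outlyingness is the Euclidean distance to the pooled per-feature median, and that variability is the mean distance to the centroid of the retained samples. Samples are trimmed greedily, most outlying first. Trimming stops when a site would drop below a minimum count (the generalizability constraint) or when one more removal reduces variability by less than a relative tolerance. The names `spread`, `select_inliers`, `min_per_site` and `tol` are illustrative.

```python
import numpy as np


def spread(X):
    """Variability proxy (assumption): mean Euclidean distance to the centroid."""
    return float(np.mean(np.linalg.norm(X - X.mean(axis=0), axis=1)))


def select_inliers(features, sites, min_per_site=2, tol=0.05):
    """Hypothetical greedy outlier trimming that keeps every site represented.

    Returns a boolean mask of retained samples and the spread after each
    accepted removal (curve[0] is the spread of the full dataset).
    """
    if min_per_site < 1:
        raise ValueError("min_per_site must be at least 1")
    X = np.asarray(features, dtype=float)
    sites = np.asarray(sites)
    # Robust centre of the pooled data: per-feature median.
    centre = np.median(X, axis=0)
    # Candidate removal order: farthest from the robust centre first.
    order = np.argsort(-np.linalg.norm(X - centre, axis=1))
    keep = np.ones(len(X), dtype=bool)
    counts = {s: int(np.sum(sites == s)) for s in np.unique(sites)}
    curve = [spread(X)]
    for idx in order:
        if counts[sites[idx]] <= min_per_site:
            break  # generalizability constraint: every site keeps samples
        trial = keep.copy()
        trial[idx] = False
        new = spread(X[trial])
        if curve[-1] - new <= tol * curve[-1]:
            break  # diminishing returns: remaining samples look like inliers
        keep = trial
        counts[sites[idx]] -= 1
        curve.append(new)
    return keep, curve


# Usage on synthetic data: six sites, ten scans each, two injected outliers.
rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 3))
site_labels = np.repeat(np.array(list("ABCDEF")), 10)
feats[[3, 27]] += 8.0
mask, curve = select_inliers(feats, site_labels)
print(mask.sum(), np.flatnonzero(~mask), [round(v, 3) for v in curve])
```

Every accepted removal must lower the spread by more than `tol` times its current value, so the returned curve is strictly decreasing. The stopping rule is a simple diminishing-returns heuristic chosen for illustration. A study could instead inspect the full curve, or swap in a different outlier score (for example, a robust Mahalanobis distance) without changing the site constraint.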