A Scalable Pipeline for Gigapixel Whole Slide Imaging Analysis on Leadership Class HPC Systems

Sajal Dash, Benjamín Hernández, A. Tsaris, Folami T. Alamudun, Hong-Jun Yoon, Feiyi Wang
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2022. DOI: 10.1109/IPDPSW55747.2022.00223
Citations: 3

Abstract

Whole Slide Imaging (WSI) captures microscopic details of a patient's histopathological features at multiple resolutions organized across different levels. Images produced by WSI are gigapixel-sized; holding a single image in memory requires a few gigabytes, which is scarce since a complicated model itself occupies tens of gigabytes. Performing even a simple metric operation on these large images is expensive. High-performance computing (HPC) can help us quickly analyze such large images using distributed training of complex deep learning models. One popular approach to analyzing these images is to divide a WSI image into smaller tiles (patches) and then train a simpler model on this large number of reduced-size patches. However, pursuing this patch-based approach requires efficiently solving three pre-processing challenges. 1) Creating small patches from a high-resolution image can yield a high number of patches (hundreds of thousands per image); storing and processing them is challenging due to the large number of I/O and arithmetic operations, so an optimal balance between the size and number of patches must be found to reduce I/O and memory accesses. 2) WSI images may have tiny annotated regions for cancer tissue and a significant portion of normal and fatty tissue; correct patch sampling should avoid dataset imbalance. 3) Storing and retrieving many patches to and from disk storage may incur I/O latency while training a deep learning model; an efficient distributed data loader should reduce this latency during the training and inference steps. This paper explores these three challenges and provides empirical and algorithmic solutions deployed on the Summit supercomputer hosted at the Oak Ridge Leadership Computing Facility.
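The first two challenges in the abstract (tiling a slide into patches, and sampling patches without class imbalance) can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes square non-overlapping tiles, NumPy arrays standing in for decoded slide regions, and simple majority-class undersampling as one possible balancing strategy; the function names `tile_image` and `balanced_sample` are hypothetical.

```python
import numpy as np

def tile_image(img, patch_size, stride=None):
    """Lazily yield (row, col, patch) tiles from an H x W (x C) array.

    Yielding one patch at a time avoids materializing hundreds of
    thousands of patches per slide in memory at once.
    """
    stride = stride or patch_size
    h, w = img.shape[:2]
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            yield r, c, img[r:r + patch_size, c:c + patch_size]

def balanced_sample(patches, labels, rng=None):
    """Undersample the majority class so every label contributes equally,
    one simple way to counter the tumor vs. normal/fatty imbalance."""
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()  # per-class quota = size of the rarest class
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n, replace=False)
        for c in classes
    ])
    return [patches[i] for i in keep], labels[keep]

# A 1024x1024 "slide" cut into 256x256 tiles gives a 4x4 grid of patches.
slide = np.zeros((1024, 1024), dtype=np.uint8)
tiles = list(tile_image(slide, 256))
print(len(tiles))  # 16
```

In practice a real pipeline would decode regions from the WSI pyramid on demand (e.g. via a slide-reading library) rather than loading the full-resolution image, and the size/stride choice is exactly the size-vs-count trade-off the paper studies.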