Reproducibility and Performance of Deep Learning Applications for Cancer Detection in Pathological Images

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). Pub Date: 2019-05-01. DOI: 10.1109/CCGRID.2019.00080

Christoph Jansen, Bruno Schilling, K. Strohmenger, Michael Witt, Jonas Annuscheit, D. Krefting


Citations: 3

Abstract

Convolutional Neural Networks (CNNs) are used for automatic cancer detection in pathological images. These data-driven experiments are difficult to reproduce for two reasons: the CNNs may require CUDA-enabled Nvidia GPUs for acceleration, and training is often performed on a large dataset stored on a researcher's computer, inaccessible to others. We introduce the RED file format for reproducible experiment description, in which executable programs are packaged and referenced as Docker container images. Data inputs and outputs are described as network resources using standard transmission and authentication protocols instead of local file paths. Following the FAIR guiding principles, the RED format is based on and compatible with the established Common Workflow Language specification. RED files are interpreted by the accompanying Curious Containers (CC) software. Arbitrarily large datasets are mounted inside containers via FUSE network filesystems such as SSHFS. SSHFS is compared to NFS and a local SSD in artificial benchmarks and in a CNN training scenario, where SSHFS reduces performance by a factor of 1.8. We are convinced that RED can greatly improve the reproducibility of deep learning workloads and data-driven experiments. This is particularly important in clinical scenarios, where the result of an analysis may contribute to a patient's treatment.
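The central idea of the abstract, referencing the program as a container image and describing every input and output as a network resource rather than a local file path, can be illustrated with a minimal sketch. The field names, hostnames, and the helper `local_path_entries` below are hypothetical and chosen only for illustration; they do not reproduce the actual RED schema or the Curious Containers API.

```python
from urllib.parse import urlparse

# Hypothetical, simplified experiment description. The field names and
# URLs are illustrative assumptions, NOT the actual RED schema.
experiment = {
    "container": {
        "engine": "docker",
        "image": "registry.example.org/cnn-training:1.0",
    },
    "inputs": {
        "training_data": {"url": "ssh://data.example.org/datasets/slides"},
    },
    "outputs": {
        "model": {"url": "https://results.example.org/upload/model.h5"},
    },
}

# URL schemes treated as network resources in this sketch (an assumption).
NETWORK_SCHEMES = {"ssh", "sftp", "http", "https", "ftp"}


def local_path_entries(description):
    """Return the names of inputs/outputs that are not network resources."""
    offending = []
    for section in ("inputs", "outputs"):
        for name, resource in description.get(section, {}).items():
            scheme = urlparse(resource.get("url", "")).scheme
            if scheme not in NETWORK_SCHEMES:
                offending.append(f"{section}.{name}")
    return offending


if __name__ == "__main__":
    # An empty list means every resource is addressed over the network.
    print(local_path_entries(experiment))
```

A description that points at a path such as `/home/user/data` would be flagged by this check, since such a path exists only on one researcher's machine and cannot be resolved by anyone else rerunning the experiment.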

