DANCE: DAta-Network Co-optimization for Efficient Segmentation Model Training and Inference

ACM Transactions on Design Automation of Electronic Systems (TODAES). Pub Date: 2021-07-16. DOI: 10.1145/3510835

Chaojian Li, Wuyang Chen, Yuchen Gu, Tianlong Chen, Yonggan Fu, Zhangyang Wang, Yingyan Lin

Publication type: Journal Article. Volume: 59 1. Pages: 1 - 20. Open access: no.


Abstract

Semantic segmentation for scene understanding is now in wide demand, which puts pressure on algorithm efficiency, especially for deployment on resource-limited platforms. Current segmentation models are trained and evaluated on massive high-resolution scene images (the “data level”) and incur expensive computation from the multi-scale aggregation they require (the “network level”). On both fronts, the computational and energy costs of training and inference are substantial because of the large input resolutions that are often desired and the heavy computational burden of segmentation models. To this end, we propose DANCE, a general, automated DAta-Network Co-optimization approach for Efficient segmentation model training and inference. Unlike existing efficient segmentation approaches, which focus only on lightweight network design, DANCE performs automated, simultaneous data-network co-optimization through both input data manipulation and network architecture slimming. Specifically, DANCE integrates automated data slimming, which adaptively downsamples or drops input images and controls their contribution to the training loss, guided by each image's spatial complexity. Besides directly reducing the cost associated with input size, this downsampling also shrinks the dynamic range of input object and context scales, which motivates us to adaptively slim the network as well so that it matches the downsampled data. Extensive experiments and ablation studies (on four SOTA segmentation models with three popular segmentation datasets under two training settings) demonstrate that DANCE can achieve an “all-win” for efficient segmentation: reduced training cost, less expensive inference, and better mean Intersection-over-Union (mIoU). Specifically, DANCE reduces energy consumption by 25%–77% in training and by 31%–56% in inference, while changing the mIoU by between −0.71% and +13.34%.
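The data-slimming idea in the abstract (downsample or drop each image, and scale its loss contribution, according to spatial complexity) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the complexity proxy (mean absolute difference between neighboring pixels), the thresholds `drop_below` and `keep_above`, the choice to downsample low-complexity images rather than high-complexity ones, and the names `spatial_complexity`, `decide`, and `downsample`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Grayscale image as rows of pixel intensities in [0, 1].
Image = List[List[float]]


def spatial_complexity(img: Image) -> float:
    """Hypothetical proxy: mean absolute difference between adjacent pixels."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbor
                total += abs(img[y][x + 1] - img[y][x])
                count += 1
            if y + 1 < h:  # vertical neighbor
                total += abs(img[y + 1][x] - img[y][x])
                count += 1
    return total / count if count else 0.0


@dataclass
class SlimDecision:
    stride: Optional[int]  # None means the image is dropped
    loss_weight: float     # multiplier applied to this image's training loss


def decide(complexity: float,
           drop_below: float = 0.01,
           keep_above: float = 0.2) -> SlimDecision:
    """Map complexity to a decision (thresholds are illustrative assumptions)."""
    if complexity < drop_below:
        return SlimDecision(stride=None, loss_weight=0.0)
    if complexity < keep_above:
        # Assumed policy: simpler images are downsampled and weighted less.
        return SlimDecision(stride=2, loss_weight=complexity / keep_above)
    return SlimDecision(stride=1, loss_weight=1.0)


def downsample(img: Image, stride: int) -> Image:
    """Nearest-neighbor downsampling: keep every `stride`-th row and column."""
    return [row[::stride] for row in img[::stride]]


# Three toy 4x4 inputs: flat, smooth gradient, checkerboard.
flat = [[0.5] * 4 for _ in range(4)]
gradient = [[0.0, 0.1, 0.2, 0.3] for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]

for name, img in [("flat", flat), ("gradient", gradient), ("checker", checker)]:
    d = decide(spatial_complexity(img))
    out = None if d.stride is None else downsample(img, d.stride)
    shape = None if out is None else (len(out), len(out[0]))
    print(name, d, shape)
```

In an actual training loop, the returned `loss_weight` would multiply the per-image segmentation loss before averaging over the batch. In DANCE the slimming is automated and coupled with network slimming, whereas this sketch hard-codes a fixed rule.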

