Incremental and Approximate Computations for Accelerating Deep CNN Inference

Supun Nakandala, Kabir Nagrecha, Arun Kumar, Y. Papakonstantinou
{"title":"加速深度CNN推理的增量和近似计算","authors":"Supun Nakandala, Kabir Nagrecha, Arun Kumar, Y. Papakonstantinou","doi":"10.1145/3397461","DOIUrl":null,"url":null,"abstract":"Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) are especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, a lot of work in the computer architecture, systems, and compilers communities study how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"9 1","pages":"1 - 42"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Incremental and Approximate Computations for Accelerating Deep CNN Inference\",\"authors\":\"Supun Nakandala, Kabir Nagrecha, Arun Kumar, Y. Papakonstantinou\",\"doi\":\"10.1145/3397461\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) are especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, a lot of work in the computer architecture, systems, and compilers communities study how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. 
We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.\",\"PeriodicalId\":6983,\"journal\":{\"name\":\"ACM Transactions on Database Systems (TODS)\",\"volume\":\"9 1\",\"pages\":\"1 - 42\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Database Systems (TODS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3397461\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems (TODS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3397461","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 18

Abstract

Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) is especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, much work in the computer architecture, systems, and compilers communities studies how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.
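
To make the redundancy concrete, the following is a minimal sketch of the naive occlusion-based explanation (OBE) loop; the `model` callable, the `naive_obe_heatmap` helper, and the patch size, stride, and fill value are illustrative assumptions, not Krypton's API. The point is that every patch position triggers a full CNN re-inference even though the input differs from the original in only one small region.

```python
import numpy as np

def naive_obe_heatmap(model, image, target_class, patch=16, stride=8, fill=0.0):
    """model: a callable mapping an HxWxC float array to class probabilities.
    Returns a heatmap of how much the target class's probability drops when
    each region of `image` is occluded (a larger drop means the region
    mattered more for the prediction)."""
    h, w, _ = image.shape
    base_prob = model(image)[target_class]        # one inference on the unmodified input
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heatmap = np.zeros((rows, cols), dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            occluded = image.copy()               # input differs only in one patch...
            occluded[y:y + patch, x:x + patch, :] = fill
            prob = model(occluded)[target_class]  # ...yet costs a full re-inference
            heatmap[i, j] = base_prob - prob
    return heatmap
```

For a 224×224 input with a 16-pixel patch and a stride of 8, this loop issues roughly 27 × 27 ≈ 730 nearly identical inference requests, which is the kind of re-inference workload the paper's incremental and multi-query optimizations target.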
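
The incremental view maintenance idea can be illustrated on a single convolution layer: each output cell depends only on a small receptive field, so a locally modified input invalidates only a bounded window of a materialized feature map. The toy sketch below assumes a single-channel, stride-1, unpadded layer and uses SciPy's `correlate2d`; it patches a cached output in place instead of recomputing it, as an illustration of the principle rather than Krypton's implementation.

```python
import numpy as np
from scipy.signal import correlate2d

def incremental_conv2d(cached_out, new_input, kernel, y0, y1, x0, x1):
    """Update the materialized output of a stride-1, unpadded, single-channel
    convolution after the input changed in rows [y0, y1) x cols [x0, x1),
    recomputing only the output cells whose receptive fields overlap the change."""
    k = kernel.shape[0]
    out_h, out_w = cached_out.shape
    # Output cell (i, j) reads input rows i..i+k-1 and cols j..j+k-1, so the
    # affected output window is the changed region "dilated" by k-1 on each side.
    oy0, oy1 = max(0, y0 - k + 1), min(out_h, y1)
    ox0, ox1 = max(0, x0 - k + 1), min(out_w, x1)
    patch_in = new_input[oy0:oy1 + k - 1, ox0:ox1 + k - 1]
    updated = cached_out.copy()
    updated[oy0:oy1, ox0:ox1] = correlate2d(patch_in, kernel, mode="valid")
    return updated

# Quick check: the patched output matches a full recomputation.
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
W = rng.standard_normal((3, 3))
full = correlate2d(img, W, mode="valid")   # the "materialized view" of this layer
img2 = img.copy()
img2[10:14, 10:14] = 0.0                   # a locally modified input (occlusion patch)
assert np.allclose(incremental_conv2d(full, img2, W, 10, 14, 10, 14),
                   correlate2d(img2, W, mode="valid"))
```

In a deep CNN the affected window grows with each layer's receptive field, which is one reason the paper pairs such incremental recomputation with multi-query optimization and task-specific approximate inference to keep the savings large.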