Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study

Fitash Ul Haq, Donghwan Shin, S. Nejati, L. Briand
{"title":"Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study","authors":"Fitash Ul Haq, Donghwan Shin, S. Nejati, L. Briand","doi":"10.1109/ICST46399.2020.00019","DOIUrl":null,"url":null,"abstract":"There is a growing body of research on developing testing techniques for Deep Neural Networks (DNNs). We distinguish two general modes of testing for DNNs: Offline testing where DNNs are tested as individual units based on test datasets obtained independently from the DNNs under test, and online testing where DNNs are embedded into a specific application and tested in a close-loop mode in interaction with the application environment. In addition, we identify two sources for generating test datasets for DNNs: Datasets obtained from real-life and datasets generated by simulators. While offline testing can be used with datasets obtained from either sources, online testing is largely confined to using simulators since online testing within real-life applications can be time consuming, expensive and dangerous. In this paper, we study the following two important questions aiming to compare test datasets and testing modes for DNNs: First, can we use simulator-generated data as a reliable substitute to real-world data for the purpose of DNN testing? Second, how do online and offline testing results differ and complement each other? Though these questions are generally relevant to all autonomous systems, we study them in the context of automated driving systems where, as study subjects, we use DNNs automating end-to-end control of cars’ steering actuators. Our results show that simulator-generated datasets are able to yield DNN prediction errors that are similar to those obtained by testing DNNs with real-life datasets. 
Further, offline testing is more optimistic than online testing as many safety violations identified by online testing could not be identified by offline testing, while large prediction errors generated by offline testing always led to severe safety violations detectable by online testing.","PeriodicalId":235967,"journal":{"name":"2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST)","volume":"85 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICST46399.2020.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 40

Abstract

There is a growing body of research on developing testing techniques for Deep Neural Networks (DNNs). We distinguish two general modes of testing for DNNs: offline testing, where DNNs are tested as individual units based on test datasets obtained independently of the DNNs under test, and online testing, where DNNs are embedded into a specific application and tested in a closed-loop mode in interaction with the application environment. In addition, we identify two sources for generating test datasets for DNNs: datasets obtained from real life and datasets generated by simulators. While offline testing can be used with datasets obtained from either source, online testing is largely confined to simulators, since online testing within real-life applications can be time-consuming, expensive, and dangerous. In this paper, we study the following two important questions aiming to compare test datasets and testing modes for DNNs: First, can we use simulator-generated data as a reliable substitute for real-world data for the purpose of DNN testing? Second, how do online and offline testing results differ and complement each other? Though these questions are generally relevant to all autonomous systems, we study them in the context of automated driving systems where, as study subjects, we use DNNs automating end-to-end control of cars' steering actuators. Our results show that simulator-generated datasets are able to yield DNN prediction errors that are similar to those obtained by testing DNNs with real-life datasets. Further, offline testing is more optimistic than online testing, as many safety violations identified by online testing could not be identified by offline testing, while large prediction errors generated by offline testing always led to severe safety violations detectable by online testing.
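The distinction between the two testing modes can be illustrated with a minimal sketch. The code below is not the paper's implementation: the toy model, toy simulator, and all names (`offline_test`, `online_test`, `ToySimulator`, the lane-departure threshold) are hypothetical, chosen only to show why a small per-frame prediction error offline can still accumulate into safety violations in a closed loop.

```python
import statistics

def offline_test(model, dataset):
    # Offline mode: score the DNN frame by frame against ground-truth
    # steering labels from a pre-recorded dataset (no feedback loop).
    errors = [abs(model(frame) - label) for frame, label in dataset]
    return statistics.mean(errors), max(errors)

class ToySimulator:
    # Hypothetical stand-in for a driving simulator: the car's lateral
    # offset drifts by the gap between road curvature and applied steering.
    def __init__(self, curvature=0.1):
        self.curvature = curvature
        self.offset = 0.0

    def observe(self):
        return self.offset  # the "camera frame" is just the current offset

    def apply(self, steering):
        self.offset += self.curvature - steering

def online_test(model, sim, steps=50, max_offset=1.5):
    # Online mode: the DNN drives in a closed loop; each prediction
    # changes the next observation. Count lane departures as violations.
    violations = 0
    for _ in range(steps):
        sim.apply(model(sim.observe()))
        if abs(sim.offset) > max_offset:
            violations += 1
    return violations

# A slightly biased controller: it always steers straight, so every
# per-frame error is only 0.1 -- offline testing looks optimistic.
biased_model = lambda obs: 0.0
dataset = [(0.0, 0.1)] * 10  # (frame, ground-truth steering) pairs

mean_err, max_err = offline_test(biased_model, dataset)
violations = online_test(biased_model, ToySimulator())
```

In this toy setup the offline metrics stay small, but the constant bias accumulates step by step in the closed loop until the car departs the lane, so the online run records violations that the offline run cannot surface. This mirrors, in miniature, the paper's observation that offline testing is more optimistic than online testing.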