Towards safe machine learning for CPS: infer uncertainty from training data

Xiaozhe Gu, A. Easwaran
{"title":"Towards safe machine learning for CPS: infer uncertainty from training data","authors":"Xiaozhe Gu, A. Easwaran","doi":"10.1145/3302509.3311038","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) techniques are increasingly applied to decision-making and control problems in Cyber-Physical Systems among which many are safety-critical, e.g., chemical plants, robotics, autonomous vehicles. Despite the significant benefits brought by ML techniques, they also raise additional safety issues because 1) most expressive and powerful ML models are not transparent and behave as a black box and 2) the training data which plays a crucial role in ML safety is usually incomplete. An important technique to achieve safety for ML models is \"Safe Fail\", i.e., a model selects a reject option and applies the backup solution, a traditional controller or a human operator for example, when it has low confidence in a prediction. Data-driven models produced by ML algorithms learn from training data, and hence they are only as good as the examples they have learnt. As pointed in [17], ML models work well in the \"training space\" (i.e., feature space with sufficient training data), but they could not extrapolate beyond the training space. As observed in many previous studies, a feature space that lacks training data generally has a much higher error rate than the one that contains sufficient training samples [31]. Therefore, it is essential to identify the training space and avoid extrapolating beyond the training space. In this paper, we propose an efficient Feature Space Partitioning Tree (FSPT) to address this problem. Using experiments, we also show that, a strong relationship exists between model performance and FSPT score.","PeriodicalId":413733,"journal":{"name":"Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3302509.3311038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

Machine learning (ML) techniques are increasingly applied to decision-making and control problems in Cyber-Physical Systems, many of which are safety-critical, e.g., chemical plants, robotics, and autonomous vehicles. Despite the significant benefits brought by ML techniques, they also raise additional safety issues because 1) the most expressive and powerful ML models are not transparent and behave as black boxes, and 2) the training data, which plays a crucial role in ML safety, is usually incomplete. An important technique for achieving safety in ML models is "Safe Fail", i.e., a model selects a reject option and applies a backup solution, for example a traditional controller or a human operator, when it has low confidence in a prediction. Data-driven models produced by ML algorithms learn from training data, and hence they are only as good as the examples they have learnt. As pointed out in [17], ML models work well in the "training space" (i.e., the feature space with sufficient training data), but they cannot extrapolate beyond it. As observed in many previous studies, a region of the feature space that lacks training data generally exhibits a much higher error rate than one containing sufficient training samples [31]. Therefore, it is essential to identify the training space and avoid extrapolating beyond it. In this paper, we propose an efficient Feature Space Partitioning Tree (FSPT) to address this problem. Using experiments, we also show that a strong relationship exists between model performance and the FSPT score.
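The "Safe Fail" pattern described in the abstract can be illustrated with a short sketch. The following Python example is a hypothetical illustration, not the paper's FSPT algorithm: it wraps a trained model, gauges how well a query point is covered by the training data using a simple nearest-neighbour distance check, and rejects the prediction when coverage is low so that a backup controller or human operator can take over. All names (SafeFailWrapper, coverage_threshold, MeanLabelModel) are illustrative assumptions.

```python
import numpy as np

class SafeFailWrapper:
    """Hypothetical reject-option wrapper (illustration only, not the paper's FSPT).

    The wrapped model is trusted only when the query point lies close enough to
    the training data; otherwise a reject is returned so that a backup solution
    (e.g., a traditional controller or a human operator) can take over.
    """

    def __init__(self, model, X_train, coverage_threshold):
        self.model = model                             # any object with a .predict(X) method
        self.X_train = np.asarray(X_train)             # training inputs used to gauge coverage
        self.coverage_threshold = coverage_threshold   # max allowed distance to nearest training sample

    def _nearest_train_distance(self, x):
        # Euclidean distance from the query point to its nearest training sample.
        return np.min(np.linalg.norm(self.X_train - x, axis=1))

    def predict_or_reject(self, x):
        x = np.asarray(x, dtype=float)
        if self._nearest_train_distance(x) > self.coverage_threshold:
            # The query falls outside the well-covered "training space": safe fail.
            return None, "reject"
        return self.model.predict(x.reshape(1, -1))[0], "accept"


# Toy usage with a trivial stand-in model.
class MeanLabelModel:
    def __init__(self, y_train):
        self.label = float(np.mean(y_train))
    def predict(self, X):
        return np.full(len(X), self.label)

X_train = np.random.rand(100, 2)            # training data confined to [0, 1]^2
y_train = np.random.randint(0, 2, 100)
wrapper = SafeFailWrapper(MeanLabelModel(y_train), X_train, coverage_threshold=0.2)

print(wrapper.predict_or_reject([0.5, 0.5]))   # inside the training space -> prediction accepted
print(wrapper.predict_or_reject([5.0, 5.0]))   # far outside the training space -> reject
```

A distance threshold is only the simplest possible coverage check; the paper's contribution is a tree-based partitioning (FSPT) that scores how much training data falls in each region of the feature space, which this sketch does not implement.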