A-XAI: adversarial machine learning for trustable explainability

Nishita Agrawal, Isha Pendharkar, Jugal Shroff, Jatin Raghuvanshi, Akashdip Neogi, Shruti Patil, Rahee Walambe, Ketan Kotecha
{"title":"A-XAI: adversarial machine learning for trustable explainability","authors":"Nishita Agrawal,&nbsp;Isha Pendharkar,&nbsp;Jugal Shroff,&nbsp;Jatin Raghuvanshi,&nbsp;Akashdip Neogi,&nbsp;Shruti Patil,&nbsp;Rahee Walambe,&nbsp;Ketan Kotecha","doi":"10.1007/s43681-023-00368-4","DOIUrl":null,"url":null,"abstract":"<div><p>With the recent advancements in the usage of Artificial Intelligence (AI)-based systems in the healthcare and medical domain, it has become necessary to monitor whether these systems make predictions using the correct features or not. For this purpose, many different types of model interpretability and explainability methods are proposed in the literature. However, with the rising number of adversarial attacks against these AI-based systems, it also becomes necessary to make those systems more robust to adversarial attacks and validate the correctness of the generated model explainability. In this work, we first demonstrate how an adversarial attack can affect the model explainability even after robust training. Along with this, we present two different types of attack classifiers: one that can detect whether the given input is benign or adversarial and the other classifier that can identify the type of attack. We also identify the regions affected by the adversarial attack using model explainability. Finally, we demonstrate how the correctness of the generated explainability can be verified using model interpretability methods.</p></div>","PeriodicalId":72137,"journal":{"name":"AI and ethics","volume":"4 4","pages":"1143 - 1174"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI and ethics","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43681-023-00368-4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

With the recent advances in the use of Artificial Intelligence (AI)-based systems in the healthcare and medical domain, it has become necessary to monitor whether these systems base their predictions on the correct features. For this purpose, many different model interpretability and explainability methods have been proposed in the literature. However, with the rising number of adversarial attacks against these AI-based systems, it also becomes necessary to make them more robust to such attacks and to validate the correctness of the generated explanations. In this work, we first demonstrate how an adversarial attack can affect model explainability even after robust training. We then present two attack classifiers: one that detects whether a given input is benign or adversarial, and another that identifies the type of attack. We also use model explainability to locate the regions affected by the adversarial attack. Finally, we demonstrate how the correctness of the generated explanations can be verified using model interpretability methods.
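The core idea, an adversarial perturbation shifting which regions the explanation highlights, can be illustrated with a minimal sketch. The abstract does not name the attack or the explainability method used by the authors, so the snippet below assumes an FGSM perturbation and plain input-gradient saliency maps; `model`, `image`, `label`, and `epsilon` are hypothetical placeholders, not the paper's setup.

```python
# Minimal sketch (not the authors' code): craft an FGSM adversarial example and
# compare gradient-based saliency maps before and after the attack.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: step the input in the direction that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def saliency_map(model, x, y):
    """Input-gradient saliency: |d loss / d input|, reduced over channels -> (N, H, W)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return x.grad.abs().amax(dim=1)

# Usage (assuming a trained `model`, an input batch `image`, and ground-truth `label`):
# x_adv = fgsm_attack(model, image, label)
# clean_map = saliency_map(model, image, label)
# adv_map = saliency_map(model, x_adv, label)
# (adv_map - clean_map).abs() highlights the regions whose attribution the attack shifted,
# which is the kind of signal an attack-detection classifier could be trained on.
```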
