From Myths to Norms: Demystifying Data Mining Models with Instance-Based Transparency

Y. Alufaisan, Yan Zhou, Murat Kantarcioglu, B. Thuraisingham
Published in: 2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC), 2017-10-01
DOI: 10.1109/CIC.2017.00042
Citations: 3

Abstract

The desire to move from data to intelligence has become a trend that is rapidly driving today's world forward. Machine learning and data mining techniques are being used as important tools to unlock the value of the vast amounts of data that organizations own. Despite existing efforts to explain their underlying machinery in layman's terms, data mining models and their output remain esoteric, discipline-specific black boxes, accessible only to experts with years of training and development experience. As data mining techniques grow in popularity in the real world, the ability to understand how they reach decisions has become increasingly critical, especially in areas such as criminal justice and law enforcement, where transparent decision-making is vital for ensuring fairness, justice, and equality. In this paper, we present a transparency model that helps unmask the incomprehensible reasoning for which many data mining techniques are rightly blamed. Our transparency model uses a novel rule selection technique to substitute a comprehensible, rule-based counterpart for the complex, black-box output of any data mining technique. The rule-based substitute explains the decision made for each instance with a tiny set of rules, significantly reducing model complexity. Besides model simplicity and comprehensibility, we also assess the quality of our rule set by measuring its similarity to the output of the original data mining algorithm. Furthermore, we compute its accuracy on unseen test data as a complementary assessment criterion. We empirically demonstrate the effectiveness of our transparency model through experiments on eight real datasets that involve predicting important personal attributes, ranging from creditworthiness to criminal recidivism. Our transparency model shows a high degree of consistency with the original data mining algorithms in nearly all cases. We also compare our results with LIME, one of the state-of-the-art transparency models, and show that our transparency model outperforms LIME 84% of the time.
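The abstract does not spell out the rule selection technique, so the following is only a minimal sketch, in Python, of the evaluation idea it describes: a rule-based substitute explains each instance with a few covering rules, its similarity to the original model (often called fidelity) is its agreement with the black-box output, and its accuracy is its agreement with true labels on unseen test data. The Rule class, the shortest-rule-first explain heuristic, the majority-vote predict function, the toy black_box credit model, and all data are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# An instance maps feature names to numeric values.
Instance = Dict[str, float]
# A condition is (feature, low, high), satisfied when low <= x[feature] < high.
Condition = Tuple[str, float, float]


@dataclass(frozen=True)
class Rule:
    """Hypothetical IF-THEN rule: IF all conditions hold THEN predict `label`."""
    conditions: Tuple[Condition, ...]
    label: int

    def covers(self, x: Instance) -> bool:
        return all(low <= x[feat] < high for feat, low, high in self.conditions)


def explain(x: Instance, rules: Sequence[Rule], max_rules: int = 3) -> List[Rule]:
    """Return at most `max_rules` rules covering x, shortest rules first.

    Preferring short rules is an illustrative heuristic only; it is NOT the
    paper's rule selection technique, which the abstract does not describe.
    """
    covering = [r for r in rules if r.covers(x)]
    covering.sort(key=lambda r: len(r.conditions))  # stable: keeps rule order on ties
    return covering[:max_rules]


def predict(x: Instance, rules: Sequence[Rule], default: int) -> int:
    """Majority label of the selected rules; `default` when no rule covers x."""
    chosen = explain(x, rules)
    if not chosen:
        return default
    return Counter(r.label for r in chosen).most_common(1)[0][0]


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == q for p, q in zip(a, b)) / len(a)


# --- Toy usage: every feature, threshold, and label below is made up. ---
def black_box(x: Instance) -> int:
    # Stand-in for an opaque model; here just a simple hidden formula.
    return int(x["income"] >= 50 and x["debt"] < 20)


rules = [
    Rule((("income", 50.0, math.inf), ("debt", 0.0, 15.0)), 1),  # deliberately imperfect threshold
    Rule((("income", -math.inf, 50.0),), 0),
    Rule((("debt", 20.0, math.inf),), 0),
]
test_X = [
    {"income": 60.0, "debt": 10.0},
    {"income": 30.0, "debt": 5.0},
    {"income": 70.0, "debt": 25.0},
    {"income": 55.0, "debt": 19.0},
]
test_y = [1, 0, 1, 0]  # hypothetical ground-truth labels

bb_preds = [black_box(x) for x in test_X]
rule_preds = [predict(x, rules, default=0) for x in test_X]

print("fidelity:", agreement(rule_preds, bb_preds))  # → 0.75
print("accuracy:", agreement(rule_preds, test_y))    # → 0.75
print("explanation for first instance:", explain(test_X[0], rules))
```

On this toy data the substitute agrees with the black box on three of four test instances. The one disagreement comes from the tightened debt threshold in the first rule, which leaves the fourth instance uncovered so it falls back to the default label. Keeping fidelity and accuracy as separate measurements mirrors the distinction the abstract draws between similarity to the original algorithm and accuracy on unseen data.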