Y. Alufaisan, Yan Zhou, Murat Kantarcioglu, B. Thuraisingham
2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC), published 2017-10-01. DOI: 10.1109/CIC.2017.00042
From Myths to Norms: Demystifying Data Mining Models with Instance-Based Transparency
The desire to move from data to intelligence has become a trend that is rapidly pushing today's world forward. Machine learning and data mining techniques are being used as important tools to unlock the value in the vast amounts of data that organizations own. Despite existing efforts to explain their underlying machinery in layman's terms, data mining models and their output remain esoteric, discipline-specific black boxes, comprehensible only to experts with years of training and development experience. As data mining techniques grow in popularity in the real world, the ability to understand their intelligent decision-making artifacts has become increasingly critical, especially in areas such as criminal justice and law enforcement, where transparent decision-making is vital for ensuring fairness, justice, and equality. In this paper, we present a transparency model that helps unmask the incomprehensible reasoning for which many data mining techniques are deservedly blamed. Using a novel rule selection technique, our transparency model substitutes a comprehensible, rule-based counterpart for the complex, black-box output of any data mining technique. The rule-based substitute explains the decision made for each instance with a small set of rules, significantly reducing model complexity. Besides model simplicity and comprehensibility, we assess the quality of our rule set by measuring its similarity to the output of the original data mining algorithm. As a complementary assessment criterion, we also compute its accuracy on unseen test data. We empirically demonstrate the effectiveness of our transparency model through experiments on eight real datasets that involve predicting important personal attributes, ranging from creditworthiness to criminal recidivism. Our transparency model shows a high degree of consistency with the original data mining algorithms in nearly all cases. We also compare our results with LIME, one of the state-of-the-art transparency models, and show that our transparency model outperforms LIME 84% of the time.
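The abstract evaluates the rule-based substitute in two ways: its agreement with the original model's output and its accuracy on unseen test data. The minimal Python sketch below illustrates both measures under stated assumptions. It is not the authors' rule selection technique, which the abstract does not describe. The `Rule` class, the `explain`/`predict`/`agreement` helpers, the first-matching-rule prediction policy, the feature names, the thresholds, and all data values are hypothetical, and "similarity" is assumed here to mean the fraction of instances on which the two label sequences agree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical instance representation: feature name -> numeric value.
Instance = Dict[str, float]


@dataclass
class Rule:
    """A hypothetical IF-THEN rule: if `condition` holds, predict `label`."""
    description: str
    condition: Callable[[Instance], bool]
    label: int


def explain(rules: Sequence[Rule], x: Instance) -> List[Rule]:
    """Return the rules that fire on instance x (a per-instance explanation)."""
    return [r for r in rules if r.condition(x)]


def predict(rules: Sequence[Rule], x: Instance, default: int = 0) -> int:
    """Assumed policy: use the first firing rule, else fall back to `default`."""
    fired = explain(rules, x)
    return fired[0].label if fired else default


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == q for p, q in zip(a, b)) / len(a)


# Hypothetical rules and toy data, for illustration only.
rules = [
    Rule("prior_offenses >= 3 -> high risk", lambda x: x["prior_offenses"] >= 3, 1),
    Rule("age < 25 -> high risk", lambda x: x["age"] < 25, 1),
]
test_X = [
    {"age": 30, "prior_offenses": 4},
    {"age": 22, "prior_offenses": 0},
    {"age": 45, "prior_offenses": 1},
    {"age": 35, "prior_offenses": 0},
]
black_box_preds = [1, 1, 0, 1]  # hypothetical outputs of the original model
true_labels = [1, 0, 0, 1]      # hypothetical ground truth

surrogate_preds = [predict(rules, x) for x in test_X]
fidelity = agreement(surrogate_preds, black_box_preds)  # similarity to the black box
accuracy = agreement(surrogate_preds, true_labels)      # accuracy on test data
print(surrogate_preds, fidelity, accuracy)  # [1, 1, 0, 0] 0.75 0.5
```

Keeping fidelity and accuracy as separate numbers matters: a surrogate can mirror the black box closely (high fidelity) while both are wrong on the same instances, so the two measures answer different questions.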