Interpreting deep learning models for entity resolution: an experience report using LIME

Vincenzo Di Cicco, D. Firmani, Nick Koudas, P. Merialdo, D. Srivastava
Published in: Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2019-07-05
DOI: 10.1145/3329859.3329878 (https://doi.org/10.1145/3329859.3329878)
Citations: 32

Abstract

Entity Resolution (ER) seeks to understand which records refer to the same entity (e.g., matching products sold on multiple websites). The sheer number of ways humans represent and misrepresent information about real-world entities makes ER a challenging problem. Deep Learning (DL) has produced impressive results in natural language processing, so recent work has started exploring DL approaches to the ER problem, with encouraging results. However, we are still far from understanding why and when these approaches work in the ER setting. We are developing a methodology, Mojito, to produce explainable interpretations of the output of DL models for the ER task. Our methodology is based on LIME, a popular tool for producing prediction explanations for generic classification tasks. In this paper we report our first experiences in interpreting recent DL models for the ER task. Our results demonstrate the importance of explanations in the DL space, and suggest that, when assessing the performance of DL algorithms for ER, accuracy alone may not be sufficient to demonstrate generality and reproducibility in a production environment.
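To make the idea of a LIME-style explanation for a record pair concrete, the following is a minimal sketch under stated assumptions. It is not Mojito and does not use the `lime` package: `toy_matcher` (a Jaccard-overlap score) is a hypothetical stand-in for a DL matcher, and `explain_pair`, `solve`, the kernel `width`, and the `ridge` term are illustrative names and choices. The sketch drops tokens from the two records at random, queries the matcher on each perturbed pair, and fits a proximity-weighted linear surrogate whose coefficients serve as per-token importances.

```python
import math
import random


def toy_matcher(left_tokens, right_tokens):
    """Hypothetical stand-in for a DL matcher: Jaccard overlap as a match score."""
    a, b = set(left_tokens), set(right_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def explain_pair(left, right, predict, num_samples=1000, width=0.75,
                 ridge=1e-3, seed=0):
    """Return ((side, token), weight) pairs sorted by absolute weight."""
    rng = random.Random(seed)
    features = [("left", t) for t in left.lower().split()]
    features += [("right", t) for t in right.lower().split()]
    d = len(features)
    rows, scores, weights = [], [], []
    for i in range(num_samples):
        # The first sample keeps every token; the rest drop tokens at random.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        kept_left = [t for m, (s, t) in zip(mask, features) if m and s == "left"]
        kept_right = [t for m, (s, t) in zip(mask, features) if m and s == "right"]
        removed = 1 - sum(mask) / d  # fraction of tokens dropped
        rows.append(mask + [1])  # trailing 1 is the intercept column
        scores.append(predict(kept_left, kept_right))
        # Exponential kernel: perturbations closer to the original count more.
        weights.append(math.exp(-(removed ** 2) / width ** 2))
    # Weighted ridge regression via the normal equations:
    # (X^T W X + ridge * I) beta = X^T W y
    p = d + 1
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
          + (ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * y for r, w, y in zip(rows, weights, scores))
         for i in range(p)]
    beta = solve(A, b)
    return sorted(zip(features, beta[:d]), key=lambda fw: -abs(fw[1]))


left = "sony bravia 55 inch tv"
right = "sony bravia tv 55 black"
for (side, token), weight in explain_pair(left, right, toy_matcher):
    print(f"{side:>5} {token:<8} {weight:+.3f}")
```

With this toy matcher, tokens shared by both records should tend to receive higher weights than tokens that appear on only one side. Applied to a real DL matcher, the same kind of per-token ranking is what lets a reader check whether a model that scores well on accuracy is relying on sensible evidence.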