CauseRuDi：通过因果统计生成和规则蒸馏解释行为序列模型

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-10-29 DOI:10.1109/TKDE.2024.3487625

Yao Zhang;Yun Xiong;Yiheng Sun;Yucheng Jin;Caihua Shan;Tian Lu;Hui Song;Shengli Sun

{"title":"CauseRuDi：通过因果统计生成和规则蒸馏解释行为序列模型","authors":"Yao Zhang;Yun Xiong;Yiheng Sun;Yucheng Jin;Caihua Shan;Tian Lu;Hui Song;Shengli Sun","doi":"10.1109/TKDE.2024.3487625","DOIUrl":null,"url":null,"abstract":"Risk scoring systems have been widely deployed in many applications, which assign risk scores to users according to their behavior sequences. Though many deep learning methods with sophisticated designs have achieved promising results, the black-box nature hinders their applications due to fairness, explainability, and compliance consideration. Rule-based systems are considered reliable in these sensitive scenarios. However, building a rule system is labor-intensive. Experts need to find informative statistics from user behavior sequences, design rules based on statistics and assign weights to each rule. In this paper, we bridge the gap between effective but black-box models and transparent rule models. We propose a two-stage framework, CauseRuDi, that distills the knowledge of black-box teacher models into rule-based student models. We design a Monte Carlo tree search-based statistics generation method that maximizes the correlation or dependence between the generated statistics and the teacher model's outputs. We formulate a sequential move game and a simultaneous move coalitional game to generate multiple statistics. Then statistics are composed into logical rules with our proposed neural logical networks by mimicking the outputs of teacher models. We evaluate CauseRuDi on three real-world public datasets and an industrial dataset to demonstrate its effectiveness.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"116-129"},"PeriodicalIF":8.9000,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CauseRuDi: Explaining Behavior Sequence Models by Causal Statistics Generation and Rule Distillation\",\"authors\":\"Yao Zhang;Yun Xiong;Yiheng Sun;Yucheng Jin;Caihua Shan;Tian Lu;Hui Song;Shengli Sun\",\"doi\":\"10.1109/TKDE.2024.3487625\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Risk scoring systems have been widely deployed in many applications, which assign risk scores to users according to their behavior sequences. Though many deep learning methods with sophisticated designs have achieved promising results, the black-box nature hinders their applications due to fairness, explainability, and compliance consideration. Rule-based systems are considered reliable in these sensitive scenarios. However, building a rule system is labor-intensive. Experts need to find informative statistics from user behavior sequences, design rules based on statistics and assign weights to each rule. In this paper, we bridge the gap between effective but black-box models and transparent rule models. We propose a two-stage framework, CauseRuDi, that distills the knowledge of black-box teacher models into rule-based student models. We design a Monte Carlo tree search-based statistics generation method that maximizes the correlation or dependence between the generated statistics and the teacher model's outputs. We formulate a sequential move game and a simultaneous move coalitional game to generate multiple statistics. Then statistics are composed into logical rules with our proposed neural logical networks by mimicking the outputs of teacher models. We evaluate CauseRuDi on three real-world public datasets and an industrial dataset to demonstrate its effectiveness.\",\"PeriodicalId\":13496,\"journal\":{\"name\":\"IEEE Transactions on Knowledge and Data Engineering\",\"volume\":\"37 1\",\"pages\":\"116-129\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2024-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Knowledge and Data Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10737680/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10737680/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

风险评分系统根据用户的行为顺序对用户进行风险评分，已广泛应用于许多应用中。尽管许多设计复杂的深度学习方法取得了令人鼓舞的成果，但由于公平性、可解释性和合规性的考虑，黑箱性质阻碍了它们的应用。在这些敏感场景中，基于规则的系统被认为是可靠的。然而，建立规则系统是一项劳动密集型工作。专家需要从用户行为序列中找到信息统计，基于统计设计规则并为每个规则分配权重。在本文中，我们弥合了有效的黑盒模型和透明规则模型之间的差距。我们提出了一个两阶段框架CauseRuDi，它将黑箱教师模型的知识提炼成基于规则的学生模型。我们设计了一种基于蒙特卡罗树搜索的统计数据生成方法，该方法最大限度地提高了生成的统计数据与教师模型输出之间的相关性或依赖性。我们制定了一个顺序移动博弈和一个同步移动联合博弈来产生多个统计数据。然后通过模拟教师模型的输出，利用我们提出的神经逻辑网络将统计数据组成逻辑规则。我们在三个真实世界的公共数据集和一个工业数据集上评估了CauseRuDi，以证明其有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CauseRuDi: Explaining Behavior Sequence Models by Causal Statistics Generation and Rule Distillation

Risk scoring systems have been widely deployed in many applications, which assign risk scores to users according to their behavior sequences. Though many deep learning methods with sophisticated designs have achieved promising results, the black-box nature hinders their applications due to fairness, explainability, and compliance consideration. Rule-based systems are considered reliable in these sensitive scenarios. However, building a rule system is labor-intensive. Experts need to find informative statistics from user behavior sequences, design rules based on statistics and assign weights to each rule. In this paper, we bridge the gap between effective but black-box models and transparent rule models. We propose a two-stage framework, CauseRuDi, that distills the knowledge of black-box teacher models into rule-based student models. We design a Monte Carlo tree search-based statistics generation method that maximizes the correlation or dependence between the generated statistics and the teacher model's outputs. We formulate a sequential move game and a simultaneous move coalitional game to generate multiple statistics. Then statistics are composed into logical rules with our proposed neural logical networks by mimicking the outputs of teacher models. We evaluate CauseRuDi on three real-world public datasets and an industrial dataset to demonstrate its effectiveness.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Knowledge and Data Engineering 工程技术-工程：电子与电气

CiteScore

11.70

自引率

3.40%

发文量

515

审稿时长

6 months

期刊介绍： The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.