Discovering Editing Rules by Deep Reinforcement Learning

Yinan Mei, Shaoxu Song, Chenguang Fang, Ziheng Wei, Jingyun Fang, Jiang Long
{"title":"通过深度强化学习发现编辑规则","authors":"Yinan Mei, Shaoxu Song, Chenguang Fang, Ziheng Wei, Jingyun Fang, Jiang Long","doi":"10.1109/ICDE55515.2023.00034","DOIUrl":null,"url":null,"abstract":"Editing rules specify the conditions of applying high quality master data to repair low quality input data. Discovering editing rules, however, is challenging, since it considers not only the well curated master data but also the large-scale input data, an extremely large search space. A natural baseline, namely EnuMiner, costly enumerates the rules with possible conditions from both master and input data. Although several pruning strategies are enabled, the algorithm still takes a long time when the enumeration space is large. To avoid enumerating all candidate rules during mining, we argue to model the rule discovery process as a Markov Decision Process. Specifically, we discover editing rules by growing a rule tree where each node corresponds to a rule. The algorithm generates a new rule from the current node as a child node. We propose a reinforcement learning-based editing rule discovery algorithm, RLMiner, which trains an agent to wisely make decisions on branches when traversing the tree. Following the idea of evaluating rules, we design a reward function that is more in line with rule discovery scenarios and makes our algorithm perform effectively and efficiently. The experimental results show that our proposed RLMiner can mine high-utility editing rules like EnuMiner and scale well on the datasets with many attributes and large domains.","PeriodicalId":434744,"journal":{"name":"2023 IEEE 39th International Conference on Data Engineering (ICDE)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Discovering Editing Rules by Deep Reinforcement Learning\",\"authors\":\"Yinan Mei, Shaoxu Song, Chenguang Fang, Ziheng Wei, Jingyun Fang, Jiang Long\",\"doi\":\"10.1109/ICDE55515.2023.00034\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Editing rules specify the conditions of applying high quality master data to repair low quality input data. Discovering editing rules, however, is challenging, since it considers not only the well curated master data but also the large-scale input data, an extremely large search space. A natural baseline, namely EnuMiner, costly enumerates the rules with possible conditions from both master and input data. Although several pruning strategies are enabled, the algorithm still takes a long time when the enumeration space is large. To avoid enumerating all candidate rules during mining, we argue to model the rule discovery process as a Markov Decision Process. Specifically, we discover editing rules by growing a rule tree where each node corresponds to a rule. The algorithm generates a new rule from the current node as a child node. We propose a reinforcement learning-based editing rule discovery algorithm, RLMiner, which trains an agent to wisely make decisions on branches when traversing the tree. Following the idea of evaluating rules, we design a reward function that is more in line with rule discovery scenarios and makes our algorithm perform effectively and efficiently. 
The experimental results show that our proposed RLMiner can mine high-utility editing rules like EnuMiner and scale well on the datasets with many attributes and large domains.\",\"PeriodicalId\":434744,\"journal\":{\"name\":\"2023 IEEE 39th International Conference on Data Engineering (ICDE)\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE 39th International Conference on Data Engineering (ICDE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE55515.2023.00034\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 39th International Conference on Data Engineering (ICDE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE55515.2023.00034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Editing rules specify the conditions under which high-quality master data can be applied to repair low-quality input data. Discovering editing rules, however, is challenging, since it must consider not only the well-curated master data but also the large-scale input data, yielding an extremely large search space. A natural baseline, EnuMiner, costly enumerates rules with possible conditions from both the master and input data. Although several pruning strategies are applied, the algorithm still takes a long time when the enumeration space is large. To avoid enumerating all candidate rules during mining, we model the rule discovery process as a Markov Decision Process. Specifically, we discover editing rules by growing a rule tree in which each node corresponds to a rule, and the algorithm generates a new rule from the current node as a child node. We propose a reinforcement learning-based editing rule discovery algorithm, RLMiner, which trains an agent to make wise branching decisions when traversing the tree. Following the idea of evaluating rules, we design a reward function that is better aligned with rule discovery scenarios and makes our algorithm both effective and efficient. The experimental results show that the proposed RLMiner mines high-utility editing rules comparable to EnuMiner and scales well on datasets with many attributes and large domains.
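
The abstract frames rule discovery as growing a rule tree whose nodes are candidate rules, with a learned agent deciding which branch (i.e., which additional condition) to explore next. The sketch below illustrates that framing with a toy tabular epsilon-greedy agent; the names (Condition, RuleNode, QAgent, utility) and the reward definition are illustrative assumptions, not the paper's implementation, which relies on deep reinforcement learning and a purpose-built reward function.

```python
# Minimal sketch of rule discovery as a Markov Decision Process:
# states are rule-tree nodes, actions add one condition (a branch),
# and the reward is a toy stand-in for rule utility.
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Condition:
    """One condition of an editing rule, e.g. attribute == value."""
    attribute: str
    value: str


@dataclass
class RuleNode:
    """A node in the rule tree; its conditions define one candidate rule."""
    conditions: frozenset = field(default_factory=frozenset)

    def child(self, cond: Condition) -> "RuleNode":
        # Growing the tree: a child rule adds exactly one more condition.
        return RuleNode(self.conditions | {cond})


def utility(rule: RuleNode, input_rows, master_rows, target: str) -> float:
    """Toy reward: number of input rows the rule matches, provided the
    matching master rows agree on a single value for the target attribute."""
    matched = [r for r in input_rows
               if all(r.get(c.attribute) == c.value for c in rule.conditions)]
    if not matched:
        return 0.0
    fixes = {m[target] for m in master_rows
             if all(m.get(c.attribute) == c.value for c in rule.conditions)}
    return float(len(matched)) if len(fixes) == 1 else 0.0


class QAgent:
    """Tabular epsilon-greedy agent choosing which condition (branch) to add."""
    def __init__(self, epsilon=0.2, alpha=0.5):
        self.q, self.epsilon, self.alpha = {}, epsilon, alpha

    def choose(self, state: RuleNode, actions):
        if random.random() < self.epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: self.q.get((state.conditions, a), 0.0))

    def update(self, state: RuleNode, action, reward: float):
        key = (state.conditions, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward - old)


def mine(input_rows, master_rows, target, candidate_conditions,
         episodes=200, depth=3):
    """Grow the rule tree episode by episode, guided by learned Q-values."""
    agent, best = QAgent(), (0.0, RuleNode())
    for _ in range(episodes):
        node = RuleNode()  # root: the empty rule
        for _ in range(depth):
            actions = [c for c in candidate_conditions
                       if c.attribute != target and c not in node.conditions]
            if not actions:
                break
            cond = agent.choose(node, actions)
            child = node.child(cond)
            reward = utility(child, input_rows, master_rows, target)
            agent.update(node, cond, reward)
            if reward > best[0]:
                best = (reward, child)
            node = child
    return best
```

In this sketch the agent learns, per tree node, which branching condition tends to yield high-utility child rules, so it explores the tree selectively instead of enumerating every candidate as EnuMiner would. The paper's actual state encoding, agent architecture, and reward function are more involved than this tabular toy.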