树相似度的统计学习算法

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI:10.1109/ICDM.2007.38

A. Takasu, Daiji Fukagawa, T. Akutsu

{"title":"树相似度的统计学习算法","authors":"A. Takasu, Daiji Fukagawa, T. Akutsu","doi":"10.1109/ICDM.2007.38","DOIUrl":null,"url":null,"abstract":"Tree edit distance is one of the most frequently used distance measures for comparing trees. When using the tree edit distance, we need to determine the cost of each operation, but this is a labor-intensive and highly skilled task. This paper proposes an algorithm for learning the costs of tree edit operations from training data consisting of pairs of similar trees. To formalize the cost learning problem, we define a probabilistic model for tree alignment that is a variant of tree edit distance. Then, the parameters of the model are estimated using the expectation maximization (EM) technique. In this paper, we develop an algorithm for parameter learning that is polynomial in time (O{mn2d6)) and space (O{n2d4)) where n, d, and m represent the size of the trees, the maximum degree of trees, and the number of training pairs of trees, respectively.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"255 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Statistical Learning Algorithm for Tree Similarity\",\"authors\":\"A. Takasu, Daiji Fukagawa, T. Akutsu\",\"doi\":\"10.1109/ICDM.2007.38\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Tree edit distance is one of the most frequently used distance measures for comparing trees. When using the tree edit distance, we need to determine the cost of each operation, but this is a labor-intensive and highly skilled task. This paper proposes an algorithm for learning the costs of tree edit operations from training data consisting of pairs of similar trees. To formalize the cost learning problem, we define a probabilistic model for tree alignment that is a variant of tree edit distance. Then, the parameters of the model are estimated using the expectation maximization (EM) technique. In this paper, we develop an algorithm for parameter learning that is polynomial in time (O{mn2d6)) and space (O{n2d4)) where n, d, and m represent the size of the trees, the maximum degree of trees, and the number of training pairs of trees, respectively.\",\"PeriodicalId\":233758,\"journal\":{\"name\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"volume\":\"255 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2007.38\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2007.38","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

树编辑距离是比较树最常用的距离度量之一。在使用树编辑距离时，我们需要确定每个操作的成本，但这是一项劳动密集型和高技能的任务。本文提出了一种从由相似树对组成的训练数据中学习树编辑操作代价的算法。为了形式化成本学习问题，我们定义了树对齐的概率模型，该模型是树编辑距离的一个变体。然后，利用期望最大化(EM)技术估计模型的参数。在本文中，我们开发了一种参数学习算法，该算法是时间(O{mn2d6)和空间(O{n2d4)的多项式，其中n, d和m分别表示树的大小，树的最大程度和树的训练对的数量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Statistical Learning Algorithm for Tree Similarity

Tree edit distance is one of the most frequently used distance measures for comparing trees. When using the tree edit distance, we need to determine the cost of each operation, but this is a labor-intensive and highly skilled task. This paper proposes an algorithm for learning the costs of tree edit operations from training data consisting of pairs of similar trees. To formalize the cost learning problem, we define a probabilistic model for tree alignment that is a variant of tree edit distance. Then, the parameters of the model are estimated using the expectation maximization (EM) technique. In this paper, we develop an algorithm for parameter learning that is polynomial in time (O{mn2d6)) and space (O{n2d4)) where n, d, and m represent the size of the trees, the maximum degree of trees, and the number of training pairs of trees, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Seventh IEEE International Conference on Data Mining (ICDM 2007)

自引率

0.00%

发文量