A Calculation Mechanism for Similarity Measure with Clustering an Unbalanced Hierarchical Terminology Structure

2007 International Conference on Parallel Processing Workshops (ICPPW 2007). Pub Date: 2007-09-10. DOI: 10.1109/ICPPW.2007.6

MinTzu Wang, P. Hsu, K. Lin, J. Hung


Citations: 1

Abstract

Effective retrieval of relevant information is often very useful to users, for example when querying the respective knowledge or information, especially for online e-learners. The most common method is to use synonyms and antonyms from a dictionary of the most frequent terms. Sometimes, however, the focus is on a pair or a set of associated keywords supplied by the user rather than on terms with the same meaning. In general, association rules would be adopted to solve this problem. Nonetheless, the keyword or term sets extracted from a large volume of queries are often sparse: they span a wide range of keywords, and each term set contains only a few terms. Such data leave basket analysis with extremely low item support. Lifting a term to a higher level of the concept hierarchy may yield enough support, but loses the detailed information. A similarity measure that takes the depth of the least common ancestor, normalized by the depth of the concept tree, lifts the limitation of binary equality, but it produces counterintuitive results when the concept hierarchy is unbalanced, since two terms in deeper subtrees are very likely to have a higher similarity than two terms in shallower subtrees. To solve these issues, this research proposes to calculate the distance between two terms by counting the edge traversals needed, from the user's viewpoint, to link them. The method is straightforward yet achieves better outcomes for information queries when the concept hierarchy is unbalanced.
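The contrast between the two measures can be illustrated with a small sketch. The abstract does not give the paper's exact formulas, so the code below rests on assumptions: the concept hierarchy, the term names, and the function names are hypothetical, the baseline is read as depth(LCA) divided by the depth of the tree, and the proposed measure is approximated by a plain count of the edges on the path between two terms. The "user's viewpoint" adjustment mentioned in the abstract is not specified there and is not modeled.

```python
# Hypothetical, deliberately unbalanced concept hierarchy (child -> parent).
# The "science" branch is much deeper than the "arts" branch.
PARENT = {
    "science": "root",
    "arts": "root",
    "music": "arts",
    "painting": "arts",
    "computing": "science",
    "databases": "computing",
    "data mining": "databases",
    "clustering": "data mining",
    "association rules": "data mining",
}


def path_to_root(term):
    """Return the list [term, parent, ..., root]."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Number of edges between the term and the root."""
    return len(path_to_root(term)) - 1


def lca(a, b):
    """Least common ancestor: first node on b's upward path that is also above a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("terms are not in the same tree")


# Depth of the deepest term in the hierarchy, used for normalization.
TREE_DEPTH = max(depth(t) for t in PARENT)


def lca_depth_similarity(a, b):
    """Baseline (assumed form): depth of the LCA normalized by the tree depth."""
    return depth(lca(a, b)) / TREE_DEPTH


def edge_distance(a, b):
    """Assumed stand-in for the proposed measure: edges on the path from a to b."""
    c = lca(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))


for pair in [("clustering", "association rules"), ("music", "painting")]:
    print(pair, lca_depth_similarity(*pair), edge_distance(*pair))
```

Both pairs are siblings, yet the baseline scores the pair in the deep subtree far higher than the pair in the shallow one, which is the counterintuitive behavior the abstract describes. The edge count treats the two sibling pairs identically because it depends only on the path between the terms, not on how deep that path sits in the tree.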

