Efficient keyword search on large tree structured datasets

KEYS '12, Pub Date: 2012-05-20, DOI: 10.1145/2254736.2254749

Aggeliki Dimitriou, D. Theodoratos


Citations: 9

Abstract

Keyword search is the most popular paradigm for querying XML data on the web. In this context, three challenging problems are (a) to avoid missing useful results in the answer set, (b) to rank the results with respect to some relevance criterion, and (c) to design algorithms that can efficiently compute the results on large datasets.

In this paper, we present a novel multi-stack based algorithm that answers a keyword query by returning all the results ranked by their size. Our algorithm exploits a lattice of stacks, each corresponding to a partition of the keyword set of the query. This feature yields running time that is linear in the size of the input data for a given number of query keywords. As a result, our algorithm can run efficiently on large input data for several keywords. We also present a variation of our algorithm that accounts for infrequent keywords in the query, and we show that it can significantly improve the execution time. An extensive experimental evaluation of our approach confirms the theoretical analysis and shows that it scales smoothly when the size of the input data and the number of input keywords increase.
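To make the problem setting concrete, here is a minimal brute-force sketch in Python. It is not the multi-stack algorithm described in the abstract, and its running time grows with the product of the keyword match counts rather than linearly. It assumes, purely for illustration, that a result is the smallest subtree connecting one matching node per query keyword and that "size" means the number of nodes in that subtree; the abstract does not define either. The toy tree, the node names, and the functions `path_to_root`, `connecting_subtree`, and `search` are all hypothetical.

```python
from itertools import product

# Toy tree as a child -> parent map; the root has parent None (hypothetical data).
PARENT = {
    "bib": None,
    "book1": "bib", "book2": "bib",
    "title1": "book1", "author1": "book1",
    "title2": "book2", "author2": "book2",
}

# Keywords contained in each node (hypothetical data).
KEYWORDS = {
    "title1": {"xml"}, "author1": {"smith"},
    "title2": {"xml"}, "author2": {"jones"},
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def connecting_subtree(nodes):
    """Return the node set of the smallest subtree connecting `nodes`."""
    paths = [path_to_root(n) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking upward, the first common node is the lowest common ancestor.
    lca = next(n for n in paths[0] if n in common)
    subtree = set()
    for path in paths:
        subtree.update(path[: path.index(lca) + 1])
    return subtree


def search(query):
    """Return every connecting subtree for `query`, smallest first.

    Assumption: one matching node per keyword, ranked by node count.
    """
    matches = [[n for n, kws in KEYWORDS.items() if kw in kws] for kw in query]
    results = {frozenset(connecting_subtree(combo)) for combo in product(*matches)}
    return sorted(results, key=lambda s: (len(s), sorted(s)))


if __name__ == "__main__":
    for result in search(["xml", "smith"]):
        print(len(result), sorted(result))
```

Under these assumptions, the query `["xml", "smith"]` yields two results: a three-node subtree inside `book1`, and a five-node subtree that joins `title2` to `author1` through the root. Enumerating every combination of matches like this returns all results, but its cost is what an algorithm linear in the input size is designed to avoid.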

