Top-k keyword search over probabilistic XML data

2011 IEEE 27th International Conference on Data Engineering. Pub Date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767875

Jianxin Li, Chengfei Liu, Rui Zhou, Wei Wang

Citations: 74

Abstract

Despite the proliferation of work on XML keyword queries, supporting keyword queries over probabilistic XML data remains an open problem. Compared with traditional keyword search, answering a keyword query over probabilistic XML data is far more expensive because possible world semantics must be taken into account. In this paper, we first define the new problem of top-k keyword search over probabilistic XML data: retrieving the k SLCA (smallest lowest common ancestor) results with the highest probabilities of existence. We then propose two efficient algorithms. The first, PrStack, finds the k SLCA results with the highest probabilities by scanning the relevant keyword nodes only once. To further improve efficiency, the second, EagerTopK, relies on a set of pruning properties that quickly prune unqualified SLCA candidates. Finally, we implement both algorithms and compare their performance through an analysis of extensive experimental results.
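
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is neither PrStack nor EagerTopK; it only illustrates what "top-k SLCA results under possible world semantics" means. It assumes a simplified probabilistic model in which every node exists independently with a fixed probability given that its parent exists, which may differ from the model used in the paper. The toy document `DOC`, the helper names, and all probabilities are hypothetical. The sketch enumerates every possible world, so its cost is exponential in the number of nodes, which is exactly the expense the abstract refers to.

```python
from itertools import product

# Hypothetical toy document (an assumption for illustration):
# node -> (parent, P(node exists | parent exists), keywords stored at the node)
DOC = {
    "r":  (None, 1.0, set()),
    "a":  ("r", 1.0, set()),
    "a1": ("a", 0.5, {"xml"}),
    "a2": ("a", 0.8, {"search"}),
    "b":  ("r", 0.6, {"search"}),
}

def ancestors_or_self(node):
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = DOC[node][0]

def slcas(existing, query):
    """Return the SLCA nodes of `query` in one possible world."""
    # covered[n] = query keywords that occur in the subtree rooted at n.
    covered = {n: set() for n in existing}
    for n in existing:
        hits = DOC[n][2] & query
        for anc in ancestors_or_self(n):
            covered[anc] |= hits
    full = {n for n in existing if covered[n] == query}
    # A node is an SLCA if it covers the query and no proper descendant does.
    shadowed = {anc for n in full for anc in list(ancestors_or_self(n))[1:]}
    return full - shadowed

def topk_slca(query, k):
    """Brute force: sum world probabilities per SLCA node, keep the k largest."""
    names = list(DOC)
    totals = {}
    for coins in product([True, False], repeat=len(names)):
        kept = dict(zip(names, coins))
        prob = 1.0
        for n in names:
            p = DOC[n][1]
            prob *= p if kept[n] else 1.0 - p
        if prob == 0.0:
            continue  # impossible world, e.g. the root is missing
        # A node exists only if it and all of its ancestors were kept.
        existing = {n for n in names if all(kept[a] for a in ancestors_or_self(n))}
        for n in slcas(existing, query):
            totals[n] = totals.get(n, 0.0) + prob
    return sorted(totals.items(), key=lambda kv: -kv[1])[:k]

result = topk_slca({"xml", "search"}, 2)
```

On this toy document, `result` ranks node `a` (probability about 0.4, the worlds where both `a1` and `a2` exist) ahead of the root `r` (about 0.06, the worlds where `a1` and `b` exist but `a2` does not).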
