TEXplorer: keyword-based object search and exploration in multidimensional text databases

Bo Zhao, C. Lin, Bolin Ding, Jiawei Han
{"title":"TEXplorer: keyword-based object search and exploration in multidimensional text databases","authors":"Bo Zhao, C. Lin, Bolin Ding, Jiawei Han","doi":"10.1145/2063576.2063822","DOIUrl":null,"url":null,"abstract":"We propose a novel system TEXplorer that integrates keyword-based object ranking with the aggregation and exploration power of OLAP in a text database with rich structured attributes available, e.g., a product review database. TEXplorer can be implemented within a multi-dimensional text database, where each row is associated with structural dimensions (attributes) and text data (e.g., a document). The system utilizes the text cube data model, where a cell aggregates a set of documents with matching values in a subset of dimensions. Cells in a text cube capture different levels of summarization of the documents, and can represent objects at different conceptual levels.\n Users query the system by submitting a set of keywords. Instead of returning a ranked list of all the cells, we propose a keyword-based interactive exploration framework that could offer flexible OLAP navigational guides and help users identify the levels and objects they are interested in. A novel significance measure of dimensions is proposed based on the distribution of IR relevance of cells. During each interaction stage, dimensions are ranked according to their significance scores to guide drilling down; and cells in the same cuboids are ranked according to their relevance to guide exploration. We propose efficient algorithms and materialization strategies for ranking top-k dimensions and cells. Finally, extensive experiments on real datasets demonstrate the efficiency and effectiveness of our approach.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"26 1","pages":"1709-1718"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2063576.2063822","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

We propose a novel system TEXplorer that integrates keyword-based object ranking with the aggregation and exploration power of OLAP in a text database with rich structured attributes available, e.g., a product review database. TEXplorer can be implemented within a multi-dimensional text database, where each row is associated with structural dimensions (attributes) and text data (e.g., a document). The system utilizes the text cube data model, where a cell aggregates a set of documents with matching values in a subset of dimensions. Cells in a text cube capture different levels of summarization of the documents, and can represent objects at different conceptual levels. Users query the system by submitting a set of keywords. Instead of returning a ranked list of all the cells, we propose a keyword-based interactive exploration framework that could offer flexible OLAP navigational guides and help users identify the levels and objects they are interested in. A novel significance measure of dimensions is proposed based on the distribution of IR relevance of cells. During each interaction stage, dimensions are ranked according to their significance scores to guide drilling down; and cells in the same cuboids are ranked according to their relevance to guide exploration. We propose efficient algorithms and materialization strategies for ranking top-k dimensions and cells. Finally, extensive experiments on real datasets demonstrate the efficiency and effectiveness of our approach.
多维文本数据库中基于关键字的对象搜索和探索
我们提出了一种新的系统TEXplorer,该系统将基于关键字的对象排序与OLAP在具有丰富结构化属性的文本数据库(例如产品评论数据库)中的聚合和探索能力相结合。TEXplorer可以在多维文本数据库中实现,其中每一行都与结构维度(属性)和文本数据(例如,文档)相关联。该系统利用文本多维数据集数据模型,其中单元格聚集一组在维度子集中具有匹配值的文档。文本多维数据集中的单元格捕获文档的不同级别的摘要,并且可以表示不同概念级别的对象。用户通过提交一组关键字查询系统。我们提出了一个基于关键字的交互式探索框架,它可以提供灵活的OLAP导航指南,并帮助用户识别他们感兴趣的级别和对象,而不是返回所有单元的排名列表。提出了一种基于细胞红外相关性分布的显著性维度度量方法。在每个交互阶段,根据各维度的显著性得分对其进行排序,以指导向下钻取;同一长方体中的细胞根据它们的相关性进行排序,以指导探索。我们提出了对top-k维和单元排序的有效算法和物化策略。最后,在实际数据集上的大量实验证明了我们的方法的效率和有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信