T. A. Bjørklund, Nils Grimsmo, J. Gehrke, Øystein Torbjørnsen
{"title":"Inverted indexes vs. bitmap indexes in decision support systems","authors":"T. A. Bjørklund, Nils Grimsmo, J. Gehrke, Øystein Torbjørnsen","doi":"10.1145/1645953.1646158","DOIUrl":"https://doi.org/10.1145/1645953.1646158","url":null,"abstract":"Bitmap indexes are widely used in Decision Support Systems (DSSs) to improve query performance. In this paper, we evaluate the use of compressed inverted indexes with adapted query processing strategies from Information Retrieval as an alternative. In a thorough experimental evaluation on both synthetic data and data from the Star Schema Benchmark, we show that inverted indexes are more compact than bitmap indexes in almost all cases. This compactness combined with efficient query processing strategies results in inverted indexes outperforming bitmap indexes for most queries, often significantly.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122118964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seok-Ho Yoon, Jung-Hwan Shin, Sang-Wook Kim, Sunju Park
{"title":"Extraction of a latent blog community based on subject","authors":"Seok-Ho Yoon, Jung-Hwan Shin, Sang-Wook Kim, Sunju Park","doi":"10.1145/1645953.1646163","DOIUrl":"https://doi.org/10.1145/1645953.1646163","url":null,"abstract":"In the blogosphere, there exist posts relevant to a particular subject and blogs that show interests in the subject. In this paper, we define a set of such posts and blogs as \"blog community\" and propose a method for extracting the blog community associated with a particular subject. The proposed method is based on the idea that the blogs who have performed actions to the posts of a particular subject are the ones that have interests in the subject, and that the posts which have received actions from such blogs are the ones that contain the subject. The proposed method selects a small number of seed posts that contain the subject. Then, it selects the blogs that perform actions to the seed posts over some threshold and the posts that have received actions over some threshold. By repeating these two steps, it gradually expands the blog community. The experimental results show that the proposed method exhibits a higher level of accuracy than the methods proposed in prior research.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129520412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistent on-line classification of dbs workload events","authors":"M. Holze, Claas Gaidies, N. Ritter","doi":"10.1145/1645953.1646193","DOIUrl":"https://doi.org/10.1145/1645953.1646193","url":null,"abstract":"An important goal of self-managing databases is the autonomic adaptation of the database configuration to evolving workloads. However, the diversity of SQL statements in real-world workloads typically causes the required analysis overhead to be prohibitive for a continuous workload analysis. The workload classification presented in this paper reduces the workload analysis overhead by grouping similar workload events into classes. Our approach employs clustering techniques based upon a general distance function for DBS workload events. To be applicable for a continuous workload analysis, our workload classification specifically addresses a stream-based, lightweight operation, a controllable loss of quality, and self-management.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129688968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ComprehEnRank: estimating comprehension in classroom by absorbing random walks on a cognitive graph","authors":"Nimit Pattanasri, M. Mukunoki, M. Minoh","doi":"10.1145/1645953.1646226","DOIUrl":"https://doi.org/10.1145/1645953.1646226","url":null,"abstract":"This paper develops a graph-theoretic framework for estimating comprehension in classroom. To deal with imprecise data gathered in classroom, we propose multi-step comprehension propagation over a semantic graph. Random walks on the graph measure students' comprehension with probabilities absorbed at student nodes.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128236838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A signal-to-noise approach to score normalization","authors":"A. Arampatzis, J. Kamps","doi":"10.1145/1645953.1646055","DOIUrl":"https://doi.org/10.1145/1645953.1646055","url":null,"abstract":"Score normalization is indispensable in distributed retrieval and fusion or meta-search where merging of result-lists is required. Distributional approaches to score normalization with reference to relevance, such as binary mixture models like the normal-exponential, suffer from lack of universality and troublesome parameter estimation especially under sparse relevance. We develop a new approach which tackles both problems by using aggregate score distributions without reference to relevance, and is suitable for uncooperative engines. The method is based on the assumption that scores produced by engines consist of a signal and a noise component which can both be approximated by submitting well-defined sets of artificial queries to each engine. We evaluate in a standard distributed retrieval testbed and show that the signal-to-noise approach yields better results than other distributional methods. As a significant by-product, we investigate query-length distributions.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129016184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Industry data and query similarity","authors":"Young-Hee Park","doi":"10.1145/3261239","DOIUrl":"https://doi.org/10.1145/3261239","url":null,"abstract":"","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124966718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bin Zhang, Ming Xie, Jinyan Shao, Wenjun Yin, Jin Dong
{"title":"ROSE: retail outlet site evaluation by learning with both sample and feature preference","authors":"Bin Zhang, Ming Xie, Jinyan Shao, Wenjun Yin, Jin Dong","doi":"10.1145/1645953.1646129","DOIUrl":"https://doi.org/10.1145/1645953.1646129","url":null,"abstract":"It is critical for retail enterprises to select good sites or locations to open their stores, especially in current competitive retail market. However, evaluating the goodness of sites in real business applications is a complex problem. That is, how to judge whether the market around a store site is good? We don't know the exact mechanism of how a site can be good and it is hard to have correct site goodness values as supervised labels. The Retail Outlet Site Evaluation (ROSE) tool is designed to learn the site evaluation model by integrating city geographic & demographic data and two kinds of expert knowledge: sample preference and feature preference. The feature preference information can help greatly reduce the required number of sample preferences. It enables our application practicable because it is almost impossible to give such amount of sample preference pairs manually by experts when ranking hundreds of data points. In the experiment and case study part, we show that the ROSE tool can achieve good results and useful for users to do site evaluation work in real cases.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130325931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Label correspondence learning for part-of-speech annotation transformation","authors":"Muhua Zhu, Huizhen Wang, Jingbo Zhu","doi":"10.1145/1645953.1646145","DOIUrl":"https://doi.org/10.1145/1645953.1646145","url":null,"abstract":"The performance of machine learning methods heavily depends on the volume of used training data. For the purpose of dataset enlargement, it is of interest to study the problem of unifying multiple labeled datasets with different annotation standards. In this paper, we focus on the case of unifying datasets for sequence labeling problems with natural language part-of-speech (POS) tagging as an examplar application. To this end, we propose a probabilistic approach to transforming the annotations of one dataset to the standard specified by another dataset. The key component of the approach, named as label correspondence learning, serves as a bridge of annotations from the datasets. Two methods designed from distinct perspectives are proposed to attack this sub-problem. Experiments on two large-scale part-of-speech datasets demonstrate the efficacy of the transformation and label correspondence learning methods.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129290862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RSS watchdog: an instant event monitor on real online news streams","authors":"Chih-Lin Hu, C. Chou","doi":"10.1145/1645953.1646321","DOIUrl":"https://doi.org/10.1145/1645953.1646321","url":null,"abstract":"This paper introduces the RSS Watchdog system, which is capable of news clustering and instant event monitoring over multiple real and online RSS news streams. We briefly mention software architecture design, technical implementation, and prototype demonstration. In addition, the results of real case studies are presented to notice the RSS Watchdog's functionality","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123682002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data extraction from the web using wild card queries","authors":"Davood Rafiei, Haobin Li","doi":"10.1145/1645953.1646270","DOIUrl":"https://doi.org/10.1145/1645953.1646270","url":null,"abstract":"This paper presents an overview of our work for searching and retrieving facts and relationships within natural language text sources. In this work, an extraction task over a text collection is expressed as a query that combines text fragments with wild cards, and the query result is a set of facts in the form of unary, binary and general n-ary tuples. Despite being both simple and declarative, the framework can be applied to a wide range of extraction tasks. This paper presents an overview of the work and its various components. We also report some of our experiments and an evaluation of the proposed querying framework in extracting relevant information to a task.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114535993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}