{"title":"i-SEE: integrated stream execution environment over on-line data streams","authors":"S. Shin, H. Park, Ho Jin Woo, W. Lee","doi":"10.1145/1871437.1871784","DOIUrl":"https://doi.org/10.1145/1871437.1871784","url":null,"abstract":"The primary objective of various data stream applications is to monitor the on-going variations of data elements newly generated in data streams, so that appropriate services can be delivered to users timely. Basically, such monitoring activities can be divided into three categories: instant event monitoring, frequent behavior monitoring and analytic OLAP measures monitoring. Several prototype systems for data streams have been introduced but they are designated to support only the operations of one or two categories. However, many real-world data stream applications often require mixed operations of the three categories. This demonstration introduces an Integrated Stream Execution Environment (i-SEE) that can support the mixed execution of the operations of the three categories seamlessly, so that an i-SEE system can provide the multifaceted analytic results of on-line data streams continuously for various emerging applications.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128789412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Einat Minkov, B. Charrow, J. Ledlie, S. Teller, T. Jaakkola
{"title":"Collaborative future event recommendation","authors":"Einat Minkov, B. Charrow, J. Ledlie, S. Teller, T. Jaakkola","doi":"10.1145/1871437.1871542","DOIUrl":"https://doi.org/10.1145/1871437.1871542","url":null,"abstract":"We demonstrate a method for collaborative ranking of future events. Previous work on recommender systems typically relies on feedback on a particular item, such as a movie, and generalizes this to other items or other people. In contrast, we examine a setting where no feedback exists on the particular item. Because direct feedback does not exist for events that have not taken place, we recommend them based on individuals' preferences for past events, combined collaboratively with other peoples' likes and dislikes. We examine the topic of unseen item recommendation through a user study of academic (scientific) talk recommendation, where we aim to correctly estimate a ranking function for each user, predicting which talks would be of most interest to them. Then by decomposing user parameters into shared and individual dimensions, we induce a similarity metric between users based on the degree to which they share these dimensions. We show that the collaborative ranking predictions of future events are more effective than pure content-based recommendation. Finally, to further reduce the need for explicit user feedback, we suggest an active learning approach for eliciting feedback and a method for incorporating available implicit user cues.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125598584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Yet another write-optimized DBMS layer for flash-based solid state storage","authors":"Hongchan Roh, D. Lee, Sanghyun Park","doi":"10.1145/1871437.1871617","DOIUrl":"https://doi.org/10.1145/1871437.1871617","url":null,"abstract":"Flash-based Solid State Storage (flashSSS) has write-oriented problems such as low write throughput, and limited life-time. Especially, flashSSDs have a characteristic vulnerable to random-writes, due to its control logic utilizing parallelism between the flash memory chips. In this paper, we present a write-optimized layer of DBMSs to address the write-oriented problems of flashSSS in on-line transaction processing environments. The layer consists of a write-optimized buffer, a corresponding log space, and an in-memory mapping table, closely associated with a novel logging scheme called InCremental Logging (ICL). The ICL scheme enables DBMSs to reduce page-writes at the least expense of additional page-reads, while replacing random-writes into sequential-writes. Through experiments, our approach demonstrated up-to an order of magnitude performance enhancement in I/O processing time compared to the original DBMS, increasing the longevity of flashSSS by approximately a factor of two.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126234583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new mathematics retrieval system","authors":"Shahab Kamali, Frank Wm. Tompa","doi":"10.1145/1871437.1871635","DOIUrl":"https://doi.org/10.1145/1871437.1871635","url":null,"abstract":"The Web contains a large collection of documents, some with mathematical expressions. Because mathematical expressions are objects with complex structures and rather few distinct symbols, conventional text retrieval systems are not very successful in mathematics retrieval. The lack of a definition for similarity between mathematical expressions, and the inadequacy of searching for exact matches only, makes the problem of mathematics retrieval even harder. As a result, the few existing mathematics retrieval systems are not very helpful in addressing users' needs. We propose a powerful query language for mathematical expressions that augments exact matching with approximate matching, but in a way that is controlled by the user. We also introduce a novel indexing scheme that scales well for large collections of expressions. Based on this indexing scheme, an efficient lookup algorithm is proposed.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"7 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114126030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of user and system query performance predictions","authors":"C. Hauff, D. Kelly, L. Azzopardi","doi":"10.1145/1871437.1871562","DOIUrl":"https://doi.org/10.1145/1871437.1871562","url":null,"abstract":"Query performance prediction methods are usually applied to estimate the retrieval effectiveness of queries, where the evaluation is largely system sided. However, little work has been conducted to understand query performance prediction from the user's perspective. The question we consider is, whether the predictions of query performance that systems make are in line with the predictions that users make. To this aim, we compare the performance ratings users assign to queries with the performance scores estimated by a range of pre-retrieval and post-retrieval query performance predictors. Two studies are presented that explore the relationship between user ratings and system predictions on two levels: (i) the topic level, and, (ii) the query suggestions level. It is shown that when predicting the performance of query suggestions, user ratings were mostly uncorrelated with system predictions. At the topic level though, where a single query is judged for each information need, we observed moderate correlations between user ratings and a subset of system predictions. As query performance prediction methods are often based on intuitions of how users might rate queries, these findings suggest that such methods are not representative of how users actually rate query suggestions and topics. This motivates further research into understanding the rating process engaged by users, and developing models of query performance prediction in order to bridge the divide between systems and users.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121159060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Markus Bundschus, A. Bauer-Mehren, Volker Tresp, L. Furlong, H. Kriegel
{"title":"Digging for knowledge with information extraction: a case study on human gene-disease associations","authors":"Markus Bundschus, A. Bauer-Mehren, Volker Tresp, L. Furlong, H. Kriegel","doi":"10.1145/1871437.1871744","DOIUrl":"https://doi.org/10.1145/1871437.1871744","url":null,"abstract":"We present the information extraction system Text2SemRel. The system (semi-) automatically constructs knowledge bases from textual data consisting of facts about entities using semantic relations. An integral part of the system is a graph-based interactive visualization and search layer. The second contribution in this paper is the presentation of a case study on the (semi-) automatic construction of a knowledge base consisting of gene-disease associations. The resulting knowledge base, the Literature-derived Human Gene-Disease Network (LHGDN), is now an integral part of the Linked Life Data initiative and represents currently the largest publicly available gene-disease repository. The LHGDN is compared against several curated state of the art databases. A unique feature of the LHGDN is that the semantics of the associations constitute a wide variety of biomolecular conditions.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115273731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Document allocation policies for selective searching of distributed indexes","authors":"Anagha Kulkarni, Jamie Callan","doi":"10.1145/1871437.1871497","DOIUrl":"https://doi.org/10.1145/1871437.1871497","url":null,"abstract":"Indexes for large collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. For organizations with modest computational resources the high query processing cost incurred in this exhaustive search setup can be a deterrent to working with large collections. This paper investigates document allocation policies that permit searching only a few shards for each query (selective search) without sacrificing search accuracy. Random, source-based and topic-based document-to-shard allocation policies are studied in the context of selective search. A thorough study of the tradeoff between search cost and search accuracy in a sharded index environment is performed using three large TREC collections. The experimental results demonstrate that selective search using topic-based shards cuts the search cost to less than 1/5th of that of the exhaustive search without reducing search accuracy across all the three datasets. Stability analysis shows that 90% of the queries do as well or improve with selective search. An overlap-based evaluation with an additional 1000 queries for each dataset tests and confirms the conclusions drawn using the smaller TREC query sets.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122918958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daifeng Li, Bing He, Ying Ding, Jie Tang, Cassidy R. Sugimoto, Zheng Qin, E. Yan, Juan-Zi Li, Tianxi Dong
{"title":"Community-based topic modeling for social tagging","authors":"Daifeng Li, Bing He, Ying Ding, Jie Tang, Cassidy R. Sugimoto, Zheng Qin, E. Yan, Juan-Zi Li, Tianxi Dong","doi":"10.1145/1871437.1871673","DOIUrl":"https://doi.org/10.1145/1871437.1871673","url":null,"abstract":"Exploring community is fundamental for uncovering the connections between structure and function of complex networks and for practical applications in many disciplines such as biology and sociology. In this paper, we propose a TTR-LDA-Community model which combines the Latent Dirichlet Allocation model (LDA) and the Girvan-Newman community detection algorithm with an inference mechanism. The model is then applied to data from Delicious, a popular social tagging system, over the time period of 2005-2008. Our results show that 1) users in the same community tend to be interested in similar set of topics in all time periods; and 2) topics may divide into several sub-topics and scatter into different communities over time. We evaluate the effectiveness of our model and show that the TTR-LDA-Community model is meaningful for understanding communities and outperforms TTR-LDA and LDA models in tag prediction.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131358211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eduard Constantin Dragut, Clement T. Yu, A. Sistla, W. Meng
{"title":"Construction of a sentimental word dictionary","authors":"Eduard Constantin Dragut, Clement T. Yu, A. Sistla, W. Meng","doi":"10.1145/1871437.1871723","DOIUrl":"https://doi.org/10.1145/1871437.1871723","url":null,"abstract":"The Web has plenty of reviews, comments and reports about products, services, government policies, institutions, etc. The opinions expressed in these reviews influence how people regard these entities. For example, a product with consistently good reviews is likely to sell well, while a product with numerous bad reviews is likely to sell poorly. Our aim is to build a sentimental word dictionary, which is larger than existing sentimental word dictionaries and has high accuracy. We introduce rules for deduction, which take words with known polarities as input and produce synsets (a set of synonyms with a definition) with polarities. The synsets with deduced polarities can then be used to further deduce the polarities of other words. Experimental results show that for a given sentimental word dictionary with D words, approximately an additional 50% of D words with polarities can be deduced. An experiment is conducted to find the accuracy of a random sample of the deduced words. It is found that the accuracy is about the same as that of comparing the judgment of one human with that of another.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115162784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anonymizing data with quasi-sensitive attribute values","authors":"Pu Shi, Li Xiong, B. Fung","doi":"10.1145/1871437.1871628","DOIUrl":"https://doi.org/10.1145/1871437.1871628","url":null,"abstract":"We study the problem of anonymizing data with quasi-sensitive attributes. Quasi-sensitive attributes are not sensitive by themselves, but certain values or their combinations may be linked to external knowledge to reveal indirect sensitive information of an individual. We formalize the notion of l-diversity and t-closeness for quasi-sensitive attributes, which we call QS l-diversity and QS t-closeness, to prevent indirect sensitive attribute disclosure. We propose a two-phase anonymization algorithm that combines quasi-identifying value generalization and quasi-sensitive value suppression to achieve QS l-diversity and QS t-closeness.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"359 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133323331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}