W. Ding, T. Stepinski, L. Bandeira, R. Vilalta, Youxi Wu, Zhenyu Lu, Tianyu Cao
{"title":"Automatic detection of craters in planetary images: an embedded framework using feature selection and boosting","authors":"W. Ding, T. Stepinski, L. Bandeira, R. Vilalta, Youxi Wu, Zhenyu Lu, Tianyu Cao","doi":"10.1145/1871437.1871534","DOIUrl":"https://doi.org/10.1145/1871437.1871534","url":null,"abstract":"Identifying impact craters on planetary surfaces is a fundamental task in planetary science. In this paper, we present an embedded framework for automatic detection of craters, using feature selection and boosting strategies. The paradigm aims at building a universal and practical crater detector. This methodology addresses three requirements that such a tool must meet: (i) it utilizes mathematical morphology to efficiently identify the regions of an image that can potentially contain craters; only those regions, defined as crater candidates, are the subjects of further processing; (ii) it selects Haar-like image texture features in combination with boosting ensemble supervised learning algorithms to accurately classify candidates into craters and non-craters; (iii) it uses transfer learning, at a minimum additional cost, to maintain accurate auto-detection of craters on new images whose morphology differs from what has been captured by the original training set. All three aforementioned components of the detection methodology are discussed, and the entire framework is evaluated on a large test image of 37,500 × 56,250 m² on Mars, showing heavily cratered Martian terrain characterized by nonuniform surface morphology. Our study demonstrates that this methodology provides a robust and practical tool for planetary science, in terms of both detection accuracy and efficiency.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121446244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liang Wang, Reynold Cheng, Sau-dan. Lee, D. Cheung
{"title":"Accelerating probabilistic frequent itemset mining: a model-based approach","authors":"Liang Wang, Reynold Cheng, Sau-dan. Lee, D. Cheung","doi":"10.1145/1871437.1871494","DOIUrl":"https://doi.org/10.1145/1871437.1871494","url":null,"abstract":"Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle a large amount of imprecise information, uncertain databases have been recently developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose a novel method to capture the itemset mining process as a Poisson binomial distribution. This model-based approach extracts frequent itemsets with a high degree of accuracy, and supports large databases. We apply our techniques to improve the performance of the algorithms for: (1) finding itemsets whose frequentness probabilities are larger than some threshold; and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate. Moreover, they are orders of magnitude faster than previous approaches.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"33 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113933722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Support elements in graph structured schema reintegration","authors":"Xun Sun, R. Pottinger, Michael K. Lawrence","doi":"10.1145/1871437.1871621","DOIUrl":"https://doi.org/10.1145/1871437.1871621","url":null,"abstract":"Manipulating graph-structured schemas (ontologies, models, etc.) requires the result to remain fully connected. In certain cases, e.g., calculating the difference of two schemas, support structures may be needed in the result. We describe our engine to process support structures in the context of a schema management system and describe schema reintegration experiments which validate the performance and correctness of our system.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130245678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, B. Liu, Hady W. Lauw
{"title":"Detecting product review spammers using rating behaviors","authors":"Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, B. Liu, Hady W. Lauw","doi":"10.1145/1871437.1871557","DOIUrl":"https://doi.org/10.1145/1871437.1871557","url":null,"abstract":"This paper aims to detect users generating spam reviews, i.e., review spammers. We identify several characteristic behaviors of review spammers and model these behaviors so as to detect the spammers. In particular, we seek to model the following behaviors. First, spammers may target specific products or product groups in order to maximize their impact. Second, they tend to deviate from the other reviewers in their ratings of products. We propose scoring methods to measure the degree of spam for each reviewer and apply them to an Amazon review dataset. We then select a subset of highly suspicious reviewers for further scrutiny by our user evaluators with the help of web-based spammer evaluation software specially developed for user evaluation experiments. Our results show that our proposed ranking and supervised methods are effective in discovering spammers and outperform a baseline method based on helpfulness votes alone. We finally show that the detected spammers have a more significant impact on ratings than the unhelpful reviewers.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131839679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Index structures for efficiently searching natural language text","authors":"P. Chubak, Davood Rafiei","doi":"10.1145/1871437.1871527","DOIUrl":"https://doi.org/10.1145/1871437.1871527","url":null,"abstract":"Many existing indexes on text work at the document granularity and are not effective in answering the class of queries where the desired answer is only a term or a phrase. In this paper, we study some of the index structures that are capable of answering the class of queries referred to here as wild card queries and perform an analysis of their performance. Our experimental results on a large class of queries from different sources (including query logs and parse trees) and with various datasets reveal some of the performance barriers of these indexes. We then present Word Permuterm Index (WPI) which is an adaptation of the permuterm index for natural language text applications and show that this index supports a wide range of wild card queries, is quick to construct and is highly scalable. Our experimental results comparing WPI to alternative methods on a wide range of wild card queries show a few orders of magnitude performance improvements for WPI while the memory usage is kept the same for all compared systems.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129398045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marina Barsky, Alex Thomo, Zoltán Tóth, C. Zuzarte
{"title":"Online update of b-trees","authors":"Marina Barsky, Alex Thomo, Zoltán Tóth, C. Zuzarte","doi":"10.1145/1871437.1871460","DOIUrl":"https://doi.org/10.1145/1871437.1871460","url":null,"abstract":"Many scenarios impose a heavy update load on B-tree indexes in modern databases. A typical case is when B-trees are used for indexing all the keywords of a text field. For example, upon the insertion of a new text record (e.g. a new document arrives), a barrage of new keywords has to be inserted into the index, causing many random disk I/Os and interrupting the normal operation of the database. The common approach has been to collect the updates in a separate structure and then perform a batch update of the index. This update \"freezes\" the database. Many applications, however, require the immediate availability of the new updates without any interruption of the normal database operation. In this paper we present a novel online B-tree update method based on a new buffering data structure that we introduce, the Dynamic Bucket Tree (DBT). The DBT-buffer serves as a differential index for new updates. The grouping of keys in the DBT-buffer is based on the longest common prefixes (LCP) of their binary representations. The LCP is used as a measure of the locality of keys to be transferred to the main B-tree. Our online update system does not slow down concurrent user transactions or lead to degradation of search performance. Experiments confirm that our DBT buffer can be efficiently used for online updates of text fields. As such it represents an effective solution to the notorious problem of handling updates to an Inverted Index.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131064529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using various term dependencies according to their utilities","authors":"Lixin Shi, Jian-Yun Nie","doi":"10.1145/1871437.1871655","DOIUrl":"https://doi.org/10.1145/1871437.1871655","url":null,"abstract":"In this paper, we propose a model to integrate term dependencies. Different from previous studies, each pair of terms is assigned a different weight of dependency according to their utility to IR. The experiments show that our model can significantly outperform the previous dependency models using fixed weights.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"88 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131220174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Ganev, Zhaochen Guo, Diego Serrano, Denilson Barbosa, Eleni Stroulia
{"title":"Exploring and visualizing academic social networks","authors":"V. Ganev, Zhaochen Guo, Diego Serrano, Denilson Barbosa, Eleni Stroulia","doi":"10.1145/1871437.1871786","DOIUrl":"https://doi.org/10.1145/1871437.1871786","url":null,"abstract":"We demonstrate the ReaSoN portal, consisting of interactive web-based tools for visualizing, exploring, querying, and integrating academic social networks. We describe how these networks are automatically extracted from bibliographic and citation databases, discuss notions of visibility in such networks which enable a rich set of social network analyses, and demonstrate our novel tools for the visualization and exploration of social networks.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132839912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selecting keywords for content based recommendation","authors":"Christian Wartena, Wout Slakhorst, M. Wibbels","doi":"10.1145/1871437.1871665","DOIUrl":"https://doi.org/10.1145/1871437.1871665","url":null,"abstract":"The continued growth of online content makes personalized recommendation an increasingly important tool for media consumption. While collaborative filtering techniques have been shown to be very successful in stable collections, content-based approaches are necessary for recommending new items. Content-based recommendation uses the similarity between new items and consumed items to predict whether a new item is interesting for the user. The similarity is computed by comparing the content or the meta-data of the items. In this paper we consider recommendation of TV broadcasts for which meta-data and synopses are available. We thereby concentrate on the new item problem. We investigate the value of different types of meta-data provided by the broadcaster or extracted from the synopsis. We show that extracted keywords are better suited for recommendation than manually assigned keywords. Furthermore, we show that the number of keywords used is of great importance. Using a rather small number of keywords to present an item yields the best results for recommendation.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132871770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A topical link model for community discovery in textual interaction graph","authors":"Guoqing Zheng, Jinwen Guo, Lichun Yang, Shengliang Xu, Shenghua Bao, Zhong Su, Dingyi Han, Yong Yu","doi":"10.1145/1871437.1871686","DOIUrl":"https://doi.org/10.1145/1871437.1871686","url":null,"abstract":"This paper is concerned with community discovery in textual interaction graphs, where the links between entities are indicated by textual documents. Specifically, we propose a Topical Link Model (TLM), which leverages the Hierarchical Dirichlet Process (HDP) to introduce a hidden topical variable for the links. Beyond the use of links, TLM can look into the documents on the links in detail to recover sound communities. Moreover, TLM is a nonparametric model, which is able to learn the number of communities from the data. Extensive experiments on two real-world corpora show that TLM outperforms two state-of-the-art baseline models, which verifies the effectiveness of TLM in determining the proper number of communities and generating sound communities.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132931603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}