{"title":"Opinion classification with tree kernel SVM using linguistic modality analysis","authors":"Takeshi S. Kobayakawa, T. Kumano, Hideki Tanaka, Naoaki Okazaki, Jin-Dong Kim, Junichi Tsujii","doi":"10.1145/1645953.1646231","DOIUrl":"https://doi.org/10.1145/1645953.1646231","url":null,"abstract":"We propose a method for classifying opinions that captures the role of linguistic modalities in the sentence. We use richer features than simple bag-of-words or opinion-holding predicates. The method is based on machine learning and utilizes opinion-holding predicates and linguistic modalities as features. Two different detectors help to classify the opinions: the opinion-holding predicate detector and the modality detector. A target opinion is first parsed into a dependency structure, and then the opinion-holding predicates and modalities are attached to the leaf nodes of the dependency tree. The whole tree is regarded as the input features of the opinion and becomes the input of tree kernel support vector machines. We have applied the method to Japanese opinions about television programs, and have confirmed its effectiveness against conventional bag-of-words features and simple opinion-holding predicate features.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122387713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The influence of the document ranking in expert search","authors":"C. Macdonald, I. Ounis","doi":"10.1145/1645953.1646282","DOIUrl":"https://doi.org/10.1145/1645953.1646282","url":null,"abstract":"The retrieval effectiveness of the underlying document search component of an expert search engine can have an important impact on the effectiveness of the generated expert search results. In this large-scale study, we perform novel experiments in the context of the document search and expert search tasks of the TREC Enterprise track, to measure the influence that the performance of the document ranking has on the ranking of candidate experts. In particular, we show, using real and simulated document rankings, that while the expert search system performance is related to the relevance of the retrieved documents, surprisingly, it is not always the case that increasing document search effectiveness causes an increase in expert search performance.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122547529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient processing of group-oriented connection queries in a large graph","authors":"James Cheng, Yiping Ke, Wilfred Ng","doi":"10.1145/1645953.1646151","DOIUrl":"https://doi.org/10.1145/1645953.1646151","url":null,"abstract":"We study query processing in large graphs, which are a fundamental data model underpinning various social networks and Web structures. Given a set of query nodes, we aim to find the groups to which the query nodes belong, as well as the best connection among the groups. Such a query is useful in many applications, but processing it is extremely costly. We define a new notion of Correlation Group (CG), which is a set of nodes that are strongly correlated in a large graph G. We then extract the subgraph from G that gives the best connection for the nodes in a CG. To facilitate query processing, we develop an efficient index built upon the CGs. Our experiments show that the CGs are meaningful as groups and, importantly, that the meaningfulness of the query results is justifiable. We also demonstrate the high efficiency of CG computation, index construction and query processing.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122746409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Easiest-first search: towards comprehension-based web search","authors":"Makoto Nakatani, A. Jatowt, Katsumi Tanaka","doi":"10.1145/1645953.1646300","DOIUrl":"https://doi.org/10.1145/1645953.1646300","url":null,"abstract":"Although Web search engines have become information gateways to the Internet, for queries containing technical terms, search results often contain pages that are difficult for non-expert users to understand. Therefore, re-ranking search results in descending order of comprehensibility should be effective for non-expert users. In our approach, the comprehensibility of Web pages is estimated by considering both document readability and the difficulty of technical terms in the domain of search queries. To extract technical terms, we exploit domain knowledge extracted from Wikipedia. Our proposed method can be applied to general Web search engines, as Wikipedia covers nearly every field of human knowledge. We demonstrate the usefulness of our approach through user experiments.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"527 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122828641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic and keyword re-ranking for LDA-based topic modeling","authors":"Yangqiu Song, Shimei Pan, Shixia Liu, Michelle X. Zhou, Weihong Qian","doi":"10.1145/1645953.1646223","DOIUrl":"https://doi.org/10.1145/1645953.1646223","url":null,"abstract":"Topic-based text summaries promise to help average users quickly understand a text collection and derive insights. Recent research has shown that the Latent Dirichlet Allocation (LDA) model is one of the most effective approaches to topic analysis. However, the LDA-based results may not be ideal for human understanding and consumption. In this paper, we present several topic and keyword re-ranking approaches that can help users better understand and consume the LDA-derived topics in their text analysis. Our methods process the LDA output based on a set of criteria that model a user's information needs. Our evaluation demonstrates the usefulness of the methods in summarizing several large-scale, real world data sets.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122874000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing and evaluating query reformulation strategies in web search logs","authors":"Jeff Huang, E. Efthimiadis","doi":"10.1145/1645953.1645966","DOIUrl":"https://doi.org/10.1145/1645953.1645966","url":null,"abstract":"Users frequently modify a previous search query in hope of retrieving better results. These modifications are called query reformulations or query refinements. Existing research has studied how web search engines can propose reformulations, but has given less attention to how people perform query reformulations. In this paper, we aim to better understand how web searchers refine queries and form a theoretical foundation for query reformulation. We study users' reformulation strategies in the context of the AOL query logs. We create a taxonomy of query refinement strategies and build a high precision rule-based classifier to detect each type of reformulation. Effectiveness of reformulations is measured using user click behavior. Most reformulation strategies result in some benefit to the user. Certain strategies like add/remove words, word substitution, acronym expansion, and spelling correction are more likely to cause clicks, especially on higher ranked results. In contrast, users often click the same result as their previous query or select no results when forming acronyms and reordering words. Perhaps the most surprising finding is that some reformulations are better suited to helping users when the current results are already fruitful, while other reformulations are more effective when the results are lacking. Our findings inform the design of applications that can assist searchers; examples are described in this paper.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121638761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subspace maximum margin clustering","authors":"Quanquan Gu, Jie Zhou","doi":"10.1145/1645953.1646122","DOIUrl":"https://doi.org/10.1145/1645953.1646122","url":null,"abstract":"In text mining, we are often confronted with very high dimensional data. Clustering high dimensional data is a challenging problem due to the curse of dimensionality. In this paper, to address this problem, we propose a subspace maximum margin clustering (SMMC) method, which performs dimensionality reduction and maximum margin clustering simultaneously within a unified framework. We aim to learn a subspace in which we try to find a cluster assignment of the data points, together with a hyperplane classifier, such that the resultant margin is maximized over all possible cluster assignments and all possible subspaces. The original problem is transformed from learning the subspace to learning a positive semi-definite matrix, in order to avoid tuning the dimensionality of the subspace. The transformed problem can be solved efficiently via the cutting plane technique and the constrained concave-convex procedure (CCCP). Since the sub-problem in each iteration of CCCP is jointly convex, alternating minimization is adopted to obtain the global optimum. Experiments on benchmark data sets illustrate that the proposed method outperforms state-of-the-art clustering methods as well as many dimensionality reduction based clustering approaches.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133184421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A collaborative filtering approach to ad recommendation using the query-ad click graph","authors":"T. Anastasakos, D. Hillard, Sanjay Kshetramade, Hema Raghavan","doi":"10.1145/1645953.1646267","DOIUrl":"https://doi.org/10.1145/1645953.1646267","url":null,"abstract":"Search engine logs contain a large amount of click-through data that can be leveraged as soft indicators of relevance. In this paper we address the sponsored search retrieval problem, which is to find and rank ads relevant to a search query. We propose a new technique to determine the relevance of an ad document for a search query using click-through data. The method builds on a collaborative filtering approach to discover new ads related to a query using a click graph. It is implemented on a graph with several million edges and scales easily to larger sizes. The proposed method is compared to three different baselines that are state-of-the-art for a commercial search engine. Evaluations on editorial data indicate that the model discovers many new ads not retrieved by the baseline methods. The ads from the new approach are on average of better quality than the baselines.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133233636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting internal and external semantics for the clustering of short texts using world knowledge","authors":"Xia Hu, Nan Sun, Chao Zhang, Tat-Seng Chua","doi":"10.1145/1645953.1646071","DOIUrl":"https://doi.org/10.1145/1645953.1646071","url":null,"abstract":"Clustering of short texts, such as snippets, presents great challenges to existing aggregated search techniques due to the problem of data sparseness and the complex semantics of natural language. As short texts do not provide sufficient term occurrence information, traditional text representation methods, such as the \"bag of words\" model, have several limitations when directly applied to short text tasks. In this paper, we propose a novel framework to improve the performance of short text clustering by exploiting the internal semantics of the original text and external concepts from world knowledge. The proposed method employs a hierarchical three-level structure to tackle the data sparsity problem of original short texts and reconstructs the corresponding feature space with the integration of multiple semantic knowledge bases -- Wikipedia and WordNet. Empirical evaluation with Reuters and a real web dataset demonstrates that our approach is able to achieve significant improvement as compared to the state-of-the-art methods.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133937595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity-aware indexing for real-time entity resolution","authors":"P. Christen, Ross W. Gayler, D. Hawking","doi":"10.1145/1645953.1646173","DOIUrl":"https://doi.org/10.1145/1645953.1646173","url":null,"abstract":"Entity resolution, also known as data matching or record linkage, is the task of identifying and matching records from several databases that refer to the same entities. Traditionally, entity resolution has been applied in batch-mode and on static databases. However, many organisations are increasingly faced with the challenge of having large databases containing entities that need to be matched in real-time with a stream of query records also containing entities, such that the best matching records are retrieved. Example applications include online law enforcement and national security databases, public health surveillance and emergency response systems, financial verification systems, online retail stores, eGovernment services, and digital libraries. A novel inverted index based approach for real-time entity resolution is presented in this paper. At build time, similarities between attribute values are computed and stored to support the fast matching of records at query time. The presented approach differs from other approaches to approximate query matching in that it allows any similarity comparison function, and any 'blocking' (encoding) function, both possibly domain specific, to be incorporated. Experimental results on a real-world database indicate that the total size of all data structures of this novel index approach grows sub-linearly with the size of the database, and that it allows matching of query records in sub-second time, more than two orders of magnitude faster than a traditional entity resolution index approach. The interested reader is referred to the longer version of this paper [5].","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134228425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}