{"title":"Mr.KNN: soft relevance for multi-label classification","authors":"Xiaotong Lin, Xue-wen Chen","doi":"10.1145/1871437.1871485","DOIUrl":"https://doi.org/10.1145/1871437.1871485","url":null,"abstract":"Multi-label classification refers to learning tasks with each instance belonging to one or more classes simultaneously. It arose from real-world applications such as information retrieval, text categorization and functional genomics. Currently, most of the multi-label learning methods use the strategy called binary relevance, which constructs a classifier for each unique label by grouping data into positives (examples with this label) and negatives (examples without this label). With binary relevance, an example with multiple labels is considered as a positive data for each label it belongs to. For some classes, this data point may behave like an outlier confusing classifiers, especially in the cases of well-separated classes. In this paper, we first introduce a new strategy called soft relevance, where each multi-label example is assigned a relevance score to the labels it belongs to. This soft relevance is then employed in a voting function used in a k nearest neighbor classifier. Furthermore, a voting-margin ratio is introduced to the k nearest neighbor classifier for better performance. We compare the proposed method to other multi-label learning methods over three multi-label datasets and demonstrate that the proposed method provides an effective way to multi-label learning.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133511787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying hotspots on the real-time web","authors":"K. Kamath, James Caverlee","doi":"10.1145/1871437.1871742","DOIUrl":"https://doi.org/10.1145/1871437.1871742","url":null,"abstract":"We study the problem of automatically identifying ``hotspots'' on the real-time web. Concretely, we propose to identify highly-dynamic ad-hoc collections of users -- what we refer to as crowds -- in massive social messaging systems like Twitter and Facebook. The proposed approach relies on a message-based communication clustering approach over time-evolving graphs that captures the natural conversational nature of social messaging systems. One of the salient features of the proposed approach is an efficient locality-based clustering approach for identifying crowds of users in near real-time compared to more heavyweight static clustering algorithms. Based on a three month snapshot of Twitter consisting of 711,612 users and 61.3 million messages, we show how the proposed approach can efficiently and effectively identify Twitter-based crowds relative to static graph clustering techniques at a fraction of the computational cost.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133764610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XML schema computations: schema compatibility testing and subschema extraction","authors":"Thomas Y. T. Lee, D. Cheung","doi":"10.1145/1871437.1871545","DOIUrl":"https://doi.org/10.1145/1871437.1871545","url":null,"abstract":"In this paper, we propose new models and algorithms to perform practical computations on W3C XML Schemas, which are schema minimization, schema equivalence testing, subschema testing and subschema extraction. We have conducted experiments on an e-commerce standard XSD called xCBL to demonstrate the effectiveness of our algorithms. One experiment has refuted the claim that the xCBL 3.5 XSD is compatible with the xCBL 3.0 XSD. Another experiment has shown that the xCBL XSDs can be effectively trimmed into small subschemas for specific applications, which has significantly reduced schema processing time.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134182606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network growth and the spectral evolution model","authors":"Jérôme Kunegis, D. Fay, C. Bauckhage","doi":"10.1145/1871437.1871533","DOIUrl":"https://doi.org/10.1145/1871437.1871533","url":null,"abstract":"We introduce and study the spectral evolution model, which characterizes the growth of large networks in terms of the eigenvalue decomposition of their adjacency matrices: In large networks, changes over time result in a change of a graph's spectrum, leaving the eigenvectors unchanged. We validate this hypothesis for several large social, collaboration, authorship, rating, citation, communication and tagging networks, covering unipartite, bipartite, signed and unsigned graphs. Following these observations, we introduce a link prediction algorithm based on the extrapolation of a network's spectral evolution. This new link prediction method generalizes several common graph kernels that can be expressed as spectral transformations. In contrast to these graph kernels, the spectral extrapolation algorithm does not make assumptions about specific growth patterns beyond the spectral evolution model. We thus show that it performs particularly well for networks with irregular, but spectral, growth patterns.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"582 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134477460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohamed E. Khalefa, M. Mokbel, Justin J. Levandoski
{"title":"Skyline query processing for uncertain data","authors":"Mohamed E. Khalefa, M. Mokbel, Justin J. Levandoski","doi":"10.1145/1871437.1871604","DOIUrl":"https://doi.org/10.1145/1871437.1871604","url":null,"abstract":"Recently, several research efforts have addressed answering skyline queries efficiently over large datasets. However, this research lacks methods to compute these queries over uncertain data, where uncertain values are represented as a range. In this paper, we define skyline queries over continuous uncertain data, and propose a novel, efficient framework to answer these queries. Query answers are probabilistic, where each object is associated with a probability value of being a query answer. Typically, users specify a probability threshold, that each returned object must exceed, and a tolerance value that defines the allowed error margin in probability calculation to reduce the computational overhead. Our framework employs an efficient two-phase query processing algorithm.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130392642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manifold ranking with sink points for update summarization","authors":"Pan Du, J. Guo, Jin Zhang, Xueqi Cheng","doi":"10.1145/1871437.1871722","DOIUrl":"https://doi.org/10.1145/1871437.1871722","url":null,"abstract":"Update summarization aims to create a summary over a topic-related multi-document dataset based on the assumption that the user has already read a set of earlier documents of the same topic. Beyond the problems (i.e., topic relevance, salience, and diversity in extracted information) tackled by topic-focused multi-document summarization, the update summarization must address the novelty problem as well. In this paper, we propose a novel extractive approach based on manifold ranking with sink points for update summarization. Specifically, our approach leverages a manifold ranking process over the sentence manifold to find topic relevant and salient sentences. More important, by introducing the sink points into sentence manifold, the ranking process can further capture the novelty and diversity based on the intrinsic sentence manifold. Therefore, we are able to address the four challenging problems above for update summarization in a unified way. Experiments on benchmarks of TAC are performed and the evaluation results show that our approach can achieve comparative performance to the existing best performing systems in TAC tasks.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125656499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taesup Moon, G. Dupret, Shihao Ji, Ciya Liao, Zhaohui Zheng
{"title":"User behavior driven ranking without editorial judgments","authors":"Taesup Moon, G. Dupret, Shihao Ji, Ciya Liao, Zhaohui Zheng","doi":"10.1145/1871437.1871650","DOIUrl":"https://doi.org/10.1145/1871437.1871650","url":null,"abstract":"We explore the potential of using users click-through logs where no editorial judgment is available to improve the ranking function of a vertical search engine. We base our analysis on the Cumulate Relevance Model, a user behavior model recently proposed as a way to extract relevance signal from click-through logs. We propose a novel way of directly learning the ranking function, effectively by-passing the need to have explicit editorial relevance label for each query-document pair. This approach potentially adjusts more closely the ranking function to a variety of user behaviors both at the individual and at the aggregate levels. We investigate two ways of using behavioral model; First, we consider the parametric approach where we learn the estimates of document relevance and use them as targets for the machine learned ranking schemes. In the second, functional approach, we learn a function that maximizes the behavioral model likelihood, effectively by-passing the need to estimate a substitute for document labels. Experiments using user session data collected from a commercial vertical search engine demonstrate the potential of our approach. While in terms of DCG, the editorial model out-perform the behavioral one, online experiments show that the behavioral model is on par --if not superior-- to the editorial model. To our knowledge, this is the first report in the Literature of a competitive behavioral model in a commercial setting","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129129551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-document topic segmentation","authors":"Minwoo Jeong, Ivan Titov","doi":"10.1145/1871437.1871579","DOIUrl":"https://doi.org/10.1145/1871437.1871579","url":null,"abstract":"Multiple documents describing the same or closely related sets of events are common and often easy to obtain: for example, consider document clusters on a news aggregator site or multiple reviews of the same product or service. Even though each such document discusses a similar set of topics, they provide alternative views or complimentary information on each of these topics. We argue that revealing hidden relations by jointly segmenting the documents, or, equivalently, predicting links between topically related segments in different documents would help to visualize documents of interest and construct friendlier user interfaces. In this paper, we refer to this problem as multi-document topic segmentation. We propose an unsupervised Bayesian model for the considered problem that models both shared and document-specific topics, and utilizes Dirichlet process priors to determine the effective number of topics. We show that topic segmentation can be inferred efficiently using a simple split-merge sampling algorithm. The resulting method outperforms baseline models on four datasets for multi-document topic segmentation.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132314796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting user interests for collaborative filtering: interests expansion via personalized ranking","authors":"Qi Liu, Enhong Chen, Hui Xiong, C. Ding","doi":"10.1145/1871437.1871707","DOIUrl":"https://doi.org/10.1145/1871437.1871707","url":null,"abstract":"In real applications, a given user buys or rates an item based on his/her interests. Learning to leverage this interest information is often critical for recommender systems. However, in existing recommender systems, the information about latent user interests are largely under-explored. To that end, in this paper, we propose an interest expansion strategy via personalized ranking based on the topic model, named iExpand, for building an interest-oriented collaborative filtering framework. The iExpand method introduces a three-layer, user-interest-item, representation scheme, which leads to more interpretable recommendation results and helps the understanding of the interactions among users, items, and user interests. Moreover, iExpand strategically deals with many issues, such as the overspecialization and the cold-start problems. Finally, we evaluate iExpand on benchmark data sets, and experimental results show that iExpand outperforms state-of-the-art methods.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":" 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131976202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-tier similarity model for story link detection","authors":"Tadashi Nomoto","doi":"10.1145/1871437.1871539","DOIUrl":"https://doi.org/10.1145/1871437.1871539","url":null,"abstract":"The paper presents a novel approach to story link detection, where the goal is to determine whether a pair of news stories are linked, i.e., talk about the same event. The present work marks a departure from the prior work in that we measure similarity at two distinct levels of textual organization, the document and its collection, and combine scores at both levels to determine how well stories are linked. Experiments on the TDT-5 corpus show that the present approach, which we call a 'two-tier similarity model,' comfortably beats conventional approaches such as Clarity enhanced KL divergence, while performing robustly across diverse languages.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133271190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}