Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: latest articles
Mingqiang Xue, Panagiotis Karras, Chedy Raïssi, H. Pung
{"title":"Utility-driven anonymization in data publishing","authors":"Mingqiang Xue, Panagiotis Karras, Chedy Raïssi, H. Pung","doi":"10.1145/2063576.2063945","DOIUrl":"https://doi.org/10.1145/2063576.2063945","url":null,"abstract":"Privacy-preserving data publication has been studied intensely in the past years. Still, all existing approaches transform data values by random perturbation or generalization. In this paper, we introduce a radically different data anonymization methodology. Our proposal aims to maintain a certain amount of patterns, defined in terms of a set of properties of interest that hold for the original data. Such properties are represented as linear relationships among data points. We present an algorithm that generates a set of anonymized data that strictly preserves these properties, thus maintaining specified patterns in the data. Extensive experiments with real and synthetic data show that our algorithm is efficient, and produces anonymized data that affords high utility in several data analysis tasks while safeguarding privacy.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"1 1","pages":"2277-2280"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76273776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering trending phrases on information streams","authors":"K. Kamath, James Caverlee","doi":"10.1145/2063576.2063937","DOIUrl":"https://doi.org/10.1145/2063576.2063937","url":null,"abstract":"We study the problem of efficient discovery of trending phrases from high-volume text streams -- be they sequences of Twitter messages, email messages, news articles, or other time-stamped text documents. Most existing approaches return top-k trending phrases. But, this approach neither guarantees that the top-k phrases returned are all trending, nor that all trending phrases are returned. In addition, the value of k is difficult to set and is indifferent to stream dynamics. Hence, we propose an approach that identifies all the trending phrases in a stream and is flexible to the changing stream properties.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"23 1","pages":"2245-2248"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73144037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social and collaborative information seeking: panel","authors":"Jeremy Pickens","doi":"10.1145/2063576.2064057","DOIUrl":"https://doi.org/10.1145/2063576.2064057","url":null,"abstract":"In recent years, information retrieval and information seeking have moved beyond their single-user roots and are becoming multi-user endeavors. However, there are multiple visions for how best to design multi-user interactions: social search versus collaborative search. The terms \"social\" and \"collaborative\" are overloaded with meaning, having been used to describe a wide variety of systems, user needs and goals, interaction styles, and algorithms. In this panel we adopt the following primary definitions: Information seeking tasks in which there are two or more people who lack the same information (share the same information need) and explicitly set out together to satisfy that need are known as collaborative. A collaborative information retrieval system provides mechanisms -- interfaces and mediation algorithms -- that allow the team to work together to find information that neither individual would have found when working alone. There is an inherent division of labor in collaborative work. On the other hand, information seeking tasks in which only a single individual lacks information, but is willing or able to let a larger group assist in the satisfaction of that need, are known as social search. The larger group may be a community of like-minded individuals, or it might be a social network of friends and associates. But either way, the assumption is that someone in that community or network already possesses the information that the initial individual seeks. 
The goal of the system is therefore to correctly propagate or diffuse that existing knowledge throughout the network, to amplify and repeat information that has already been discovered by at least one person. Despite these fundamental differences between collaborative (team-oriented, jointly-held information need) and social (network- and community-augmented, though ultimately solitary need), there are similarities in process. This panel will explore both these similarities and differences, and provide insight about whether one type of multi-user information seeking vision will ultimately eclipse the other, or whether each will remain separate but complementary.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"50 1","pages":"2647-2648"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74440072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TAKES: a fast method to select features in the kernel space","authors":"Ye Xu, S. Furao, Wei Ping, Jinxi Zhao","doi":"10.1145/2063576.2063677","DOIUrl":"https://doi.org/10.1145/2063576.2063677","url":null,"abstract":"Feature selection is an effective tool to deal with the \"curse of dimensionality\". To cope with the non-separable problem, feature selection in the kernel space has been investigated. However, previous studies cannot adequately estimate the intrinsic dimensionality of the kernel space. Thus, it is difficult to accurately preserve the sketch of the kernel space using the learned basis, and the feature selection performance is affected. Moreover, the computational cost of the algorithm is at least cubic in the number of training data. In this paper, we propose a fast framework to conduct feature selection in the kernel space. By designing a fast kernel subspace learning method, we automatically learn the intrinsic dimensionality and construct an orthogonal basis set of kernel space. The learned basis can accurately preserve the sketch of kernel space. Then, backed by the constructed basis, we directly select features in kernel space. The proposed framework as a whole has quadratic complexity in the number of training data, which is faster than existing kernel methods for feature selection. We evaluate our work on several typical datasets and find it not only preserves the sketch of the kernel space more accurately but also achieves better classification performance compared with many state-of-the-art methods.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. 
ACM International Conference on Information and Knowledge Management","volume":"163 1","pages":"683-692"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72788581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LSDS-IR'11: the 9th workshop on large-scale and distributed systems for information retrieval","authors":"C. Lucchese, B. B. Cambazoglu","doi":"10.1145/2063576.2064054","DOIUrl":"https://doi.org/10.1145/2063576.2064054","url":null,"abstract":"The growth of the Web and of user bases leads to important performance problems for large-scale Web search engines. The LSDS-IR '11 workshop focuses on research contributions related to the scalability and efficiency of distributed information retrieval (IR) systems. The workshop also encourages contributions that propose different ways of leveraging diversity and multiplicity of resources available in distributed systems. More specifically, we are interested in novel applications, models, and architectures that deal with efficiency and scalability of distributed IR systems.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"24 1","pages":"2643-2644"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72796219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangjun Dong, Z. Zheng, Longbing Cao, Yanchang Zhao, Chengqi Zhang, Jinjiu Li, Wei Wei, Yuming Ou
{"title":"e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning","authors":"Xiangjun Dong, Z. Zheng, Longbing Cao, Yanchang Zhao, Chengqi Zhang, Jinjiu Li, Wei Wei, Yuming Ou","doi":"10.1145/2063576.2063695","DOIUrl":"https://doi.org/10.1145/2063576.2063695","url":null,"abstract":"Mining Negative Sequential Patterns (NSP) is much more challenging than mining Positive Sequential Patterns (PSP) due to the high computational complexity and huge search space required in calculating Negative Sequential Candidates (NSC). Very few approaches are available for mining NSP, which mainly rely on re-scanning databases after identifying PSP. As a result, they are very inefficient. In this paper, we propose an efficient algorithm for mining NSP, called e-NSP, which mines for NSP by only involving the identified PSP, without re-scanning databases. First, negative containment is defined to determine whether or not a data sequence contains a negative sequence. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The supports of NSC are then calculated based only on the corresponding PSP. Finally, a simple but efficient approach is proposed to generate NSC. With e-NSP, mining NSP does not require additional database scans, and the existing PSP mining algorithms can be integrated into e-NSP to mine for NSP efficiently. e-NSP is compared with two currently available NSP mining algorithms on 14 synthetic and real-life datasets. Intensive experiments show that e-NSP takes as little as 3% of the runtime of the baseline approaches and is applicable for efficient mining of NSP in large datasets.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. 
ACM International Conference on Information and Knowledge Management","volume":"18 1","pages":"825-830"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72666038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qiang Lu, Jack G. Conrad, Khalid Al-Kofahi, William Keenan
{"title":"Legal document clustering with built-in topic segmentation","authors":"Qiang Lu, Jack G. Conrad, Khalid Al-Kofahi, William Keenan","doi":"10.1145/2063576.2063636","DOIUrl":"https://doi.org/10.1145/2063576.2063636","url":null,"abstract":"Clustering is a useful tool for helping users navigate, summarize, and organize large quantities of textual documents available on the Internet, in news sources, and in digital libraries. A variety of clustering methods have also been applied to the legal domain, with various degrees of success. Some unique characteristics of legal content as well as the nature of the legal domain present a number of challenges. For example, legal documents are often multi-topical, contain carefully crafted, professional, domain-specific language, and possess a broad and unevenly distributed coverage of legal issues. Moreover, unlike widely accessible documents on the Internet, where search and categorization services are generally free, the legal profession is still largely a fee-for-service field that makes the quality (e.g., in terms of both recall and precision) a key differentiator of provided services. This paper introduces a classification-based recursive soft clustering algorithm with built-in topic segmentation. The algorithm leverages existing legal document metadata such as topical classifications, document citations, and click stream data from user behavior databases, into a comprehensive clustering framework. Techniques associated with the algorithm have been applied successfully to very large databases of legal documents, which include judicial opinions, statutes, regulations, administrative materials and analytical documents. Extensive evaluations were conducted to determine the efficiency and effectiveness of the proposed algorithm. 
Subsequent evaluations conducted by legal domain experts have demonstrated that the quality of the resulting clusters based upon this algorithm is similar to those created by domain experts.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"584 2","pages":"383-392"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2063576.2063636","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72418761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring query aspects from reformulations using clustering","authors":"Van Dang, Xiaobing Xue, W. Bruce Croft","doi":"10.1145/2063576.2063904","DOIUrl":"https://doi.org/10.1145/2063576.2063904","url":null,"abstract":"When the information need is not clear from the user query, a good strategy would be to return documents that cover as many aspects of the query as possible. To do this, the possible aspects of the query need to be automatically identified. In this paper, we propose to do this by clustering reformulated queries generated from publicly available resources and using each cluster to represent an aspect of the query. Our results show that the automatically generated reformulations for the TREC Web Track queries match up quite well with actual sub-topics of these queries identified by TREC experts. Moreover, agglomerative clustering using query-to-query similarity based on co-occurrence in text passages can provide clusters of high quality that potentially can be used to identify aspects.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"101 1","pages":"2117-2120"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79385534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PDFMeat: managing publications on the semantic desktop","authors":"D. Aumüller, E. Rahm","doi":"10.1145/2063576.2064020","DOIUrl":"https://doi.org/10.1145/2063576.2064020","url":null,"abstract":"Researchers maintain bibliographies and extensive sets of PDF files of scholarly publications on their desktop. The lack of proper metadata of downloaded PDFs makes this task a tedious one. With PDFMeat we present a solution to automatically determine publication metadata for scholarly papers within the user's desktop environment and link the metadata to the files. PDFMeat effectively matches local full texts to an online repository. In an evaluation on more than 2,000 diverse PDF files, it worked highly reliably and showed excellent accuracy of up to 98 percent. We demonstrate PDFMeat for different sets of papers, highlighting the semantic integration and use of the retrieved metadata within the file browser of the desktop environment.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"10 1","pages":"2565-2568"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79426372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DIGRank: using global degree to facilitate ranking in an incomplete graph","authors":"Xiang Niu, Lusong Li, Ke Xu","doi":"10.1145/2063576.2063950","DOIUrl":"https://doi.org/10.1145/2063576.2063950","url":null,"abstract":"PageRank has been broadly applied to get credible rank sequences of nodes in many networks such as the web, citation networks, or online social networks. However, in the real world, it is usually hard to ascertain a complete structure of a network, particularly a large-scale one. Some researchers have begun to explore how to get a relatively accurate rank more efficiently. They have proposed some local approximation methods, which are especially designed for quickly estimating the PageRank value of a new node just after it is added to the network. Yet, these local approximation methods rely on the link server too much, and it is difficult to use them to estimate rank sequences of nodes in a group. So we propose a new method called DIGRank, which uses global Degree to facilitate Ranking in an Incomplete Graph and which takes into account the frequent need for applications to rank users in a community, retrieve pages in a particular area, or mine nodes in a fractional or limited network. Based on experiments in small-world and scale-free networks generated by models, the DIGRank method performs better than other local estimation methods on ranking nodes in a given subgraph. In the models, it tends to perform best in graphs that have low average shortest path length, high average degree, or weak community structure. Besides, compared with a local PageRank and an advanced local approximation method, it significantly reduces the computational cost and error rate.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. 
ACM International Conference on Information and Knowledge Management","volume":"21 1","pages":"2297-2300"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79632102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}