车间管理 Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809895
Wissem Labbadi, J. Akaichi
{"title":"Efficient Top-k Query Answering through its Top-N Rewritings Using Views","authors":"Wissem Labbadi, J. Akaichi","doi":"10.1145/2809890.2809895","DOIUrl":"https://doi.org/10.1145/2809890.2809895","url":null,"abstract":"Recently, various algorithms were proposed to speed up top-k query answering by using multiple materialized query results. Nevertheless, for most of the proposed algorithms, a potentially costly view selection operation is required. In fact, the processing cost has been shown to be linear with respect to the number of views and can be exorbitant given the large number of views to be considered. In this paper, we address the problem of identifying the top-N promising views to use for top-k query answering in the presence of a collection of views. We propose a novel algorithm for handling this problem, which aims to achieve a significant reduction in query execution time. Indeed, it considers only the minimal number of rewritings that are likely to be necessary to return the top-k tuples for a top-k query. We also consider the problem of how to efficiently exploit the output of the rewritings algorithm to retrieve the top-k tuples, and present two possible solutions. The results of a thorough experimental study indicate that the proposed algorithm offers a robust solution to the problem of efficient top-k query answering using views, since it discards non-promising query rewritings from the view selection process.","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89043799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
车间管理 Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809896
Radha Chitta, Anil K. Jain, Rong Jin
{"title":"Sparse Kernel Clustering of Massive High-Dimensional Data sets with Large Number of Clusters","authors":"Radha Chitta, Anil K. Jain, Rong Jin","doi":"10.1145/2809890.2809896","DOIUrl":"https://doi.org/10.1145/2809890.2809896","url":null,"abstract":"In clustering applications involving documents and images, in addition to the large number of data points (N) and their high dimensionality (d), the number of clusters (C) into which the data need to be partitioned is also large. Kernel-based clustering algorithms, which have been shown to perform better than linear clustering algorithms, have high running time complexity in terms of N, d and C. We propose an efficient sparse kernel k-means clustering algorithm, which incrementally samples the most informative points from the data set using importance sampling, and constructs a sparse kernel matrix using these sampled points. Each row in this matrix corresponds to a data point's similarity with its p-nearest neighbors among the sampled points (p ≪ N). This sparse kernel matrix is used to perform clustering and obtain the cluster labels. This combination of sampling and sparsity reduces both the running time and memory complexity of kernel clustering. In order to further enhance its efficiency, the proposed algorithm projects the data onto the top C eigenvectors of the sparse kernel matrix and clusters these eigenvectors using a modified k-means algorithm. The running time of the proposed sparse kernel k-means algorithm is linear in N and d, and logarithmic in C. We show analytically that only a small number of points need to be sampled from the data set, and the resulting approximation error is well-bounded. We demonstrate, using several large high-dimensional text and image data sets, that the proposed algorithm is significantly faster than classical kernel-based clustering algorithms, while maintaining clustering quality.","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82773701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
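The sparse kernel matrix described in the abstract above (each row keeps a point's similarity to only its p nearest sampled points) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the RBF kernel, the `gamma` value, the function names, and the hand-picked `sampled_idx`, which stands in for the importance sampling step the abstract mentions but does not detail. It is not the authors' implementation.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sparse_kernel_rows(points, sampled_idx, p, gamma=1.0):
    """For each point, keep only its p largest similarities to sampled points.

    Returns one dict per data point mapping sampled index -> kernel value,
    so each row stores at most p non-zero entries.
    """
    rows = []
    for x in points:
        sims = [(rbf(x, points[j], gamma), j) for j in sampled_idx]
        sims.sort(reverse=True)  # highest similarity first
        rows.append({j: s for s, j in sims[:p]})
    return rows


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
sampled_idx = [0, 2, 3]  # hypothetical sample, chosen by hand for the demo
rows = sparse_kernel_rows(points, sampled_idx, p=2)
for i, row in enumerate(rows):
    print(i, row)
```

Storing each row as a small dictionary keeps memory proportional to N·p rather than N², which is the effect the abstract attributes to combining sampling with sparsity.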
{"title":"Proceedings of the 8th Workshop on Ph.D. Workshop in Information and Knowledge Management","authors":"Mouna Kacimi, N. Preda, Maya Ramanath","doi":"10.1145/2809890","DOIUrl":"https://doi.org/10.1145/2809890","url":null,"abstract":"The publication date is one day earlier than the EST date to provide the proceedings to attendees in Australia on the first day of the conference.\n\nIt is our pleasure to host PIKM, the PhD workshop in Information and Knowledge Management, in conjunction with the ACM CIKM 2015 conference in Melbourne, Australia. PIKM has been a popular event in CIKM since its inception in 2007. This is the 8th time PIKM is being held, and it has attracted participants from all over the world.\n\nPIKM provides PhD students an opportunity to present their dissertation proposals and/or early doctoral research worldwide and get recognition for their work. It gives them valuable feedback at a relatively early stage from experts in their field in academia and industry. This helps them assess their work with respect to its novelty, technical contributions and real-world applications. Moreover, PIKM also presents a panorama of upcoming doctoral work to established researchers in information and knowledge management. It gives them an idea of the interesting topics that attract fresh doctorates. It could help them tap this potential at an early stage through summer internships, research collaborations and more.\n\nThere have been 16 submissions to PIKM 2015, of which 5 have been accepted as full papers. A significant highlight of PIKM 2015 is that all accepted papers have both poster and oral presentations, to increase visibility and interaction. Another distinguished aspect this year is a career development session consisting of a mentoring presentation, from an experienced researcher, which emphasizes the importance of seeking opportunities and developing the needed skills to be successful after the PhD. We encourage participants to attend the keynote and invited talks. These valuable and insightful talks can help PhD students in their career:\nKeynote: \"Why Researchers are Managers\", Dr. Gerard de Melo (Tsinghua University, China)\nInvited talk in the career development session: \"Beyond The Thesis: Completing A Successful PhD\", Prof. Justin Zobel (University of Melbourne, Australia)\n\nThe PIKM 2015 team includes Program Committee members from 11 countries spanning 4 continents. These comprise a good balance of industry and academia. We thank the reviewers for providing quick and useful feedback to the students amidst their busy schedule of work. In recent years, PIKM has been giving a best reviewer award in order to honor the exceptional contributions of a PC member, analogous to the best paper award that provides recognition to outstanding PhD student research. This year, the best reviewer award goes to Shady Elbassuoni from the American University of Beirut, Lebanon. We sincerely applaud him for his time and effort in providing excellent and detailed reviews. The best paper award will be announced during the PIKM workshop at the CIKM conference. Both these awards consist of ACM certificates.","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78451273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
车间管理 Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809892
S. Gandhi, T. Oates, Arnold P. Boedihardjo, Crystal Chen, Jessica Lin, Pavel Senin, S. Frankenstein, Xing Wang
{"title":"A Generative Model For Time Series Discretization Based On Multiple Normal Distributions","authors":"S. Gandhi, T. Oates, Arnold P. Boedihardjo, Crystal Chen, Jessica Lin, Pavel Senin, S. Frankenstein, Xing Wang","doi":"10.1145/2809890.2809892","DOIUrl":"https://doi.org/10.1145/2809890.2809892","url":null,"abstract":"Discretization is a crucial first step in several time series mining applications. Our research proposes a novel method to discretize time series data and develops a similarity score based on the discretized representation. The similarity score allows us to compare two time series sequences and enables us to perform pattern learning tasks such as clustering, classification, and anomaly detection. We propose a generative model for discretization based on multiple normal distributions and create an optimization technique to learn parameters of these normal distributions. To show the effectiveness of our approach, we perform comprehensive experiments in classifying datasets from the UCR time series repository.","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83348075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
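As a rough illustration of discretization with multiple normal distributions, the sketch below maps each value of a series to the symbol whose normal density is highest at that value. The component parameters in `components` are hand-picked assumptions; the abstract above says the parameters are learned by an optimization technique, which is not reproduced here, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def discretize(series, components):
    """Map each value to the symbol whose normal density at that value is highest."""
    return "".join(
        max(components, key=lambda sym: normal_pdf(x, *components[sym]))
        for x in series
    )


# Hypothetical (mean, std) per symbol; a learned model would supply these.
components = {"a": (-1.0, 0.5), "b": (0.0, 0.5), "c": (1.0, 0.5)}
symbols = discretize([-1.2, -0.1, 0.9, 0.2, -0.8], components)
print(symbols)  # prints "abcba"
```

With equal standard deviations this reduces to picking the nearest mean; unequal standard deviations would shift the boundaries between symbols.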
{"title":"Session details: Career Development Session (Invited Talk)","authors":"N. Preda","doi":"10.1145/3257876","DOIUrl":"https://doi.org/10.1145/3257876","url":null,"abstract":"","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91478634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
车间管理 Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809894
Jiajia Huang, Min Peng, Hua Wang
{"title":"Topic Detection from Large Scale of Microblog Stream with High Utility Pattern Clustering","authors":"Jiajia Huang, Min Peng, Hua Wang","doi":"10.1145/2809890.2809894","DOIUrl":"https://doi.org/10.1145/2809890.2809894","url":null,"abstract":"With the popularity of social media, detecting topics from microblog streams has become an increasingly important task. However, it is challenging because microblog streams are high-dimensional, short and noisy in content, fast-changing, huge in volume, and so on. In this paper, we propose a high utility pattern clustering (HUPC) framework over microblog streams. This framework first extracts a group of representative patterns from the microblog stream, and then groups these patterns into topic clusters. This approach works well on large-scale microblog streams because it clusters the patterns that perform better in describing topics, rather than clustering noise and microblogs directly. Furthermore, the proposed framework can detect coherent topics and newly emerging topics simultaneously. Extensive experimental results on Twitter streams and Sina Weibo streams show that the developed method achieves better performance than other existing topic detection methods, leading to a desirable solution for detecting events from microblog streams.","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91268456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
车间管理 Pub Date: 2015-10-18 DOI: 10.1145/2809890.2815473
J. Zobel
{"title":"Beyond The Thesis: Completing A Successful PhD","authors":"J. Zobel","doi":"10.1145/2809890.2815473","DOIUrl":"https://doi.org/10.1145/2809890.2815473","url":null,"abstract":"Many factors lead to students undertaking a PhD. A student may, for example, be intellectually curious, and want to pursue an interest or understand a problem; or may be adventurous, and want to make a significant discovery; or be entrepreneurial, and want to create an innovation; or want to work with a particular scientist; or want to continue to participate in life on campus. Students may regard a PhD as an opportunity to acquire deep training in research in the field, and perhaps to distinguish themselves by completing a piece of major work, acquiring the title of 'doctor', and becoming a scientist. Perhaps surprisingly, many students seem to give only limited attention to the details of what their next step will be, even at the end of the PhD. While they may have a general goal to become an academic or researcher, these students have not explored what is involved in reaching that goal. Yet the activities of the PhD, perhaps even in the first year, can help shape each student's career. In particular, students need to be aware of their need to develop skills, and acquire experience, in areas beyond that of the core activities of research. Students do use the PhD to develop themselves. At the start of their PhDs, students are highly diverse, with individual strengths and weaknesses. The task of completing the PhD to some extent normalizes these differences: students find that they have to address their shortcomings, while exploiting their existing skills as they build an initial body of research. However, this development tends to be focused on the skills needed for the PhD itself - writing, speaking, managing data, analysis of literature, design of experiments, and so on. Yet a PhD is also an opportunity for students to develop more broadly, and to position themselves for the career of their choice. Some students do not take advantage of this opportunity, while others, in their haste to finish, sidestep some of the aspects of PhD study from which they have the most to learn. In particular, an aspect of PhD study that is often overlooked is that it can be a period of intense personal development. The demands of undertaking such a long, concentrated piece of work can lead to intellectual rigor, intellectual independence, systematic work habits, and, perhaps most crucially, deepened self-assessment. The most successful scientists are not just technically capable, imaginative, lucid, and so on, but are aware of their limitations. In some cases these can be rectified through discipline and study; in others, they are factors to consider when choosing or shaping a career. Thus an effective student should approach the end of the PhD in a strategic way, seeking opportunities to develop the qualities that will help give an easy transition to the next career step, while taking a clear-eyed view of the likelihood of success in different kinds of work.","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91260879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
车间管理 Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809893
Sanjay Rathee, Manohar Kaul, Arti Kashyap
{"title":"R-Apriori: An Efficient Apriori based Algorithm on Spark","authors":"Sanjay Rathee, Manohar Kaul, Arti Kashyap","doi":"10.1145/2809890.2809893","DOIUrl":"https://doi.org/10.1145/2809890.2809893","url":null,"abstract":"Association rule mining remains a very popular and effective method to extract meaningful information from large datasets. It tries to find possible associations between items in large transaction-based datasets. In order to create these associations, frequent patterns have to be generated. The \"Apriori\" algorithm, along with its set of improved variants, was one of the earliest proposed frequent pattern generation algorithms and still remains a preferred choice due to its ease of implementation and natural tendency to be parallelized. While many efficient single-machine methods for Apriori exist, the massive amount of data available these days is far beyond the capacity of a single machine. Hence, there is a need to scale across multiple machines to meet the demands of this ever-growing data. MapReduce is a popular fault-tolerant framework for distributed applications. Nevertheless, heavy disk I/O at each MapReduce operation hinders the implementation of efficient iterative data mining algorithms, such as Apriori, on MapReduce platforms. A newly proposed in-memory distributed dataflow platform called Spark overcomes the disk I/O bottlenecks in MapReduce. Therefore, Spark presents an ideal platform for distributed Apriori. However, in the implementation of Apriori, the most computationally expensive task is the generation of candidate sets having all possible pairs for singleton frequent items and comparing each pair with every transaction record. Here, we propose a new approach which dramatically reduces this computational complexity by eliminating the candidate generation step and avoiding costly comparisons. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of our approach. Our studies show that our approach outperforms classical Apriori and the state of the art on Spark by many times on different datasets.","PeriodicalId":67056,"journal":{"name":"车间管理","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75492012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
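The cost the abstract above attributes to Apriori's second pass (building every pair of frequent singletons and testing each pair against every transaction) can be seen in a small single-machine sketch. The second function shows one way to skip candidate generation: prune each transaction to its frequent items and count the pairs it actually contains. This is plain Python rather than Spark, the function names and toy data are hypothetical, and the pruning variant is an assumption about how such a step could look, not a reproduction of the R-Apriori algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_items(transactions, min_support):
    """First pass: items that appear in at least min_support transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= min_support}


def pairs_with_candidates(transactions, freq, min_support):
    """Classical second pass: build all candidate pairs, test each against every transaction."""
    candidates = list(combinations(sorted(freq), 2))
    counts = {
        c: sum(1 for t in transactions if set(c) <= set(t)) for c in candidates
    }
    return {c: n for c, n in counts.items() if n >= min_support}


def pairs_without_candidates(transactions, freq, min_support):
    """Alternative: prune each transaction to frequent items, count the pairs it contains."""
    counts = Counter()
    for t in transactions:
        pruned = sorted(set(t) & freq)
        counts.update(combinations(pruned, 2))
    return {c: n for c, n in counts.items() if n >= min_support}


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
freq = frequent_items(transactions, min_support=2)
classic = pairs_with_candidates(transactions, freq, min_support=2)
pruned = pairs_without_candidates(transactions, freq, min_support=2)
print(classic)
print(pruned)
```

Both functions return the same frequent pairs; the second touches only pairs that occur in some transaction, so its work depends on transaction length rather than on the square of the number of frequent items.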