{"title":"On the Management and Analysis of Our LifeSteps","authors":"N. Pelekis, Y. Theodoridis, D. Janssens","doi":"10.1145/2594473.2594478","DOIUrl":"https://doi.org/10.1145/2594473.2594478","url":null,"abstract":"Huge volumes of location information are available nowadays due to the rapid growth of positioning devices (GPS-enabled smartphones and tablets, on-board navigation systems in vehicles, vessels and planes, smart chips for animals, etc.). In the near future, it is unavoidable that this explosion will contribute in what is called the Big Data era, raising high challenges for the data management research community. Instead of trying to manage bigger and bigger volumes of raw data, future Moving Object Database (MOD) systems need to extract and manage (the minimum necessary) semantics of movement. Such semantics can foster next-generation location-based services (LBS) and locationbased social networking (LBSN) applications, building more efficient and effective applications, while in parallel opening new research directions in the field of transportation, urban planning etc. In this article, we first present a novel model that enables the unified management of (raw GPS) trajectories and their semantic counterpart, and then we discuss challenges and solutions on the multidimensional analysis of such real-world semantic-aware mobility databases and data warehouses. Our recent experience from an interdisciplinary EU project we've been participating makes us confident that the envisioned approach will inspire the next wave of research in mobility data management and exploration field.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"25 1","pages":"23-32"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85011440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research issues in outlier detection for data streams","authors":"Md. Shiblee Sadik, L. Gruenwald","doi":"10.1145/2594473.2594479","DOIUrl":"https://doi.org/10.1145/2594473.2594479","url":null,"abstract":"In applications, such as sensor networks and power usage monitoring, data are in the form of streams, each of which is an infinite sequence of data points with explicit or implicit timestamps and has special characteristics, such as transiency, uncertainty, dynamic data distribution, multidimensionality, and dynamic relationship. These characteristics introduce new research issues that make outlier detection for stream data more challenging than that for regular (non-stream) data. This paper discusses those research issues for applications where data come from a single stream as well as multiple streams.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"19 1","pages":"33-40"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85090006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Giacometti, Dominique H. Li, Patrick Marcel, Arnaud Soulet
{"title":"20 years of pattern mining: a bibliometric survey","authors":"A. Giacometti, Dominique H. Li, Patrick Marcel, Arnaud Soulet","doi":"10.1145/2594473.2594480","DOIUrl":"https://doi.org/10.1145/2594473.2594480","url":null,"abstract":"In 1993, Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami published one of the founding papers of Pattern Mining: \"Mining Association Rules between Sets of Items in Large Databases\". Beyond the introduction to a new problem, it introduced a new methodology in terms of resolution and evaluation. For two decades, Pattern Mining has been one of the most active fields in Knowledge Discovery in Databases. This paper provides a bibliometric survey of the literature relying on 1,087 publications from five major international conferences: KDD, PKDD, PAKDD, ICDM and SDM. We first measured a slowdown of research dedicated to Pattern Mining while the KDD field continues to grow. Then, we quantified the main contributions with respect to languages, constraints and condensed representations to outline the current directions. We observe a sophistication of languages over the last 20 years, although association rules and itemsets are so far the most studied ones. As expected, the minimal support constraint predominates the extraction of patterns with approximately 50% of the publications. Finally, condensed representations used in 10% of the papers had relative success particularly between 2005 and 2008.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"91 1","pages":"41-50"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73442038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining heterogeneous information networks: a structural analysis approach","authors":"Yizhou Sun, Jiawei Han","doi":"10.1145/2481244.2481248","DOIUrl":"https://doi.org/10.1145/2481244.2481248","url":null,"abstract":"Most objects and data in the real world are of multiple types, interconnected, forming complex, heterogeneous but often semi-structured information networks. However, most network science researchers are focused on homogeneous networks, without distinguishing different types of objects and links in the networks. We view interconnected, multityped data, including the typical relational database data, as heterogeneous information networks, study how to leverage the rich semantic meaning of structural types of objects and links in the networks, and develop a structural analysis approach on mining semi-structured, multi-typed heterogeneous information networks. In this article, we summarize a set of methodologies that can effectively and efficiently mine useful knowledge from such information networks, and point out some promising research directions.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"37 1","pages":"20-28"},"PeriodicalIF":0.0,"publicationDate":"2013-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83009604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outlier ensembles: position paper","authors":"C. Aggarwal","doi":"10.1145/2481244.2481252","DOIUrl":"https://doi.org/10.1145/2481244.2481252","url":null,"abstract":"Ensemble analysis is a widely used meta-algorithm for many data mining problems such as classification and clustering. Numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to the clustering and classification problems, ensemble analysis has been studied in a limited way in the outlier detection literature. In some cases, ensemble analysis techniques have been implicitly used by many outlier analysis algorithms, but the approach is often buried deep into the algorithm and not formally recognized as a general-purpose meta-algorithm. This is in spite of the fact that this problem is rather important in the context of outlier analysis. This paper discusses the various methods which are used in the literature for outlier ensembles and the general principles by which such analysis can be made more effective. A discussion is also provided on how outlier ensembles relate to the ensemble-techniques used commonly for other data mining problems.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"62 1","pages":"49-58"},"PeriodicalIF":0.0,"publicationDate":"2013-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88892292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling big data mining infrastructure: the twitter experience","authors":"Jimmy J. Lin, D. Ryaboy","doi":"10.1145/2481244.2481247","DOIUrl":"https://doi.org/10.1145/2481244.2481247","url":null,"abstract":"The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this paper, we discuss the evolution of our infrastructure and the development of capabilities for data mining on \"big data\". One important lesson is that successful big data mining in practice is about much more than what most academics would consider data mining: life \"in the trenches\" is occupied by much preparatory work that precedes the application of data mining algorithms and followed by substantial effort to turn preliminary models into robust solutions. In this context, we discuss two topics: First, schemas play an important role in helping data scientists understand petabyte-scale data stores, but they're insufficient to provide an overall \"big picture\" of the data available to generate insights. Second, we observe that a major challenge in building data analytics platforms stems from the heterogeneity of the various components that must be integrated together into production workflows---we refer to this as \"plumbing\". This paper has two goals: For practitioners, we hope to share our experiences to flatten bumps in the road for those who come after us. For academic researchers, we hope to provide a broader context for data mining in production environments, pointing out opportunities for future work.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"15 1","pages":"6-19"},"PeriodicalIF":0.0,"publicationDate":"2013-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85003844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Studying the source code of scientific research","authors":"Graham Cormode, S. Muthukrishnan, Jinyun Yan","doi":"10.1145/2481244.2481254","DOIUrl":"https://doi.org/10.1145/2481244.2481254","url":null,"abstract":"Just as inspecting the source code of programs tells us a lot about the process of programming, inspecting the \"source code\" of scientific papers informs on the process of scientific writing. We report on our study of the source of tens of thousands of papers from Computer Science and Mathematics.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"59-62"},"PeriodicalIF":0.0,"publicationDate":"2013-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74494388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big graph mining: algorithms and discoveries","authors":"U. Kang, C. Faloutsos","doi":"10.1145/2481244.2481249","DOIUrl":"https://doi.org/10.1145/2481244.2481249","url":null,"abstract":"How do we find patterns and anomalies in very large graphs with billions of nodes and edges? How to mine such big graphs efficiently? Big graphs are everywhere, ranging from social networks and mobile call networks to biological networks and the World Wide Web. Mining big graphs leads to many interesting applications including cyber security, fraud detection, Web search, recommendation, and many more.\u0000 In this paper we describe Pegasus, a big graph mining system built on top of MapReduce, a modern distributed data processing platform. We introduce GIM-V, an important primitive that Pegasus uses for its algorithms to analyze structures of large graphs. We also introduce HEigen, a large scale eigensolver which is also a part of Pegasus. Both GIM-V and HEigen are highly optimized, achieving linear scale up on the number of machines and edges, and providing 9.2x and 76x faster performance than their naive counterparts, respectively.\u0000 Using Pegasus, we analyze very large, real world graphs with billions of nodes and edges. Our findings include anomalous spikes in the connected component size distribution, the 7 degrees of separation in a Web graph, and anomalous adult advertisers in the who-follows-whom Twitter social network.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"20 1","pages":"29-36"},"PeriodicalIF":0.0,"publicationDate":"2013-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81570965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining large streams of user data for personalized recommendations","authors":"X. Amatriain","doi":"10.1145/2481244.2481250","DOIUrl":"https://doi.org/10.1145/2481244.2481250","url":null,"abstract":"The Netflix Prize put the spotlight on the use of data mining and machine learning methods for predicting user preferences. Many lessons came out of the competition. But since then, Recommender Systems have evolved. This evolution has been driven by the greater availability of different kinds of user data in industry and the interest that the area has drawn among the research community. The goal of this paper is to give an up-to-date overview of the use of data mining approaches for personalization and recommendation. Using Netflix personalization as a motivating use case, I will describe the use of different kinds of data and machine learning techniques.\u0000 After introducing the traditional approaches to recommendation, I highlight some of the main lessons learned from the Netflix Prize. I then describe the use of recommendation and personalization techniques at Netflix. Finally, I pinpoint the most promising current research avenues and unsolved problems that deserve attention in this domain.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"33 1","pages":"37-48"},"PeriodicalIF":0.0,"publicationDate":"2013-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86097311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining big data: current status, and forecast to the future","authors":"Wei Fan, A. Bifet","doi":"10.1145/2481244.2481246","DOIUrl":"https://doi.org/10.1145/2481244.2481246","url":null,"abstract":"Big Data is a new term used to identify datasets that we can not manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity, of such data. The Big Data challenge is becoming one of the most exciting opportunities for the years to come. We present in this issue, a broad overview of the topic, its current status, controversy, and a forecast to the future. We introduce four articles, written by influential scientists in the field, covering the most interesting and state-of-the-art topics on Big Data mining.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"70 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2013-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84128868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}