SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining: latest articles (page 7)

On the Management and Analysis of Our LifeSteps

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2014-03-17 DOI: 10.1145/2594473.2594478

N. Pelekis, Y. Theodoridis, D. Janssens

Abstract: Huge volumes of location information are available nowadays due to the rapid growth of positioning devices (GPS-enabled smartphones and tablets, on-board navigation systems in vehicles, vessels and planes, smart chips for animals, etc.). In the near future, this explosion will inevitably contribute to what is called the Big Data era, raising major challenges for the data management research community. Instead of trying to manage bigger and bigger volumes of raw data, future Moving Object Database (MOD) systems need to extract and manage (the minimum necessary) semantics of movement. Such semantics can foster next-generation location-based services (LBS) and location-based social networking (LBSN) applications, making them more efficient and effective, while in parallel opening new research directions in fields such as transportation and urban planning. In this article, we first present a novel model that enables the unified management of (raw GPS) trajectories and their semantic counterparts, and then we discuss challenges and solutions in the multidimensional analysis of such real-world semantic-aware mobility databases and data warehouses. Our recent experience from an interdisciplinary EU project in which we have been participating makes us confident that the envisioned approach will inspire the next wave of research in the field of mobility data management and exploration.

Pages: 23-32

Citations: 17

Research issues in outlier detection for data streams

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2014-03-17 DOI: 10.1145/2594473.2594479

Md. Shiblee Sadik, L. Gruenwald

Citations: 77

20 years of pattern mining: a bibliometric survey

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2014-03-17 DOI: 10.1145/2594473.2594480

A. Giacometti, Dominique H. Li, Patrick Marcel, Arnaud Soulet

Abstract: In 1993, Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami published one of the founding papers of Pattern Mining: "Mining Association Rules between Sets of Items in Large Databases". Beyond introducing a new problem, it introduced a new methodology in terms of resolution and evaluation. For two decades, Pattern Mining has been one of the most active fields in Knowledge Discovery in Databases. This paper provides a bibliometric survey of the literature relying on 1,087 publications from five major international conferences: KDD, PKDD, PAKDD, ICDM and SDM. We first measured a slowdown of research dedicated to Pattern Mining while the KDD field continues to grow. Then, we quantified the main contributions with respect to languages, constraints and condensed representations to outline the current directions. We observe a sophistication of languages over the last 20 years, although association rules and itemsets are so far the most studied ones. As expected, the minimal support constraint dominates the extraction of patterns, appearing in approximately 50% of the publications. Finally, condensed representations, used in 10% of the papers, had relative success, particularly between 2005 and 2008.

Pages: 41-50
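To make the minimal support constraint mentioned in this abstract concrete, here is a small illustrative sketch in Python. It is not taken from the surveyed papers: the function name `frequent_itemsets`, the toy transactions, and the simple level-wise candidate generation are all assumptions chosen for readability, not an optimized miner.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose relative
    support (fraction of transactions containing it) is >= min_support.

    Hypothetical helper for illustration only.
    """
    tx = [frozenset(t) for t in transactions]
    if not tx:
        return {}
    n = len(tx)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in tx if itemset <= t) / n

    # Level 1 candidates: every single item seen in the data.
    level = [frozenset([i]) for i in sorted({i for t in tx for i in t})]
    result = {}
    while level:
        # Keep only the candidates that satisfy the support constraint.
        frequent = {c: s for c in level if (s := support(c)) >= min_support}
        result.update(frequent)
        # Build candidates one item larger by joining frequent itemsets.
        keys = list(frequent)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1})
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, s in frequent_itemsets(baskets, 0.5).items():
        print(sorted(itemset), s)
```

Only candidates built from already-frequent itemsets are counted at each level, which is safe because a superset can never be more frequent than its subsets.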

Citations: 29

Mining heterogeneous information networks: a structural analysis approach

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2013-04-30 DOI: 10.1145/2481244.2481248

Yizhou Sun, Jiawei Han

Citations: 501

Outlier ensembles: position paper

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2013-04-30 DOI: 10.1145/2481244.2481252

C. Aggarwal

Citations: 13

Scaling big data mining infrastructure: the twitter experience

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2013-04-30 DOI: 10.1145/2481244.2481247

Jimmy J. Lin, D. Ryaboy

Abstract: The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this paper, we discuss the evolution of our infrastructure and the development of capabilities for data mining on "big data". One important lesson is that successful big data mining in practice is about much more than what most academics would consider data mining: life "in the trenches" is occupied by much preparatory work that precedes the application of data mining algorithms and is followed by substantial effort to turn preliminary models into robust solutions. In this context, we discuss two topics. First, schemas play an important role in helping data scientists understand petabyte-scale data stores, but they are insufficient to provide an overall "big picture" of the data available to generate insights. Second, we observe that a major challenge in building data analytics platforms stems from the heterogeneity of the various components that must be integrated into production workflows; we refer to this as "plumbing". This paper has two goals. For practitioners, we hope to share our experiences to flatten bumps in the road for those who come after us. For academic researchers, we hope to provide a broader context for data mining in production environments, pointing out opportunities for future work.

Pages: 6-19

Citations: 191

Studying the source code of scientific research

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2013-04-30 DOI: 10.1145/2481244.2481254

Graham Cormode, S. Muthukrishnan, Jinyun Yan

Citations: 1

Big graph mining: algorithms and discoveries

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2013-04-30 DOI: 10.1145/2481244.2481249

U. Kang, C. Faloutsos

Abstract: How do we find patterns and anomalies in very large graphs with billions of nodes and edges? How can we mine such big graphs efficiently? Big graphs are everywhere, ranging from social networks and mobile call networks to biological networks and the World Wide Web. Mining big graphs leads to many interesting applications including cyber security, fraud detection, Web search, recommendation, and many more. In this paper we describe Pegasus, a big graph mining system built on top of MapReduce, a modern distributed data processing platform. We introduce GIM-V, an important primitive that Pegasus uses for its algorithms to analyze structures of large graphs. We also introduce HEigen, a large-scale eigensolver which is also a part of Pegasus. Both GIM-V and HEigen are highly optimized, achieving linear scale-up in the number of machines and edges, and providing 9.2x and 76x faster performance than their naive counterparts, respectively. Using Pegasus, we analyze very large, real-world graphs with billions of nodes and edges. Our findings include anomalous spikes in the connected component size distribution, the 7 degrees of separation in a Web graph, and anomalous adult advertisers in the who-follows-whom Twitter social network.

Pages: 29-36
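As a rough illustration of the GIM-V primitive named in this abstract, the sketch below runs a generalized iterated matrix-vector multiplication on a single machine and uses it to label connected components. This is an assumption-laden toy, not the Pegasus implementation: it is plain Python rather than MapReduce, and the names `gim_v` and `connected_components` as well as the edge-list representation are hypothetical. The three customizable operations (`combine2`, `combine_all`, `assign`) follow the usual description of GIM-V.

```python
def gim_v(edges, vector, combine2, combine_all, assign, max_iters=100):
    """Iterate v <- M (x) v until the vector stops changing.

    edges: list of (i, j) pairs, one per nonzero entry m_ij = 1 of M.
    vector: dict mapping node id -> initial value.
    """
    v = dict(vector)
    for _ in range(max_iters):
        # combine2: pair each nonzero matrix entry m_ij with v_j.
        partial = {i: [] for i in v}
        for i, j in edges:
            partial[i].append(combine2(1, v[j]))
        # combine_all merges the partial results of a row;
        # assign decides how the merged value updates v_i.
        new_v = {
            i: assign(v[i], combine_all(partial[i])) if partial[i] else v[i]
            for i in v
        }
        if new_v == v:  # fixed point reached
            break
        v = new_v
    return v


def connected_components(nodes, undirected_edges):
    """Label every node with the smallest node id in its component."""
    # Store each undirected edge in both directions.
    edges = list(undirected_edges) + [(b, a) for a, b in undirected_edges]
    return gim_v(
        edges,
        {n: n for n in nodes},             # each node starts as its own label
        combine2=lambda m_ij, v_j: m_ij * v_j,
        combine_all=min,                   # smallest label among neighbours
        assign=min,                        # keep the smaller of old and new
    )


if __name__ == "__main__":
    print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

Swapping in different `combine2`, `combine_all`, and `assign` functions is what lets one loop express several graph algorithms; here the choice of `min` propagates the smallest node id through each component.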

Citations: 67

Mining large streams of user data for personalized recommendations

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2013-04-30 DOI: 10.1145/2481244.2481250

X. Amatriain

Citations: 135

Mining big data: current status, and forecast to the future

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2013-04-30 DOI: 10.1145/2481244.2481246

Wei Fan, A. Bifet

Citations: 801