{"title":"Contextual crowd intelligence","authors":"B. Ooi, K. Tan, Quoc Trung Tran, J. Yip, Gang Chen, Zheng Jye Ling, Thi Nguyen, A. Tung, Meihui Zhang","doi":"10.1145/2674026.2674032","DOIUrl":"https://doi.org/10.1145/2674026.2674032","url":null,"abstract":"Most data analytics applications are industry/domain specific, e.g., predicting patients at high risk of being admitted to intensive care unit in the healthcare sector or predicting malicious SMSs in the telecommunication sector. Existing solutions are based on \"best practices\", i.e., the systems' decisions are knowledge-driven and/or data-driven. However, there are rules and exceptional cases that can only be precisely formulated and identified by subject-matter experts (SMEs) who have accumulated many years of experience. This paper envisions a more intelligent database management system (DBMS) that captures such knowledge to effectively address the industry/domain specific applications. At the core, the system is a hybrid human-machine database engine where the machine interacts with the SMEs as part of a feedback loop to gather, infer, ascertain and enhance the database knowledge and processing. We discuss the challenges towards building such a system through examples in healthcare predictive analysis -- a popular area for big data analytics.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"122 1","pages":"39-46"},"PeriodicalIF":0.0,"publicationDate":"2014-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79461856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open challenges for data stream mining research","authors":"G. Krempl, I. Žliobaitė, D. Brzezinski, E. Hüllermeier, Mark Last, V. Lemaire, T. Noack, Ammar Shaker, S. Sievi, M. Spiliopoulou, J. Stefanowski","doi":"10.1145/2674026.2674028","DOIUrl":"https://doi.org/10.1145/2674026.2674028","url":null,"abstract":"Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data have received a lot of attention over the last decade, many research approaches are typically designed for well-behaved controlled problem settings, overlooking important challenges imposed by real-world applications. This article presents a discussion on eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve such problems as: protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2014-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91212297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interview: Michael Brodie, leading database researcher, industry leader, thinker","authors":"Gregory Piatetsky","doi":"10.1145/2674026.2674035","DOIUrl":"https://doi.org/10.1145/2674026.2674035","url":null,"abstract":"We discuss the most important database research advances, industry developments, role of relational and NoSQL databases, Computing Reality, Data Curation, Cloud Computing, Tamr and Jisto startups, what he learned as a chief Scientist of Verizon, Knowledge Discovery, Privacy Issues, and more.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"16 1","pages":"57-63"},"PeriodicalIF":0.0,"publicationDate":"2014-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2674026.2674035","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64162050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining text and social streams: a review","authors":"C. Aggarwal","doi":"10.1145/2641190.2641194","DOIUrl":"https://doi.org/10.1145/2641190.2641194","url":null,"abstract":"The large amount of text data which are continuously produced over time in a variety of large scale applications such as social networks results in massive streams of data. Typically massive text streams are created by very large scale interactions of individuals, or by structured creations of particular kinds of content by dedicated organizations. An example in the latter category would be the massive text streams created by news-wire services. Such text streams provide unprecedented challenges to data mining algorithms from an efficiency perspective. In this paper, we review text stream mining algorithms for a wide variety of problems in data mining such as clustering, classification and topic modeling. A recent challenge arises in the context of social streams, which are generated by large social networks such as Twitter. We also discuss a number of future challenges in this area of research.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"5 1","pages":"9-19"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81179722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brain network analysis: a data mining perspective","authors":"Xiangnan Kong, Philip S. Yu","doi":"10.1145/2641190.2641196","DOIUrl":"https://doi.org/10.1145/2641190.2641196","url":null,"abstract":"Following the recent advances in neuroimaging technology, the research on brain network analysis becomes an emerging area in data mining community. Brain network data pose many unique challenges for data mining research. For example, in brain networks, the nodes (i.e., the brain regions) and edges (i.e., relationships between brain regions) are usually not given, but should be derived from the neuroimaging data. The network structure can be very noisy and uncertain. Therefore, innovative methods are required for brain network analysis. Many research efforts have been devoted to this area. They have achieved great success in various applications, such as brain network extraction, graph mining, neuroimaging data analysis. In this paper, we review some recent data mining methods which are used in the literature for mining brain network data.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"4 1","pages":"30-38"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73662706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive analysis of engine health for decision support","authors":"Shubhabrata Mukherjee, A. Varde, G. Javidi, E. Sheybani","doi":"10.1145/2641190.2641197","DOIUrl":"https://doi.org/10.1145/2641190.2641197","url":null,"abstract":"Data mining, the discovery of knowledge from data, bridges several disciplines such as database management, artificial intelligence, statistics, visualization and the domain of the data, e.g., biology or engineering. Knowledge discovered by mining the data can be used for various purposes such as developing decision support systems and intelligent tutors. In this paper we present such a data mining problem in the mechanical engineering domain where knowledge discovery from the data is performed using statistical approaches, to conduct predictive analysis for decision support. More specifically, we focus on the engine health problem which consists of using existing data on the behavior of an engine in order to predict whether the engine is capable of functioning well (i.e., it is healthy) and to offer suggestions on preventive maintenance. The data we use for this predictive analysis consists of graphs that plot process parameters such as the vibration and temperature of the engine with respect to time. In this paper we define the problem in detail, propose a solution based on statistical inference techniques, summarize our experimental evaluation and discuss the applications of this work in various fields from a decision support angle.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"21 1","pages":"39-49"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84933008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining social media with social theories: a survey","authors":"Jiliang Tang, Yi Chang, Huan Liu","doi":"10.1145/2641190.2641195","DOIUrl":"https://doi.org/10.1145/2641190.2641195","url":null,"abstract":"The increasing popularity of social media encourages more and more users to participate in various online activities and produces data in an unprecedented rate. Social media data is big, linked, noisy, highly unstructured and incomplete, and differs from data in traditional data mining, which cultivates a new research field - social media mining. Social theories from social sciences are helpful to explain social phenomena. The scale and properties of social media data are very different from these of data social sciences use to develop social theories. As a new type of social data, social media data has a fundamental question - can we apply social theories to social media data? Recent advances in computer science provide necessary computational tools and techniques for us to verify social theories on large-scale social media data. Social theories have been applied to mining social media. In this article, we review some key social theories in mining social media, their verification approaches, interesting findings, and state-of-the-art algorithms. We also discuss some future directions in this active area of mining social media with social theories.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1155 1","pages":"20-29"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91201947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering high dimensional data: examining differences and commonalities between subspace clustering and text clustering - a position paper","authors":"H. Kriegel, Eirini Ntoutsi","doi":"10.1145/2641190.2641192","DOIUrl":"https://doi.org/10.1145/2641190.2641192","url":null,"abstract":"The goal of this position paper is to contribute to a clear understanding of the commonalities and differences between subspace clustering and text clustering. Often text data is foisted as an ideal fit for subspace clustering due to its high dimensional nature and sparsity of the data. Indeed, the areas of subspace clustering and text clustering share similar challenges and the same goal, the simultaneous extraction of both clusters and the dimensions where these clusters are defined. However, there are fundamental differences between the two areas w.r.t object feature representation, dimension weighting and incorporation of these weights in the dissimilarity computation. We make an attempt to bridge these two domains in order to facilitate the exchange of ideas and best practices between them.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"15 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2641190.2641192","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64158987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensible classification models: a position paper","authors":"A. Freitas","doi":"10.1145/2594473.2594475","DOIUrl":"https://doi.org/10.1145/2594473.2594475","url":null,"abstract":"The vast majority of the literature evaluates the performance of classification models using only the criterion of predictive accuracy. This paper reviews the case for considering also the comprehensibility (interpretability) of classification models, and discusses the interpretability of five types of classification models, namely decision trees, classification rules, decision tables, nearest neighbors and Bayesian network classifiers. We discuss both interpretability issues which are specific to each of those model types and more generic interpretability issues, namely the drawbacks of using model size as the only criterion to evaluate the comprehensibility of a model, and the use of monotonicity constraints to improve the comprehensibility and acceptance of classification models by users.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"28 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88046396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensembles for unsupervised outlier detection: challenges and research questions a position paper","authors":"A. Zimek, R. Campello, J. Sander","doi":"10.1145/2594473.2594476","DOIUrl":"https://doi.org/10.1145/2594473.2594476","url":null,"abstract":"Ensembles for unsupervised outlier detection is an emerging topic that has been neglected for a surprisingly long time (although there are reasons why this is more difficult than supervised ensembles or even clustering ensembles). Aggarwal recently discussed algorithmic patterns of outlier detection ensembles, identified traces of the idea in the literature, and remarked on potential as well as unlikely avenues for future transfer of concepts from supervised ensembles. Complementary to his points, here we focus on the core ingredients for building an outlier ensemble, discuss the first steps taken in the literature, and identify challenges for future research.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"23 1","pages":"11-22"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83567658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}