SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining最新文献

筛选
英文 中文
Contextual crowd intelligence 情境群体智能
B. Ooi, K. Tan, Quoc Trung Tran, J. Yip, Gang Chen, Zheng Jye Ling, Thi Nguyen, A. Tung, Meihui Zhang
{"title":"Contextual crowd intelligence","authors":"B. Ooi, K. Tan, Quoc Trung Tran, J. Yip, Gang Chen, Zheng Jye Ling, Thi Nguyen, A. Tung, Meihui Zhang","doi":"10.1145/2674026.2674032","DOIUrl":"https://doi.org/10.1145/2674026.2674032","url":null,"abstract":"Most data analytics applications are industry/domain specific, e.g., predicting patients at high risk of being admitted to intensive care unit in the healthcare sector or predicting malicious SMSs in the telecommunication sector. Existing solutions are based on \"best practices\", i.e., the systems' decisions are knowledge-driven and/or data-driven. However, there are rules and exceptional cases that can only be precisely formulated and identified by subject-matter experts (SMEs) who have accumulated many years of experience. This paper envisions a more intelligent database management system (DBMS) that captures such knowledge to effectively address the industry/domain specific applications. At the core, the system is a hybrid human-machine database engine where the machine interacts with the SMEs as part of a feedback loop to gather, infer, ascertain and enhance the database knowledge and processing. We discuss the challenges towards building such a system through examples in healthcare predictive analysis -- a popular area for big data analytics.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"122 1","pages":"39-46"},"PeriodicalIF":0.0,"publicationDate":"2014-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79461856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Open challenges for data stream mining research 数据流挖掘研究的开放挑战
G. Krempl, I. Žliobaitė, D. Brzezinski, E. Hüllermeier, Mark Last, V. Lemaire, T. Noack, Ammar Shaker, S. Sievi, M. Spiliopoulou, J. Stefanowski
{"title":"Open challenges for data stream mining research","authors":"G. Krempl, I. Žliobaitė, D. Brzezinski, E. Hüllermeier, Mark Last, V. Lemaire, T. Noack, Ammar Shaker, S. Sievi, M. Spiliopoulou, J. Stefanowski","doi":"10.1145/2674026.2674028","DOIUrl":"https://doi.org/10.1145/2674026.2674028","url":null,"abstract":"Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data have received a lot of attention over the last decade, many research approaches are typically designed for well-behaved controlled problem settings, overlooking important challenges imposed by real-world applications. This article presents a discussion on eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve such problems as: protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2014-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91212297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 267
Interview: Michael Brodie, leading database researcher, industry leader, thinker 访谈:Michael Brodie,著名数据库研究员,行业领袖,思想家
Gregory Piatetsky
{"title":"Interview: Michael Brodie, leading database researcher, industry leader, thinker","authors":"Gregory Piatetsky","doi":"10.1145/2674026.2674035","DOIUrl":"https://doi.org/10.1145/2674026.2674035","url":null,"abstract":"We discuss the most important database research advances, industry developments, role of relational and NoSQL databases, Computing Reality, Data Curation, Cloud Computing, Tamr and Jisto startups, what he learned as a chief Scientist of Verizon, Knowledge Discovery, Privacy Issues, and more.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"16 1","pages":"57-63"},"PeriodicalIF":0.0,"publicationDate":"2014-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2674026.2674035","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64162050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Mining text and social streams: a review 挖掘文本和社交流:综述
C. Aggarwal
{"title":"Mining text and social streams: a review","authors":"C. Aggarwal","doi":"10.1145/2641190.2641194","DOIUrl":"https://doi.org/10.1145/2641190.2641194","url":null,"abstract":"The large amount of text data which are continuously produced over time in a variety of large scale applications such as social networks results in massive streams of data. Typically massive text streams are created by very large scale interactions of individuals, or by structured creations of particular kinds of content by dedicated organizations. An example in the latter category would be the massive text streams created by news-wire services. Such text streams provide unprecedented challenges to data mining algorithms from an efficiency perspective. In this paper, we review text stream mining algorithms for a wide variety of problems in data mining such as clustering, classification and topic modeling. A recent challenge arises in the context of social streams, which are generated by large social networks such as Twitter. We also discuss a number of future challenges in this area of research.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"5 1","pages":"9-19"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81179722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
Brain network analysis: a data mining perspective 大脑网络分析:数据挖掘的视角
Xiangnan Kong, Philip S. Yu
{"title":"Brain network analysis: a data mining perspective","authors":"Xiangnan Kong, Philip S. Yu","doi":"10.1145/2641190.2641196","DOIUrl":"https://doi.org/10.1145/2641190.2641196","url":null,"abstract":"Following the recent advances in neuroimaging technology, the research on brain network analysis becomes an emerging area in data mining community. Brain network data pose many unique challenges for data mining research. For example, in brain networks, the nodes (i.e., the brain regions) and edges (i.e., relationships between brain regions) are usually not given, but should be derived from the neuroimaging data. The network structure can be very noisy and uncertain. Therefore, innovative methods are required for brain network analysis. Many research efforts have been devoted to this area. They have achieved great success in various applications, such as brain network extraction, graph mining, neuroimaging data analysis. In this paper, we review some recent data mining methods which are used in the literature for mining brain network data.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"4 1","pages":"30-38"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73662706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 45
Predictive analysis of engine health for decision support 用于决策支持的发动机运行状况预测分析
Shubhabrata Mukherjee, A. Varde, G. Javidi, E. Sheybani
{"title":"Predictive analysis of engine health for decision support","authors":"Shubhabrata Mukherjee, A. Varde, G. Javidi, E. Sheybani","doi":"10.1145/2641190.2641197","DOIUrl":"https://doi.org/10.1145/2641190.2641197","url":null,"abstract":"Data mining, the discovery of knowledge from data, bridges several disciplines such as database management, artificial intelligence, statistics, visualization and the domain of the data, e.g., biology or engineering. Knowledge discovered by mining the data can be used for various purposes such as developing decision support systems and intelligent tutors. In this paper we present such a data mining problem in the mechanical engineering domain where knowledge discovery from the data is performed using statistical approaches, to conduct predictive analysis for decision support. More specifically, we focus on the engine health problem which consists of using existing data on the behavior of an engine in order to predict whether the engine is capable of functioning well (i.e., it is healthy) and to offer suggestions on preventive maintenance. The data we use for this predictive analysis consists of graphs that plot process parameters such as the vibration and temperature of the engine with respect to time. In this paper we define the problem in detail, propose a solution based on statistical inference techniques, summarize our experimental evaluation and discuss the applications of this work in various fields from a decision support angle.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"21 1","pages":"39-49"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84933008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Mining social media with social theories: a survey 用社会理论挖掘社交媒体:一项调查
Jiliang Tang, Yi Chang, Huan Liu
{"title":"Mining social media with social theories: a survey","authors":"Jiliang Tang, Yi Chang, Huan Liu","doi":"10.1145/2641190.2641195","DOIUrl":"https://doi.org/10.1145/2641190.2641195","url":null,"abstract":"The increasing popularity of social media encourages more and more users to participate in various online activities and produces data in an unprecedented rate. Social media data is big, linked, noisy, highly unstructured and in- complete, and differs from data in traditional data mining, which cultivates a new research field - social media mining. Social theories from social sciences are helpful to explain social phenomena. The scale and properties of social media data are very different from these of data social sciences use to develop social theories. As a new type of social data, social media data has a fundamental question - can we apply social theories to social media data? Recent advances in computer science provide necessary computational tools and techniques for us to verify social theories on large-scale social media data. Social theories have been applied to mining social media. In this article, we review some key social theories in mining social media, their verification approaches, interesting findings, and state-of-the-art algorithms. We also discuss some future directions in this active area of mining social media with social theories.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1155 1","pages":"20-29"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91201947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 138
Clustering high dimensional data: examining differences and commonalities between subspace clustering and text clustering - a position paper 聚类高维数据:检查子空间聚类和文本聚类之间的差异和共性-立场文件
H. Kriegel, Eirini Ntoutsi
{"title":"Clustering high dimensional data: examining differences and commonalities between subspace clustering and text clustering - a position paper","authors":"H. Kriegel, Eirini Ntoutsi","doi":"10.1145/2641190.2641192","DOIUrl":"https://doi.org/10.1145/2641190.2641192","url":null,"abstract":"The goal of this position paper is to contribute to a clear understanding of the commonalities and differences between subspace clustering and text clustering. Often text data is foisted as an ideal fit for subspace clustering due to its high dimensional nature and sparsity of the data. Indeed, the areas of subspace clustering and text clustering share similar challenges and the same goal, the simultaneous extraction of both clusters and the dimensions where these clusters are defined. However, there are fundamental differences between the two areas w.r.t object feature representation, dimension weighting and incorporation of these weights in the dissimilarity computation. We make an attempt to bridge these two domains in order to facilitate the exchange of ideas and best practices between them.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"15 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2641190.2641192","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64158987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Comprehensible classification models: a position paper 可理解的分类模型:立场文件
A. Freitas
{"title":"Comprehensible classification models: a position paper","authors":"A. Freitas","doi":"10.1145/2594473.2594475","DOIUrl":"https://doi.org/10.1145/2594473.2594475","url":null,"abstract":"The vast majority of the literature evaluates the performance of classification models using only the criterion of predictive accuracy. This paper reviews the case for considering also the comprehensibility (interpretability) of classification models, and discusses the interpretability of five types of classification models, namely decision trees, classification rules, decision tables, nearest neighbors and Bayesian network classifiers. We discuss both interpretability issues which are specific to each of those model types and more generic interpretability issues, namely the drawbacks of using model size as the only criterion to evaluate the comprehensibility of a model, and the use of monotonicity constraints to improve the comprehensibility and acceptance of classification models by users.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"28 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88046396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 532
Ensembles for unsupervised outlier detection: challenges and research questions a position paper 无监督异常值检测的集成:挑战和研究问题
A. Zimek, R. Campello, J. Sander
{"title":"Ensembles for unsupervised outlier detection: challenges and research questions a position paper","authors":"A. Zimek, R. Campello, J. Sander","doi":"10.1145/2594473.2594476","DOIUrl":"https://doi.org/10.1145/2594473.2594476","url":null,"abstract":"Ensembles for unsupervised outlier detection is an emerging topic that has been neglected for a surprisingly long time (although there are reasons why this is more difficult than supervised ensembles or even clustering ensembles). Aggarwal recently discussed algorithmic patterns of outlier detection ensembles, identified traces of the idea in the literature, and remarked on potential as well as unlikely avenues for future transfer of concepts from supervised ensembles. Complementary to his points, here we focus on the core ingredients for building an outlier ensemble, discuss the first steps taken in the literature, and identify challenges for future research.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"23 1","pages":"11-22"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83567658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 245
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信