StreamKDD '10 | Pub Date: 2011-03-31 | DOI: 10.1145/1833280.1833281
L. Becchetti, Ilaria Bordino, S. Leonardi, A. Rosén
{"title":"Fully decentralized computation of aggregates over data streams","authors":"L. Becchetti, Ilaria Bordino, S. Leonardi, A. Rosén","doi":"10.1145/1833280.1833281","DOIUrl":"https://doi.org/10.1145/1833280.1833281","url":null,"abstract":"In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is to cope with duplicate detections. These arise from the observation of the same event at several nodes of the network and/or from the receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that locally answer continuous aggregate queries on the number of distinct events, the total number of events, and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, either in the worst case or on realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We finally present an experimental analysis providing evidence for the efficiency and accuracy of our algorithms on realistic simulated scenarios.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131661872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StreamKDD '10 | Pub Date: 2010-07-25 | DOI: 10.1145/1833280.1833284
R. Shankar, G. V. Kiran, Vikram Pudi
{"title":"Evolutionary clustering using frequent itemsets","authors":"R. Shankar, G. V. Kiran, Vikram Pudi","doi":"10.1145/1833280.1833284","DOIUrl":"https://doi.org/10.1145/1833280.1833284","url":null,"abstract":"Evolutionary clustering is an emerging research area addressing the problem of clustering dynamic data. An evolutionary clustering should satisfy two conflicting criteria: preserving the current cluster quality and not deviating too much from the recent history. In this paper we propose an algorithm for evolutionary clustering using frequent itemsets. A frequent-itemset-based approach to evolutionary clustering is natural, and it automatically satisfies the two criteria of evolutionary clustering. We support our claims both theoretically and experimentally. We performed experiments on our approach using different datasets, and the results show that our approach is comparable to most of the existing algorithms for evolutionary clustering.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126108342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StreamKDD '10 | Pub Date: 2010-07-25 | DOI: 10.1145/1833280.1833283
J. Gomes, Ernestina Menasalvas Ruiz, Pedro A. C. Sousa
{"title":"CALDS: context-aware learning from data streams","authors":"J. Gomes, Ernestina Menasalvas Ruiz, Pedro A. C. Sousa","doi":"10.1145/1833280.1833283","DOIUrl":"https://doi.org/10.1145/1833280.1833283","url":null,"abstract":"Drift detection methods in data streams can detect changes in incoming data so that learned models can be used to represent the underlying population. In many real-world scenarios context information is available and could be exploited to improve existing approaches, by detecting or even anticipating recurring concepts in the underlying population. Several applications, among them health-care and recommender systems, lend themselves to the use of such information, since data from sensors is available but is not being used. Nevertheless, new challenges arise when integrating context with drift detection methods. Modeling and comparing context information, representing the context-concepts history and storing previously learned concepts for reuse are some of the critical problems. In this work, we propose the Context-aware Learning from Data Streams (CALDS) system to improve existing drift detection methods by exploiting available context information. Our enhancement is seamless: we use the association between context information and learned concepts to improve detection and adaptation to drift when concepts reappear. We present and discuss our preliminary experimental results with synthetic and real datasets.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114566280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StreamKDD '10 | Pub Date: 2010-07-25 | DOI: 10.1145/1833280.1833285
H. Kriegel, Peer Kröger, Eirini Ntoutsi, A. Zimek
{"title":"Towards subspace clustering on dynamic data: an incremental version of PreDeCon","authors":"H. Kriegel, Peer Kröger, Eirini Ntoutsi, A. Zimek","doi":"10.1145/1833280.1833285","DOIUrl":"https://doi.org/10.1145/1833280.1833285","url":null,"abstract":"Today's data are high dimensional and dynamic, thus clustering such data is rather complicated. To deal with the high dimensionality problem, the research area of subspace clustering has recently emerged, which aims at finding clusters in subspaces of the original feature space. So far, subspace clustering methods are mainly static and thus cannot address the dynamic nature of modern data. In this paper, we propose an incremental version of the density-based projected clustering algorithm PreDeCon, called incPreDeCon. The proposed algorithm efficiently updates only those subspace clusters that might be affected by the population update.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116011392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StreamKDD '10 | Pub Date: 2010-07-25 | DOI: 10.1145/1833280.1833286
Milos Krstajic, E. Bertini, Florian Mansmann, D. Keim
{"title":"Visual analysis of news streams with article threads","authors":"Milos Krstajic, E. Bertini, Florian Mansmann, D. Keim","doi":"10.1145/1833280.1833286","DOIUrl":"https://doi.org/10.1145/1833280.1833286","url":null,"abstract":"The analysis of large quantities of news is an emerging area in the field of data analysis and visualization. International agencies collect thousands of news items every day from a large number of sources, and making sense of them is becoming increasingly complex due to the rate of the incoming news, as well as the inherent complexity of analyzing large quantities of evolving text corpora. Current visual techniques that deal with the temporal evolution of such complex datasets, together with research efforts in related domains like text mining and topic detection and tracking, represent early attempts to understand, gain insight into, and make sense of these data. Despite these initial propositions, there is still a lack of techniques dealing directly with the problem of visualizing news streams in an \"on-line\" fashion, that is, in a way that the evolution of news can be monitored in real time by the operator. In this paper we propose a purely visual technique that makes it possible to see the evolution of news in real time. The technique shows the stream of news as it enters the system, as well as a series of important threads which are computed on the fly. By merging single articles into threads, the technique makes it possible to offload the visualization and retain only the most relevant information. The proposed technique is applied to the visualization of news streams generated by a news aggregation system that monitors over 4000 sites from 1600 key news portals world-wide and retrieves over 80000 reports per day in 43 languages.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124448504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StreamKDD '10 | Pub Date: 2010-07-25 | DOI: 10.1145/1833280.1833282
I. Assent, P. Kranen, C. Baldauf, T. Seidl
{"title":"Detecting outliers on arbitrary data streams using anytime approaches","authors":"I. Assent, P. Kranen, C. Baldauf, T. Seidl","doi":"10.1145/1833280.1833282","DOIUrl":"https://doi.org/10.1145/1833280.1833282","url":null,"abstract":"Data streams are gaining importance in many sensing and monitoring environments. Frequent mining tasks on data streams include classification, modeling and outlier detection. Since data arrival rates often vary, anytime algorithms have been proposed for stream clustering and classification; these can deliver a fast first result and improve it if more time is available. In this work, we propose the novel concept of anytime outlier detection and introduce an algorithm for anytime outlier detection based on a hierarchical cluster representation. We show promising results in preliminary experiments and discuss future research for anytime outlier detection.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"262 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122753054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StreamKDD '10 | Pub Date: 2010-07-25 | DOI: 10.1145/1833280.1833288
Wenyan Wu, L. Gruenwald
{"title":"Research issues in mining multiple data streams","authors":"Wenyan Wu, L. Gruenwald","doi":"10.1145/1833280.1833288","DOIUrl":"https://doi.org/10.1145/1833280.1833288","url":null,"abstract":"There exist emerging applications of data streams that have mining requirements. Although single data stream mining has been extensively studied, little research has been done on mining multiple data streams (MDS), which are more complex than single data streams and are involved in many real-world applications. This paper discusses the characteristics of MDS, proposes a formal definition for them, analyzes MDS applications in terms of mining requirements, and identifies research issues for MDS mining.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116183400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StreamKDD '10 | Pub Date: 2010-07-25 | DOI: 10.1145/1833280.1833287
Rikard Laxhammar, G. Falkman
{"title":"Conformal prediction for distribution-independent anomaly detection in streaming vessel data","authors":"Rikard Laxhammar, G. Falkman","doi":"10.1145/1833280.1833287","DOIUrl":"https://doi.org/10.1145/1833280.1833287","url":null,"abstract":"This paper presents a novel application of the theory of conformal prediction for distribution-independent on-line learning and anomaly detection. We exploit the fact that conformal predictors give valid prediction sets at specified confidence levels under the relatively weak assumption that the (normal) training data together with (normal) observations to be predicted have been generated from the same distribution. If the actual observation is not included in the possibly empty prediction set, it is classified as anomalous at the corresponding significance level. Interpreting the significance level as an upper bound of the probability that a normal observation is mistakenly classified as anomalous, we can conveniently adjust the sensitivity to anomalies while controlling the rate of false alarms without having to find any application-specific thresholds. The proposed method has been evaluated in the domain of sea surveillance using recorded data assumed to be normal. The validity of the prediction sets is justified by the empirical error rate, which is just below the significance level. In addition, experiments with simulated anomalous data indicate that anomaly detection sensitivity is superior to that of two previously proposed methods.","PeriodicalId":383372,"journal":{"name":"StreamKDD '10","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122561956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}