{"title":"Privacy Preserving Burst Detection of Distributed Time Series Data Using Linear Transforms","authors":"L. Singh, Mehmet Sayal","doi":"10.1109/CIDM.2007.368937","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368937","url":null,"abstract":"In this paper, we consider burst detection within the context of privacy. In our scenario, multiple parties want to detect a burst in aggregated time series data, but none of the parties want to disclose their individual data. Our approach calculates bursts directly from linear transform coefficients using a cumulative sum calculation. In order to reduce the chance of a privacy breach, we present multiple data perturbation strategies and compare the varying degrees of privacy preserved. Our strategies do not share raw time series data and still detect significant bursts. We empirically demonstrate this using both real and synthetic distributed data sets. When evaluating both privacy guarantees and burst detection accuracy, we find that our percentage thresholding heuristic maintains a high degree of privacy while accurately identifying bursts of varying widths.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"2806 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134165505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Based Stacking of Hyperspectral Data for Land Cover Classification","authors":"Yangchi Chen, M. Crawford, Joydeep Ghosh","doi":"10.1109/CIDM.2007.368890","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368890","url":null,"abstract":"Hyperspectral data provide new capability for discriminating spectrally similar classes, but unfortunately such class signatures often overlap in multiple narrow bands. Thus, it is useful to incorporate reliable spatial information when possible. However, this can result in increased dimensionality of the feature vector, which is already large for hyperspectral data. Markov random field (MRF) approaches, such as iterated conditional modes (ICM), can provide evidence relative to the class of a neighbor through Gibbs' distribution, but suffer from computational requirements and curse of dimensionality issues when applied to hyperspectral data. In this paper, a new knowledge based stacking approach is presented to utilize spatial information within homogeneous regions and at class boundaries, while avoiding the curse of dimensionality. The approach learns the location of the class boundary and combines original bands with the extracted spectral information of a neighborhood to train a hierarchical support vector machine (HSVM) classifier. The new method is applied to hyperspectral data collected by the Hyperion sensor on the EO-1 satellite over the Okavango delta of Botswana. Classification accuracies are compared to those obtained by a pixel-wise HSVM classifier, majority filtering and ICM to demonstrate the advantage of the knowledge based stacking approach.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134355420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy Wavelet Modeling Using Data Clustering","authors":"N. Sadati, B. Marami","doi":"10.1109/CIDM.2007.368861","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368861","url":null,"abstract":"In this paper, a novel approach is proposed for tuning the parameters of fuzzy wavelet systems, which are used for modeling nonlinear and complex systems. In a fuzzy inference system, each fuzzy rule is analogous to a wavelet basis function multiplied by a coefficient. Using clustering techniques, the centers of these basis functions are placed at the detected cluster centers. In this way, not only is the approximation accuracy increased, but the number of unknown parameters is also decreased. The feasibility of the proposed method is shown by modeling two highly nonlinear functions. A comparison of the results of the proposed approach with those of previous schemes shows the effectiveness and superiority of this algorithm.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133826932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Navigation Space Based Intranet Usability Analysis","authors":"P. Géczy, Noriaki Izumi, S. Akaho, K. Hasida","doi":"10.1109/CIDM.2007.368866","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368866","url":null,"abstract":"Usability is a vital quality factor of electronic environments. The increasing complexity of Web based platforms and the mining of large data volumes constantly raise demands on effective usability analysis tools. The article presents a novel formalism allowing examination of usability characteristics with a spectrum of metrics. The approach was applied to the usability analysis of a large corporate intranet. The majority of the users were knowledge workers. Important usability features have been revealed. The knowledge workers efficiently utilized only a minor portion of the available intranet resources and exhibited a strong pattern formation tendency. The frequently repeating browsing patterns were generally easily executable.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125229492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Data Management Scheme Based on Spatial and Temporal Characteristics in Virtual Environments","authors":"Hsing-Jen Chen, D. Liu","doi":"10.1109/CIDM.2007.368883","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368883","url":null,"abstract":"In a distributed interactive walkthrough system, there are two major bottlenecks which cause performance degradation. One is the server-side workload in the client-server architecture; the other is the network transmission delay. In this paper, we present a knowledge-based data management scheme which takes into account both internal (memory) and external (disk) data storage management to ease server-side workload and reduce network transmissions. Our system first analyzes users' logs to discover the spatial and temporal semantic patterns in the virtual environment. Using these patterns, we can determine the proper data layout on disk and improve our caching mechanism. Experimental results show good prediction rates and improvements in overall system performance.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131120831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of streaming GPS measurements of surface displacement through a web services environment","authors":"R. Granat, G. Aydin, M. Pierce, Zhigang Qi, Y. Bock","doi":"10.1109/CIDM.2007.368951","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368951","url":null,"abstract":"We present a method for performing mode classification of real-time streams of GPS surface position data. Our approach has two parts: an algorithm for robust, unconstrained fitting of hidden Markov models (HMMs) to continuous-valued time series, and SensorGrid technology that manages data streams through a series of filters coupled with a publish/subscribe messaging system. The SensorGrid framework enables strong connections between data sources, the HMM time series analysis software, and users. We demonstrate our approach through a Web portal environment through which users can easily access data from the SCIGN and SOPAC GPS networks in Southern California, apply the analysis method, and view results. Ongoing real-time mode classifications of streaming GPS data are displayed in a map-based visualization interface","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133866665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative Knowledge Discovery & Data Mining: From Knowledge to Experience","authors":"T. Horeis, B. Sick","doi":"10.1109/CIDM.2007.368905","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368905","url":null,"abstract":"Experts have important qualitative knowledge about interrelations between more or less abstract concepts in an application area. However, the knowledge of a single expert is typically quite uncertain (e.g., incomplete or imprecise). By fusing the knowledge of several experts it would be possible to obtain more certain and, therefore, more valuable knowledge. Conventional systems for knowledge discovery (KD) and data mining (DM) have the ability to extract valid rules from huge data sets. These rules describe dependencies between attributes and classes in a quantitative way, for instance. By fusing this kind of knowledge with the combined, qualitative knowledge of several experts it would be possible to obtain more comprehensive knowledge about an application area. In this article, we propose a concept for a new KD & DM technique based on computational intelligence: collaborative knowledge discovery (CKD). This technique combines the uncertain knowledge of several experts using methods based on Dempster-Shafer theory. The combined human knowledge is again fused with automatically extracted, well interpretable knowledge (fuzzy rules embedded in a radial basis function neural network) of a conventional KD system. Thus, a CKD system not only acquires more comprehensive knowledge, but also experience (knowledge about knowledge), meaning that it is able to explain automatically extracted rules to the human experts and to assess the interestingness (e.g., novelty or utility) of these rules. This can be done by adapting inference mechanisms from the field of probabilistic argumentation systems. A CKD system will comprise self-awareness mechanisms (it must know what it knows) as well as environment-awareness mechanisms (it must know what human experts know or what they want to know). In order to reduce the effort for knowledge acquisition, a CKD system must learn (pro-)actively. There are many application areas for such CKD systems, e.g., in the field of technical data mining (quality control, process monitoring, etc.)","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134449089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Induction Tree methods to classify M. tuberculosis spoligotypes","authors":"Georges Valétudie","doi":"10.1109/CIDM.2007.368859","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368859","url":null,"abstract":"In this paper we compare and analyze four graph induction methods for automatically classifying spoligotypes. A spoligotype is a sequence of 43 binary values provided by a DNA analysis technique. Such methods are known to be useful and efficient for many supervised learning problems. We found it interesting to use these techniques especially for sequential data, in order to create a classifier based on one decision rule per class.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122384332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective Approach To Handling Topic Oriented Tasks On The World Wide Web","authors":"Amit Awekar, Jaewoo Kang","doi":"10.1109/CIDM.2007.368894","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368894","url":null,"abstract":"We address the problem of handling topic oriented tasks on the World Wide Web. Our aim is to find the most relevant and important pages for broad-topic queries while searching in a small set of candidate pages. We present a link analysis based algorithm, SelHITS, which is an improvement over Kleinberg's HITS algorithm. We introduce the concept of virtual links to exploit latent information in the hyperlinked environment. Selective expansion of the root set and a novel ranking strategy are the distinguishing features of our approach. The selective expansion method avoids topic drift and provides results consistent with only one interpretation of the query. Experimental evaluation and user feedback show that our algorithm indeed distills the most relevant and important pages for broad-topic queries. Trends in user feedback suggest that there exists a uniform notion of quality of search results among users.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124808534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dynamic Graph Model for Analyzing Streaming News Documents","authors":"E. Hohman, D. Marchette","doi":"10.1109/CIDM.2007.368911","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368911","url":null,"abstract":"In this paper we consider the problem of analyzing streaming documents, in particular streaming news stories. The system is designed to extract statistics from the document, incorporate these into a graph-based model, and discard the document to reduce storage requirements. The model is defined in terms of a changing lexicon and sub-lexicons at each node in the graph, with the nodes of the graph representing topics. An approximation to the TFIDF term weighting is introduced. We illustrate the methodology on a dataset of news articles, and discuss the dynamic nature of the model","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127666488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}