{"title":"Mining Temporal Patterns with Quantitative Intervals","authors":"Thomas Guyet, R. Quiniou","doi":"10.1109/ICDMW.2008.16","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.16","url":null,"abstract":"In this paper we consider the problem of discovering frequent temporal patterns in a database of temporal sequences, where a temporal sequence is a set of items with associated dates and durations. Since the quantitative temporal information appears to be fundamental in many contexts, it is taken into account in the mining processes and returned as part of the extracted knowledge. To this end, we have adapted the classical a priori (Agrawal and Srikant, 1995) framework to propose an efficient algorithm based on a hyper-cube representation of temporal sequences. The extraction of quantitative temporal information is performed using a density estimation of the distribution of event intervals from the temporal sequences. An evaluation on synthetic data sets shows that the proposed algorithm can robustly extract frequent temporal patterns with quantitative temporal extents.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130914839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Causal Variables for Building Energy Fault Detection by Semi-supervised LDA and Decision Boundary Analysis","authors":"Keigo Yoshida, M. Inui, T. Yairi, K. Machida, Masaki Shioya, Y. Masukawa","doi":"10.1109/ICDMW.2008.44","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.44","url":null,"abstract":"This paper addresses the identification problem of causal variables for the system anomaly. In real-world complicated systems, even experts often fail to specify causal factors, thus they attempt to detect the anomaly with exploratory heuristics. Our goal is to offer further information that supports anomaly cause analysis using the incomplete empirical knowledge. Proposed technique discovers responsible factors for the fault by leveraging domain knowledge with an effective combination of semi-supervised linear discriminant analysis (LDA) and boundary-based discriminative subspace identification method. Experimental results on synthetic and real dataset confirmed validity of our approach. Moreover, we applied this method to the building energy fault diagnosis and succeeded in extracting causal variables for energy waste in a building.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115377070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Text Categorization in a Transductive Setting","authors":"Michelangelo Ceci","doi":"10.1109/ICDMW.2008.126","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.126","url":null,"abstract":"Transductive learning is the learning setting that permits to learn from \"particular to particular\" and to consider both labelled and unlabelled examples when taking classification decisions. In this paper, we investigate the use of transductive learning in the context of hierarchical text categorization. At this aim, we exploit a modified version of an inductive hierarchical learning framework that permits to classify documents in internal and leaf nodes of a hierarchy of categories. Experimental results on real world datasets are reported.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115455015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Title-Composing Support System for Reaching New Audiences","authors":"Yoko Nishihara, W. Sunayama","doi":"10.1109/ICDMW.2008.24","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.24","url":null,"abstract":"This paper proposes a support system for composing good titles for research papers in order to reach new audiences. Our system takes titles as input. The system evaluates title understandability and interest level of a title. The system ranks titles and outputs a title list. Users are able to recompose their titles by referring to the list and each evaluation value. Using the system, users can obtain new audiences who have not previously been interested in the user's research area. Experimental results showed that our system is able to rank titles in descending order of audiences' choices.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"283 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115634224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Distance Computation Using SQL Queries and UDFs","authors":"Sasi K. Pitchaimalai, C. Ordonez, Carlos Garcia-Alvarado","doi":"10.1109/ICDMW.2008.135","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.135","url":null,"abstract":"Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and user-defined functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known K-means clustering algorithm. We present SQL query optimizations and a scalar UDF to compute Euclidean distance. We experimentally evaluate performance and scalability of our proposed SQL queries and UDF with large data sets on a modern DBMS. We benchmark distance computation on two important data mining techniques: clustering and classification. In general, UDFs are faster than SQL queries because they are executed in main memory. Data set size is the main factor impacting performance, followed by data set dimensionality.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"514 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116207931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Exploration of Model-Based Automatically Extracted Data","authors":"A. Coden, I. Sominsky, M. Tanenblatt","doi":"10.1109/ICDMW.2008.34","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.34","url":null,"abstract":"We present an interactive system to query, explore and navigate data according to a hierarchical knowledge model that had been automatically populated from unstructured textual data. Our system differs from systems assisting in the navigation of domain ontologies and mining between pairs of concepts in that it enables access to unstructured data by abstract concepts and relations between them. Concepts in turn are specified by sets of models and their relations. However, some concepts may not have a direct representation in the text. In particular, the demonstration query by model/cancer (QbM/C) is based on unstructured pathology reports. The knowledge model represents both named entities such as diagnosis and anatomical site, and higher level concepts such as primary and metastatic tumor. Such concepts are based on the relations between named entities. We will present the data layout and access mechanism from the GUI to the data.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116432400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Graph-Based Algorithm for Detection and Characterization of Anomalies in Noisy Multivariate Time Series","authors":"H. Cheng, P. Tan, C. Potter, S. Klooster","doi":"10.1109/ICDMW.2008.48","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.48","url":null,"abstract":"Detection of anomalies in multivariate time series is an important data mining task with potential applications in medical diagnosis, ecosystem modeling, and network traffic monitoring. In this paper, we present a robust graph-based algorithm for detecting anomalies in noisy multivariate time series data. A key feature of the algorithm is the alignment of kernel matrices constructed from the time series. The aligned kernel enables the algorithm to capture the dependence relationship between different time series and to support the discovery of different types of anomalies (including subsequence-based and local anomalies). We have performed extensive experiments to demonstrate the effectiveness of the proposed algorithm. We also present a case study that shows the utility of applying our algorithm to detect ecosystem disturbances in Earth science data.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126086933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ARUBAS: An Association Rule Based Similarity Framework for Associative Classifiers","authors":"B. Depaire, K. Vanhoof, G. Wets","doi":"10.1109/ICDMW.2008.58","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.58","url":null,"abstract":"This article introduces ARUBAS, a new framework to build associative classifiers. In contrast with many existing associative classifiers, it uses class association rules to transform the feature space and uses instance-based reasoning to classify new instances. The framework allows the researcher to use any association rule mining algorithm to produce the class association rules. Every aspect of the framework is extensively introduced and discussed and five different fitness measures used for classification purposes are defined. The empirical results determine which fitness measure is the best and compares the framework with other classifiers. These results show that the ARUBAS framework is able to produce associative classifiers which are competitive with other classification techniques. More specifically, with ARUBAS-Scheffer-phi5 we have introduced a parameter-free algorithm which is competitive with classification techniques such as C4.5, RIPPER and CBA.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126165619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Data Semantics to Discover, Extract, and Model Web Sources","authors":"J. Ambite, Craig A. Knoblock, Kristina Lerman, Anon Plangprasopchok, Thomas A. Russ, Cenk Gazen, Steven Minton, Mark James Carman","doi":"10.1109/ICDMW.2008.134","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.134","url":null,"abstract":"We describe Deimos, a system that automatically discovers and models new sources of information. The system exploits four core technologies developed by our group that makes an end-to-end solution to this problem possible. First, given an example source, Deimos finds other similar sources online. Second, it invokes and extracts data from these sources. Third, given the syntactic structure of a source, Deimos maps its inputs and outputs to semantic types. Finally, it infers the source's semantic definition, i.e., the function that maps the inputs to the outputs. Deimos is able to successfully automate these steps by exploiting a combination of background knowledge and data semantics. We describe the challenges in integrating separate components into a unified approach to discovering, extracting and modeling new online sources. We provide an end-to-end validation of the system in two information domains to show that it can successfully discover and model new data sources in those domains.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126222534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unifying Unknown Nodes in the Internet Graph Using Semisupervised Spectral Clustering","authors":"Anat Almog, J. Goldberger, Y. Shavitt","doi":"10.1109/ICDMW.2008.12","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.12","url":null,"abstract":"Most research on Internet topology is based on active measurement methods. A major difficulty in using these tools is that one comes across many unresponsive routers. Different methods of dealing with these anonymous nodes to preserve the connectivity of the real graph have been suggested. One of the more practical approaches involves using a placeholder for each unknown, resulting in multiple copies of every such node. This significantly distorts and inflates the inferred topology. Our goal in this work is to unify groups of placeholders in the IP-level graph. We introduce a novel clustering algorithm based on semisupervised spectral embedding of all the nodes followed by clustering of the anonymous nodes in the projected space. Experimental results on real internet data are provided, that show good similarity to the true networks.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128848007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}