{"title":"IQ estimation for accurate time-series classification","authors":"Krisztián Búza, A. Nanopoulos, L. Schmidt-Thieme","doi":"10.1109/CIDM.2011.5949441","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949441","url":null,"abstract":"Due to its various applications, time-series classification is a prominent research topic in data mining and computational intelligence. The simple k-NN classifier using dynamic time warping (DTW) distance has been shown to be competitive with other state-of-the-art time-series classifiers. In our research, however, we observed that a single fixed choice for the number of nearest neighbors k may lead to suboptimal performance. This is due to the complexity of time-series data, especially because the characteristics of the data may vary from region to region. Therefore, local adaptation of the classification algorithm is required. In order to address this problem in a principled way, in this paper we introduce individual quality (IQ) estimation. This refers to estimating the expected classification accuracy for each time series and each k individually. Based on the IQ estimations, we combine the classification results of several k-NN classifiers into the final prediction. In our framework of IQ, we develop two time-series classification algorithms, IQ-MAX and IQ-WV. In our experiments on 35 commonly used benchmark data sets, we show that both IQ-MAX and IQ-WV outperform two baselines.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132110255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the use of decision trees for ICU outcome prediction in sepsis patients treated with statins","authors":"V. Ribas, J. Lopez, J. Ruiz-Rodríguez, Adolf Ruiz-Sanmartin, J. Rello, A. Vellido","doi":"10.1109/CIDM.2011.5949439","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949439","url":null,"abstract":"Sepsis is one of the main causes of death for noncoronary ICU (Intensive Care Unit) patients and has become the tenth most common cause of death in western societies. This is a transversal condition affecting immunocompromised patients, critically ill patients, post-surgery patients, patients with AIDS, and the elderly. In western countries, septic patients account for as much as 25% of ICU bed utilization and the pathology affects 1% – 2% of all hospitalizations. Its mortality rates range from 12.8% for sepsis to 45.7% for septic shock. Early administration of antibiotics is known to be crucial for ICU outcomes. In this regard, statins, a class of drugs, have been shown to present good anti-inflammatory properties beyond their regulation of the biosynthesis of cholesterol. In this brief paper, we hypothesize that preadmission use of statins improves ICU outcomes. We test this hypothesis in a prospective study in patients admitted with severe sepsis and multiorgan failure at the ICU of Vall d'Hebron University Hospital (Barcelona, Spain), using statistical algebraic models and regression trees.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120947973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"About the analysis of time series with temporal association rule mining","authors":"Tim Schlüter, Stefan Conrad","doi":"10.1109/CIDM.2011.5949303","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949303","url":null,"abstract":"This paper addresses the issue of analyzing time series with temporal association rule mining techniques. Since association rule mining was originally developed for the analysis of transactional data, as it occurs for instance in market basket analysis, algorithms and time series have to be adapted in order to apply these techniques gainfully to the analysis of time series in general. Continuous time series of different origins can be discretized in order to mine several temporal association rules, which reveals interesting coherences within one and between pairs of time series. Depending on the domain, the knowledge about these coherences can be used for several purposes, e.g. for the prediction of future values of time series. We present a short review of different standard and temporal association rule mining approaches and of approaches that apply association rule mining to time series analysis. In addition, we explain in detail how some of the most interesting kinds of temporal association rules can be mined from continuous time series and present a prototype implementation. We demonstrate and evaluate our implementation on two large datasets containing river level measurements and stock data.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131335548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for semi-automated process instance discovery from decorative attributes","authors":"Andrea Burattin, R. Vigo","doi":"10.1109/CIDM.2011.5949450","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949450","url":null,"abstract":"Process mining is a relatively new field of research: its final aim is to bridge the gap between data mining and business process modelling. In particular, the assumption underpinning this discipline is the availability of data coming from business process executions. In business process theory, once the process has been defined, it is possible to have a number of instances of the process running at the same time. Usually, the identification of different instances relies on a specific “case id” field in the log exploited by process mining techniques. The software systems that support the execution of a business process, however, often do not explicitly record such information. This paper presents an approach that addresses the absence of the “case id” information: we have a set of extra fields, decorating each single activity log, that are known to carry the information on the process instance. A framework based on simple relational algebra notions is proposed to extract the most promising case ids from the extra fields. The work is a generalization of a real business case.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"232 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122461430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geodesic distances for web document clustering","authors":"Selma Tekir, Florian Mansmann, D. Keim","doi":"10.1109/CIDM.2011.5949449","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949449","url":null,"abstract":"While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124873197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Logistic sub-models for small size populations in credit scoring","authors":"Bouaguel Waad, F. Beninel, G. B. Mufti","doi":"10.1109/CIDM.2011.5949425","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949425","url":null,"abstract":"Credit scoring risk management is a fast-growing field due to consumers' credit requests. Credit requests from new and existing customers are often evaluated by classical discrimination rules based on customer information. However, these kinds of strategies have serious limits and do not take into account the differences in characteristics between current customers and future ones. The aim of this paper is to measure creditworthiness for non-customer borrowers and to model potential risk given a heterogeneous population formed by borrowers who are customers of the bank and others who are not. We build on previous work in generalized discrimination and transpose it into the logistic model to derive efficient discrimination rules for the non-customer subpopulation. We thereby obtain seven simple models of connection between the parameters of the two logistic models associated respectively with the two subpopulations. The German credit data set is selected as the experimental data to compare the seven models. Experimental results show that the use of links between the two subpopulations improves the classification accuracy for new loan applicants.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128005799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Opening black box Data Mining models using Sensitivity Analysis","authors":"P. Cortez, M. Embrechts","doi":"10.1109/CIDM.2011.5949423","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949423","url":null,"abstract":"There are several supervised learning Data Mining (DM) methods, such as Neural Networks (NN), Support Vector Machines (SVM) and ensembles, that often attain high quality predictions, although the obtained models are difficult for humans to interpret. In this paper, we open these black box DM models by using a novel visualization approach that is based on a Sensitivity Analysis (SA) method. In particular, we propose a Global SA (GSA), which extends the applicability of previous SA methods (e.g. to classification tasks), and several visualization techniques (e.g. variable effect characteristic curve), for assessing input relevance and effects on the model's responses. We show the GSA capabilities by conducting several experiments, using an NN ensemble and an SVM model, on both synthetic and real-world datasets.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115835704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}