U '09 | Pub Date: 2009-06-28 | DOI: 10.1145/1610555.1610561
B. Quost, T. Denoeux
Title: Learning from data with uncertain labels by boosting credal classifiers
Abstract: In this article, we investigate supervised learning when training data are associated with uncertain labels. We tackle this problem within the theory of belief functions: each training pattern x_i is associated with a basic belief assignment representing partial knowledge of its actual class. We propose to solve the classification problem by boosting, using a variant of the AdaBoost algorithm in which the outputs of the classifiers are interpreted as belief functions. During training, our algorithm estimates the reliability of each classifier in identifying patterns from the various classes. During the test phase, the outputs of the classifiers are first discounted according to these reliabilities and then combined using a suitable rule. Experiments conducted on classical datasets show that our algorithm is comparable to AdaBoost in accuracy. Processing EEG data with imperfect labels clearly demonstrates the benefit of taking the reliability of the labelling into account, and thus the relevance of our approach.
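The test-phase step described above (discount each classifier's output by its reliability, then combine) can be illustrated with standard belief-function operations. This is a minimal sketch, assuming classical discounting with a single reliability per classifier and Dempster's rule as the combination; the paper estimates class-dependent reliabilities and only says that "a suitable rule" is used, so the function names `discount`, `dempster` and `pignistic` and the example numbers are illustrative assumptions, not the authors' method.

```python
from itertools import product

def discount(m, alpha, frame):
    """Classical discounting: scale every mass by (1 - alpha) and move the
    remaining alpha onto the whole frame (total ignorance)."""
    omega = frozenset(frame)
    out = {A: (1.0 - alpha) * v for A, v in m.items()}
    out[omega] = out.get(omega, 0.0) + alpha
    return out

def dempster(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets, then
    renormalise by removing the mass that fell on the empty set (conflict)."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

def pignistic(m):
    """Turn a mass function into class probabilities by sharing each mass
    equally among the classes of its focal set."""
    betp = {}
    for A, v in m.items():
        for c in A:
            betp[c] = betp.get(c, 0.0) + v / len(A)
    return betp

frame = {"a", "b"}
# Hypothetical example: classifier 1 votes "a" and is 80% reliable (alpha = 0.2);
# classifier 2 votes "b" and is only 40% reliable (alpha = 0.6).
m1 = discount({frozenset({"a"}): 1.0}, alpha=0.2, frame=frame)
m2 = discount({frozenset({"b"}): 1.0}, alpha=0.6, frame=frame)
fused = dempster(m1, m2)
print(pignistic(fused))  # "a" about 0.794, "b" about 0.206
```

Discounting is what lets the less reliable classifier's vote count for less: most of its mass is moved to the whole frame, so after combination class "a" dominates.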
U '09 | Pub Date: 2009-06-28 | DOI: 10.1145/1610555.1610563
C. Dudas, Henrik Boström
Title: Using uncertain chemical and thermal data to predict product quality in a casting process
Abstract: Process and casting data from different sources have been collected and merged for the purpose of predicting, and determining what factors affect, the quality of cast products in a foundry. One problem is that the measurements cannot be directly aligned, since they are collected at different points in time; instead, they have to be approximated for specific time points, which introduces uncertainty. An approach for addressing this problem is investigated, in which uncertain numeric feature values are represented by intervals and random forests are extended to handle such intervals. A preliminary experiment shows that the suggested way of forming the intervals, together with the extension of random forests, results in higher predictive performance than using single (expected) values for the uncertain features together with standard random forests.
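The abstract does not say how the extended forest treats an interval at a split. One common way, assumed here and mirroring how C4.5-style trees distribute cases with unknown values, is to send the example down both branches, weighted by the fraction of its interval on each side of the threshold. This is a minimal sketch under that assumption; `Node`, `tree_predict`, `forest_predict` and the class labels are hypothetical, and the intervals could for instance be the range between the two measurements bracketing a time point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]

@dataclass
class Node:
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None     # branch for values <= threshold
    right: Optional["Node"] = None    # branch for values > threshold
    dist: Optional[Dict[str, float]] = None  # class distribution if this is a leaf

def fraction_below(iv: Interval, t: float) -> float:
    """Share of a uniformly weighted interval lying at or below threshold t."""
    lo, hi = iv
    if hi <= t:
        return 1.0
    if lo > t:
        return 0.0
    return (t - lo) / (hi - lo)

def tree_predict(node: Node, x: List[Interval], w: float = 1.0,
                 acc: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Send the example down both branches when its interval straddles a
    threshold, weighting each branch by the overlapping fraction."""
    acc = {} if acc is None else acc
    if node.dist is not None:
        for c, p in node.dist.items():
            acc[c] = acc.get(c, 0.0) + w * p
        return acc
    f = fraction_below(x[node.feature], node.threshold)
    if f > 0.0:
        tree_predict(node.left, x, w * f, acc)
    if f < 1.0:
        tree_predict(node.right, x, w * (1.0 - f), acc)
    return acc

def forest_predict(trees: List[Node], x: List[Interval]) -> Dict[str, float]:
    """Average the per-tree class distributions."""
    total: Dict[str, float] = {}
    for t in trees:
        for c, p in tree_predict(t, x).items():
            total[c] = total.get(c, 0.0) + p / len(trees)
    return total

# Hypothetical one-split tree: feature 0 <= 10 predicts "ok", otherwise "scrap".
stump = Node(feature=0, threshold=10.0,
             left=Node(dist={"ok": 1.0}), right=Node(dist={"scrap": 1.0}))
print(forest_predict([stump], [(8.0, 12.0)]))  # split evenly between both classes
```

A precise value is simply a degenerate interval such as `(5.0, 5.0)`, so the same code covers standard random forest prediction.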
U '09 | Pub Date: 2009-06-28 | DOI: 10.1145/1610555.1610562
W. Hendrix, Matthew C. Schmidt, P. Breimyer, N. Samatova
Title: On perturbation theory and an algorithm for maximal clique enumeration in uncertain and noisy graphs
Abstract: The maximal clique enumeration (MCE) problem can be used to find very tightly coupled collections of objects inside a network or graph of relationships. However, when such networks are based on noisy or uncertain data, the solutions to the MCE problem for several closely related graphs may be necessary to accurately define the collections.

Thus, we propose an algorithm that efficiently solves the MCE problem on altered, or perturbed, graphs. The algorithm uses the enumeration of a baseline graph and identifies only those maximal cliques that the perturbation adds and/or removes. We detail the algorithm and the underlying theory required to guarantee correctness. Further, we report average runtime speedups of 7x and 9x for our algorithm over traditional enumeration techniques when adding and removing edges, respectively, in graphs constructed from protein interaction data.
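A standard way to enumerate maximal cliques from scratch is the Bron-Kerbosch algorithm with pivoting. The sketch below pairs it with a simple update for inserting a single edge (u, v), based on the fact that any maximal clique created by the insertion must contain both u and v, so only those need to be enumerated. It illustrates the idea of reusing a baseline enumeration; it is not the authors' algorithm, which handles general perturbations including edge removal, and the function names are hypothetical.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Bron-Kerbosch with pivoting: append to `out` every maximal clique that
    contains R, takes its other vertices from P, and avoids those in X."""
    if not P and not X:
        out.append(frozenset(R))
        return
    pivot = max(P | X, key=lambda u: len(adj[u] & P))
    for v in list(P - adj[pivot]):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}

def maximal_cliques(adj):
    """Full enumeration: the 'from scratch' baseline."""
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return set(out)

def add_edge(adj, cliques, u, v):
    """Update the maximal cliques after inserting edge (u, v). Every new
    maximal clique contains both u and v, so only those are enumerated;
    an old clique survives unless it lies strictly inside one of them."""
    adj = {w: set(ns) for w, ns in adj.items()}   # work on a copy
    adj[u].add(v)
    adj[v].add(u)
    found = []
    bron_kerbosch({u, v}, adj[u] & adj[v], set(), adj, found)
    added = set(found)
    kept = {c for c in cliques if not any(c < n for n in added)}
    return adj, kept | added

# A 4-cycle a-b-c-d-a; inserting the chord a-c creates two triangles.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
before = maximal_cliques(square)          # the four edges
after_adj, after = add_edge(square, before, "a", "c")
print(sorted(sorted(c) for c in after))   # [['a', 'b', 'c'], ['a', 'c', 'd']]
```

Comparing `after` with `maximal_cliques(after_adj)` is a convenient check that the incremental update agrees with a full recomputation.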
U '09 | Pub Date: 2009-06-28 | DOI: 10.1145/1610555.1610560
Giorgio Corani, Marco Zaffalon
Title: Lazy naive credal classifier
Abstract: We propose a local (or lazy) version of the naive credal classifier. The latter is an extension of naive Bayes to imprecise probability, developed to issue reliable classifications despite small amounts of data, which may then carry highly uncertain information about a domain. Reliability is maintained because credal classifiers can issue set-valued classifications on instances that are particularly difficult to classify. We show by extensive experiments that the local classifier outperforms the original one, both in terms of classification accuracy and because it leads to stronger conclusions (i.e., set-valued classifications containing fewer classes). By comparing the local credal classifier with a local version of naive Bayes, we also show that the former deals reliably with instances that are difficult to classify, unlike local naive Bayes, which leads to fragile classifications.
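Credal classifiers such as the naive credal classifier replace point probability estimates with sets of probabilities, typically derived from the imprecise Dirichlet model (IDM). The sketch below is a deliberately simplified stand-in: it assumes IDM intervals on the class counts of the k nearest neighbours (the lazy step) and interval dominance as the decision rule, and it omits the naive Bayes feature model and the credal-dominance test of the actual classifier. The function names and data are hypothetical; the point is only to show how a local model with imprecise probabilities returns several classes when evidence is thin.

```python
import math
from collections import Counter

def idm_intervals(counts, s=1.0):
    """Lower/upper class probabilities under the imprecise Dirichlet model
    with hyperparameter s: [n_c / (N + s), (n_c + s) / (N + s)]."""
    n_total = sum(counts.values())
    return {c: (n / (n_total + s), (n + s) / (n_total + s)) for c, n in counts.items()}

def lazy_credal_predict(train, x, k=5, s=1.0):
    """train: list of (feature_tuple, label). Lazy step: keep only the k
    training points nearest to x. Credal step: return every class whose
    upper probability is not below the best lower probability
    (interval dominance), so hard cases yield several classes."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    counts = Counter({label: 0 for _, label in train})   # every class gets an interval
    counts.update(label for _, label in neighbours)
    iv = idm_intervals(counts, s)
    best_lower = max(lo for lo, _ in iv.values())
    return {c for c, (lo, up) in iv.items() if up >= best_lower}

train = [((0.0,), "a")] * 4 + [((10.0,), "b")] * 4
print(lazy_credal_predict(train, (0.0,), k=4, s=1.0))  # {'a'}: enough local evidence
print(lazy_credal_predict(train, (0.0,), k=1, s=2.0))  # both classes: too little evidence
```

Increasing `s` widens the intervals and makes the classifier more cautious, which is the trade-off between reliability and informativeness that the abstract refers to.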
U '09 | Pub Date: 2009-06-28 | DOI: 10.1145/1610555.1610558
B. Zadrozny, G. Pappa, Wagner Meira Jr, Marcos André Gonçalves, L. Rocha, Thiago Salles
Title: Exploiting contexts to deal with uncertainty in classification
Abstract: Uncertainty is often inherent to data, yet only a few data mining algorithms handle it. In this paper we focus on how to account for uncertainty in classification algorithms, in particular when data attributes should not be considered completely truthful for classifying a given sample. Our starting point is that each piece of data comes from a potentially different context and that, by estimating the context probabilities of an unknown sample, we can derive a weight that quantifies the influence of each piece of data. We propose a lazy classification strategy that incorporates the uncertainty into both the training and the use of classifiers. We also propose uK-NN, an extension of the traditional K-NN that implements our approach. Finally, we illustrate uK-NN, which is currently being evaluated experimentally, using a toy document classification example.
U '09 | Pub Date: 2009-06-28 | DOI: 10.1145/1610555.1610557
C. Leung, Dale A. Brajczuk
Title: Efficient algorithms for mining constrained frequent patterns from uncertain data
Abstract: Mining frequent patterns is one of the popular knowledge discovery and data mining (KDD) tasks. It also plays an essential role in mining many other patterns, such as correlations, sequences, and association rules. Hence, it has been the subject of numerous studies since its introduction. Most of these studies find all frequent patterns in collections of precise data, in which the items within each datum or transaction are definitely known and precise. However, in many real-life situations the user is interested in only a tiny portion of these frequent patterns. Finding all frequent patterns would then be redundant and waste a lot of computation. This calls for constrained mining, which aims to find only those frequent patterns that are interesting to the user. Moreover, in many real-life situations the data are uncertain. This calls for uncertain data mining. In this paper, we propose an algorithm to efficiently find constrained frequent patterns from collections of uncertain data.
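A widely used model for uncertain transactions attaches an existential probability to each item and calls an itemset frequent when its expected support (assuming independent items) reaches a threshold. The sketch below combines that measure with a user constraint in a plain Apriori-style search; it is not the algorithm proposed in the paper. It assumes the constraint is anti-monotone (like a budget on total item price), so it can be pushed into the search alongside the support test. The prices, budget and data are hypothetical.

```python
from itertools import combinations
from math import prod

def expected_support(itemset, db):
    """db: list of transactions, each a dict item -> existential probability.
    Assuming independent items, the expected support is the sum over
    transactions of the product of the itemset's item probabilities."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in db)

def constrained_frequent(db, minsup, constraint):
    """Apriori-style level-wise search. Expected support is anti-monotone, and
    the constraint is assumed to be anti-monotone too, so an itemset failing
    either test is pruned together with all of its supersets."""
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    result = {}
    while level:
        survivors = []
        for s in level:
            if constraint(s):                    # cheap test first
                es = expected_support(s, db)
                if es >= minsup:
                    result[s] = es
                    survivors.append(s)
        alive = set(survivors)
        k = len(level[0]) + 1
        # join: unions of two survivors that are exactly one item larger
        cands = {a | b for a, b in combinations(survivors, 2) if len(a | b) == k}
        # prune: every (k-1)-subset of a candidate must itself have survived
        level = [c for c in cands if all(c - {x} in alive for x in c)]
    return result

# Hypothetical prices; "total price <= 40" is an anti-monotone constraint.
price = {"a": 10, "b": 20, "c": 50}
db = [{"a": 0.9, "b": 0.8, "c": 0.5}, {"a": 0.7, "b": 0.6}, {"b": 0.4, "c": 0.9}]
res = constrained_frequent(db, minsup=0.5,
                           constraint=lambda s: sum(price[i] for i in s) <= 40)
print({"".join(sorted(s)): round(v, 2) for s, v in res.items()})
```

Item "c" has an expected support of 1.4, well above the threshold, but it is never explored because it violates the constraint; checking the constraint first avoids computing supports the user does not care about.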
U '09 | Pub Date: 2009-06-28 | DOI: 10.1145/1610555.1610556
Chia-Hui Chang, Jun-Hong Lin
Title: Decision support and profit prediction for online auction sellers
Abstract: Online auctions have become a very popular type of e-commerce transaction. The immense business opportunities attract many individuals as well as online stores. As more sellers take part, competition between them becomes more intense. For sellers, maximizing profit through proper auction settings becomes the critical success factor in the online auction market. In this paper, we provide a selling recommendation service that predicts the expected profit before listing and, based on the expected profit, recommends whether the seller should use the current auction setting. We collect data on five kinds of digital cameras from eBay and apply machine learning algorithms to predict the sold probability and the end price. To obtain genuine sold-probability and end-price predictions (even for unsold items), we apply probability calibration and sample selection bias correction when building the prediction models. To decide whether to list a commodity, we apply cost-sensitive analysis to the current auction setting. We compare the profits of three approaches: probability-based, end-price-based, and our expected-profit-based recommendation service.

The experimental results show that our recommendation service based on expected profit yields higher earnings, and that the sold probability is a key factor in maintaining the profit gain when extra costs are incurred for unsold items due to stocking.
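The abstract compares three decision rules for listing an item. The sketch below shows how they can disagree, assuming a hypothetical cost model: the paper's exact profit formula, fee structure and thresholds are not given in the abstract, and `p_sold` and `end_price` simply stand for the outputs of the calibrated prediction models.

```python
def expected_profit(p_sold, end_price, item_cost, listing_fee, unsold_cost):
    """Hypothetical cost model: with probability p_sold the item sells at
    end_price; otherwise the seller bears an extra cost (e.g. stocking).
    The listing fee is paid in both cases."""
    if not 0.0 <= p_sold <= 1.0:
        raise ValueError("p_sold must lie in [0, 1]")
    return p_sold * (end_price - item_cost) - (1.0 - p_sold) * unsold_cost - listing_fee

def recommend(rule, p_sold, end_price, item_cost, listing_fee, unsold_cost):
    """Return True if the current auction setting should be used."""
    if rule == "probability":      # list if the item is more likely than not to sell
        return p_sold > 0.5
    if rule == "end_price":        # list if the predicted price covers cost and fee
        return end_price > item_cost + listing_fee
    if rule == "expected_profit":  # list if the expected profit is positive
        return expected_profit(p_sold, end_price, item_cost,
                               listing_fee, unsold_cost) > 0.0
    raise ValueError(f"unknown rule: {rule}")

# A good price prediction but only a 20% chance of selling (hypothetical numbers).
risky = (0.2, 250.0, 200.0, 5.0, 30.0)
for rule in ("probability", "end_price", "expected_profit"):
    print(rule, recommend(rule, *risky))
```

In this example the end-price rule still recommends listing, while the expected-profit rule does not (the expected profit is about -19), which is the kind of case where the sale probability and the cost of unsold stock change the decision.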