{"title":"Example-Based Feature Tweaking Using Random Forests","authors":"Tony Lindgren, P. Papapetrou, Isak Samsten, L. Asker","doi":"10.1109/IRI.2019.00022","DOIUrl":"https://doi.org/10.1109/IRI.2019.00022","url":null,"abstract":"In certain application areas when using predictive models, it is not enough to make an accurate prediction for an example; instead, it might be more important to change a prediction from an undesired class into a desired class. In this paper we investigate methods for changing predictions of examples. To this end, we introduce a novel algorithm for changing predictions of examples and we compare this novel method to an existing method and a baseline method. In an empirical evaluation we compare the three methods on a total of 22 datasets. The results show that the novel method and the baseline method can change an example from an undesired class into a desired class in more cases than the competitor method (and in some cases this difference is statistically significant). We also show that the distance, as measured by the Euclidean norm, is higher for the novel and baseline methods (and in some cases this difference is statistically significant) than for the state-of-the-art method. The methods and their proposed changes are also evaluated subjectively in a medical domain with interesting results.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134486125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Title Page i","authors":"","doi":"10.1109/iri.2019.00001","DOIUrl":"https://doi.org/10.1109/iri.2019.00001","url":null,"abstract":"","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132812253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DDM: Data-Driven Modeling of Physical Phenomenon with Application to METOC","authors":"S. Rubin, Lydia Bouzar-Benlabiod","doi":"10.1109/IRI.2019.00057","DOIUrl":"https://doi.org/10.1109/IRI.2019.00057","url":null,"abstract":"The problem addressed by this paper pertains to the representation, acquisition, and randomization of experiential knowledge for autonomous systems in expert reconnaissance. Such systems are characterized by the requirement to render proper decisions not explicitly programmed for. Cases are defined to consist of domain-specific data (e.g., heterogeneous sensory data), which may not be fully general due to the inclusion of (a) extraneous predicates and/or because (b) the predicates are overly specific. Rules satisfy the definition of cases and result from cases (rules), which have undergone at least one of the aforementioned generalizations. Extraneous antecedent predicates may be discovered from cases (rules) sharing a common consequent, if binary tautologies are found in case (rule) pairings, or if higher tautologies are found in a multiplicity of such cases (rules). Eliminating such extraneous antecedent predicates allows for the discovery of possible additional extraneous antecedent predicates - where the antecedent of one is a proper subset of the other. Candidate rules are formed from the intersection of combinations of two or more case (rule) antecedent sets implying a common consequent. The removed antecedent subsets are acquired as new rules implying the common consequent, which are conditioned to fire by the non-monotonic actions of their common antecedent (i.e., by way of an embedded antecedent predicate) - reducing the specificity of the parents by generalizing them into smaller, more reusable rules. Similarly, more general consequent sequences are formed from common subsequences shared by two or more consequent sequences being non-deterministically implied by a common antecedent. 
The removed consequent subsequences are acquired as new rules, which are set to fire before or after that of its parent's common dependency - reducing the specificity of the parents by generalizing them into smaller, more reusable rules. The rule to fire first will non-monotonically trigger the rule to fire next. This process iterates, since randomization of one side may enable further randomization of the other side. Tautologies are extracted and common subsets or subsequences form candidate rules as previously described (i.e., without creating duplicate productions). The context for the transformations is provided by the cases (rules), which are effectively acquired as previously described. Knowledge is segmented on the basis of whether it is a case or a rule. Knowledge is further dynamically segmented on the basis of maximally shared left-hand sides (LHS) and maximally shared right-hand sides (RHS) - using logical pointers to minimize space-time requirements. It is proven that the allowance for nondeterminism is required, which implies that candidate rules cannot be invalidated by syntactically checking them for contradiction with a known valid dependency. A possibility metric is provided for each production, which cumulatively tracks the similarity of t","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134194356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eye Tracking Area of Interest in the Context of Working Memory Capacity Tasks","authors":"Gavindya Jayawardena, Anne M. P. Michalek, S. Jayarathna","doi":"10.1109/IRI.2019.00042","DOIUrl":"https://doi.org/10.1109/IRI.2019.00042","url":null,"abstract":"Adults diagnosed with Attention-Deficit / Hyperactivity Disorder (ADHD) have reduced working memory capacity, indicating attention control deficits. Such deficits affect the characteristic movements of human gaze, thus making it a potential avenue to investigate attention disorders. This paper presents a converging operations approach toward the objective detection of neurocognitive indices of ADHD symptomatology that is grounded in the cognitive neuroscience literature of ADHD. The development of these objective measures of ADHD will facilitate its diagnosis. We hypothesize that the characteristic movements of human gaze within specific areas of interest (AOIs) may be used to estimate psychometric measures and that distinct eye movement scan patterns can be used to better understand ADHD. The results of this feasibility study confirm the utility of a combined fixation and saccade feature set, captured within specific AOIs indexing Working Memory Capacity (WMC), as a predictor of a diagnosis of ADHD in adults. 
Tree-based classifiers performed best in terms of predicting ADHD, with 86% accuracy, using physiological measures of sustained visual attention during a WMC task.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130353672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Model Predictive Performance: A Medicare Fraud Detection Case Study","authors":"Richard A. Bauder, Matthew Herland, T. Khoshgoftaar","doi":"10.1109/IRI.2019.00016","DOIUrl":"https://doi.org/10.1109/IRI.2019.00016","url":null,"abstract":"Evaluating a machine learning model's predictive performance is vital for establishing its practical usability in real-world applications. Separate training and test datasets and cross-validation are both common when evaluating machine learning models. The former uses two distinct datasets, whereas cross-validation splits a single dataset into smaller training and test subsets. In real-world production applications, it is critical to establish a model's usefulness by validating it on completely new input data, and not just using the cross-validation results on a single historical dataset. In this paper, we present results for both evaluation methods, including performance comparisons. In order to provide meaningful comparative analyses between methods, we perform real-world fraud detection experiments using 2013 to 2016 Medicare durable medical equipment claims data. This Medicare dataset is split into training (2013 to 2015 individual years) and test (2016 only). Using this Medicare case study, we assess the fraud detection performance, across three learners, for both model evaluation methods. We find that using the separate training and test sets generally outperforms cross-validation, indicating a better real-world model performance evaluation. 
Even so, cross-validation has comparable, but conservative, fraud detection results.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"27 34","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114044032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Software Requirements Cluster Labeling Using Wikipedia","authors":"S. Reddivari","doi":"10.1109/IRI.2019.00031","DOIUrl":"https://doi.org/10.1109/IRI.2019.00031","url":null,"abstract":"Clustering plays an important role in reusable requirements retrieval from the ever-growing software project repositories. The literature on requirements cluster labeling is still emerging. Researchers have investigated clustering to support various software engineering activities such as requirements prioritization, feature identification, automated tracing, and code navigation. The primary task in analyzing the clustering results is to \"label\" the clusters by means of some representative words to summarize and comprehend the requirements data. Despite the development of automatic cluster labeling techniques for software requirements, very little is understood about enhancing the cluster labels using external knowledge sources such as Wikipedia. In this paper, we review the literature on enhancing cluster labeling, present a framework for requirements cluster labeling and conduct an experiment to evaluate how the Wikipedia-based enhancement performs in labeling requirements clusters. The results show that Wikipedia-based labeling outperforms traditional Information Retrieval (IR) techniques. 
Our work sheds light on improving automated ways to support information reuse and management in the context of requirements engineering (RE).","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124667562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EmoWei : Emotion-Oriented Personalized Weight Management System Based on Sentiment Analysis","authors":"Jihyeon Kim, Uran Oh","doi":"10.1109/IRI.2019.00060","DOIUrl":"https://doi.org/10.1109/IRI.2019.00060","url":null,"abstract":"A number of online communities and commercial apps exist to assist people with weight management. However, these systems are limited to logging and tracking meals or workouts without considering one's emotional state, which is known to have a strong impact on health (e.g., stress-related eating). To confirm the feasibility of monitoring emotion from personal logs such as online posts, we first conducted a Recurrent Neural Network (RNN) based sentiment analysis on 17,735 weight loss-related tweets and 200 posts from an online weight management community called FatSecret, in comparison to general tweets. The results suggest that we can infer one's emotion based on their written text and their progress in managing weight. Based on the findings, we propose EmoWei, a new weight management system that integrates users' emotions to provide personalized assistance to achieve their weight loss goals with minimum stress.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125211972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Texture Image Categorization in Wavelet Domain via Naive Bayes Classifier Based on Laplace and Generalized Gaussian Distribution","authors":"Muhammad Azam, N. Bouguila","doi":"10.1109/IRI.2019.00034","DOIUrl":"https://doi.org/10.1109/IRI.2019.00034","url":null,"abstract":"In this paper, we investigate a recently proposed feature extraction technique for texture image representation. In this method, features are extracted via a bounded Laplace mixture model (BLMM) in the wavelet domain. Because wavelet coefficients can be modeled accurately with a Laplace distribution, we propose applying classifiers based on this distribution, which leads us to introduce a Naive Bayes classifier with a Laplace distribution for image categorization. The proposed approach is validated through experiments on different texture image datasets and shows very good results compared to the model based on the Gaussian distribution. The generalized Gaussian distribution is a generalization of both the Laplace and Gaussian distributions; thus we also introduce a Naive Bayes classifier with a generalized Gaussian distribution to achieve better performance than the above two models. This approach is also validated through extensive experiments, and we observe that, by taking into account the nature of the data, the proposed models perform very well. 
Classification results are reported using several performance metrics to demonstrate the effectiveness of the proposed algorithms in texture image classification.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115284841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fake News Detection Using Bayesian Inference","authors":"Fatma Najar, Nuha Zamzami, N. Bouguila","doi":"10.1109/IRI.2019.00066","DOIUrl":"https://doi.org/10.1109/IRI.2019.00066","url":null,"abstract":"Given the huge volume of information available on social media, distinguishing false information from real information is a challenging task. Several statistical models dealing with this problem are based on multinomial distributions. However, a new family of distributions, an exponential-family approximation to the Dirichlet Compound Multinomial (EDCM), has been introduced to adapt better to high-dimensional data and to overcome the drawbacks of the multinomial assumption. Thus, in this paper, we tackle the problem of fake news detection using finite mixture models of EDCM distributions. In particular, we develop a Bayesian approach based on Markov Chain Monte Carlo and the Metropolis-Hastings algorithm for learning these mixture models. The proposed method is validated via extensive simulations, and a comparison with multinomial-based mixture models is provided.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115433151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for Classification of Economic Recessions","authors":"Bruce Jackson, M. Rege","doi":"10.1109/IRI.2019.00019","DOIUrl":"https://doi.org/10.1109/IRI.2019.00019","url":null,"abstract":"The ability to quickly and accurately classify economic activity into periods of recession and expansion is of great interest to economists and policy makers. Machine Learning methods can potentially be applied to the classification of business cycles. This paper describes two machine learning methods, K-Nearest Neighbor and Neural Networks, and compares them to a Dynamic Factor Markov Switching model for determining business cycle turning points. We conclude that machine learning techniques can offer more accurate classifiers that are worthy of additional study.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114776401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}