{"title":"Deep Learning Solutions to Computational Phenotyping in Health Care","authors":"Zhengping Che, Yan Liu","doi":"10.1109/ICDMW.2017.156","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.156","url":null,"abstract":"Exponential growth in electronic health record (EHR) data has resulted in new opportunities and urgent needs to discover meaningful data-driven representations and patterns of diseases, i.e., computational phenotyping. Recent success and development of deep learning provides promising solutions to the problem of prediction and feature discovery tasks, while lots of challenges still remain and prevent people from applying standard deep learning models directly. In this paper, we discussed three key challenges in this field: how to deal with missing data, how to build scalable models, and how to get interpretations of features and models. We proposed novel and effective deep learning solutions to each of them respectively. All proposed solutions are evaluated on several real-world health care datasets and experimental results demonstrated their superiority over existing baselines.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"26 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120916308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Dynamical Activities of Co-occurrence Patterns for Cooking Ingredients","authors":"Y. Kikuchi, Masahito Kumano, M. Kimura","doi":"10.1109/ICDMW.2017.10","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.10","url":null,"abstract":"Due to the increasing popularity of cooking-recipe sharing sites and the success of complex network science, attention has recently been devoted to developing an effective network-based method of analyzing the characteristics of ingredient combinations used in recipes. Unlike previous approaches dealing with static properties, we aim at analyzing the dynamical changes in ingredient pairs jointly used in recipes, and propose an efficient method of extracting the change patterns for co-occurrence activities of ingredients. Based on the extracted change patterns, we build an active network among ingredients at every timestep, and identify active co-occurrence patterns. Moreover, we provide a method of interpreting active co-occurrence patterns in terms of recipes, and present a framework for visually analyzing their dynamical changes. Using real data from a Japanese recipe sharing site, we quantitatively demonstrate the effectiveness of the proposed method for extracting the activity change patterns for ingredient pairs, and uncover the characteristics of the seasonal changes in ingredient pairs jointly used in Japanese recipes by applying the proposed method.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121126726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding the Best Job Applicants for a Job Posting: A Comparison of Human Resources Search Strategies","authors":"Christopher G. Harris","doi":"10.1109/ICDMW.2017.31","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.31","url":null,"abstract":"Finding the best candidates to match a set of job requirements can be viewed as both an art and a science. In this paper, we conduct an empirical study using actual job candidates and job applicants. We compare the ranked lists generated by executive recruiting experts with the list generated by three search strategies: one using crowdworkers in a gamified environment, a second using information retrieval-based search methods, and a third method which combines information retrieval methods and weighted feature-based approach. We examine these three strategies across two separate job categories – technical and non-technical (management). Our study finds the gamified-enhanced crowdsourcing environment works best for ranking candidates for technical jobs while the text mining and gamified crowdsourcing environments perform equally well for ranking candidates for non-technical jobs. Last, we discuss possible reasons for our results as well as suggest possible enhancements to reduce the gap between our strategies and the HR executive recruiting experts.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127152826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Twitter Stance Detection — A Subjectivity and Sentiment Polarity Inspired Two-Phase Approach","authors":"K. Dey, Ritvik Shrivastava, Saroj Kaushik","doi":"10.1109/ICDMW.2017.53","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.53","url":null,"abstract":"The problem of stance detection from Twitter tweets has recently gained significant research attention. This paper addresses the problem of detecting the stance of given tweets, with respect to given topics, from user-generated text (tweets). We use the SemEval 2016 stance detection task dataset. The labels comprise positive, negative and neutral stances, with respect to given topics. We develop a two-phase feature-driven model. First, the tweets are classified as neutral vs. non-neutral. Next, non-neutral tweets are classified as positive vs. negative. The first phase of our work draws inspiration from the subjectivity classification and the second phase from the sentiment classification literature. We propose the use of two novel features, which, along with our streamlined approach, play a key role in deriving the strong results that we obtain. We use traditional support vector machine (SVM) based machine learning. Our system (F-score: 74.44 for SemEval 2016 Task A and 61.57 for Task B) significantly outperforms the state of the art (F-score: 68.98 for Task A and 56.28 for Task B). While the performance of the system on Task A shows the effectiveness of our model for targets on which the model was trained, the performance of the system on Task B shows the generalization that our model achieves. The stance detection problem in Twitter is applicable to real-life user opinion mining applications and to other social influence and information flow modeling applications.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126988835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survival Random Forest to Predict Time to Fill","authors":"Summer M. Husband, J. Roberts","doi":"10.1109/ICDMW.2017.32","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.32","url":null,"abstract":"Traditionally, the time-to-fill metric is used as a scorecard for past performance. An organization may use time to fill to assess the performance of its internal recruiting team, or as a way to set service level agreements with outsourced recruiting partners. By first developing a set of quantifiable job features and then applying survival analysis to historical time-to-fill data, we build a predictor to assess the probability a job will remain open beyond its target time-to-fill date, enabling us to commit additional resources to high risk jobs at the beginning of the recruiting process.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121472235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development and Use of an Activity Model Based on Structured Knowledge: A Music Teaching Support System","authors":"Nami Iino, Satoshi Nishimura, Ken Fukuda, Kentaro Watanabe, Kristiina Jokinen, Takuichi Nishimura","doi":"10.1109/ICDMW.2017.82","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.82","url":null,"abstract":"To collect and explicate meaningful knowledge of a community, we propose an Activity Model based on structured knowledge. The following issues arise related to the model development: (a) difficulties in capturing activities; (b) difficulty of acquiring knowledge; and (c) difficulty in optimizing the activities to newly adopted technologies. Therefore, we are developing technologies that use on-site activities and knowledge as a knowledge-based Activity Model to enhance community intelligence (observation, judgment, cooperation). As an example of its application domain, we chose classical guitar. Classical guitar was established in the late 19th century, and systematization of classical guitar rendition has not yet been completed. Therefore, instructions for playing the guitar sometimes differ greatly among teachers, which confuses students. We thus aim at developing an instruction support system which collects the knowledge of guitar rendition while recording the student improvement process. For this study, we conducted workshops by professional guitarists and structured a knowledge model that systematizes the latest acquisition process of guitar rendition. The model will allow structuring of various activities and help improve and reconstruct music teaching services.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"770 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116412135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting PubMed for Protein Molecular Function Prediction via NMF Based Multi-label Classification","authors":"S. Fodeh, Aditya Tiwari, Hong Yu","doi":"10.1109/ICDMW.2017.64","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.64","url":null,"abstract":"Gene ontology (GO) defines terms and classes used to describe gene functions and relationships between them. GO has been the standard for describing the functions of specific genes in different model organisms. GO annotation, which tags genes with GO terms, has mostly been a manual and time-consuming curation process. In this paper we describe the development and evaluation of an innovative predictive system to automatically assign a gene its molecular functions (GO terms) using biomedical literature as a resource. We treated a GO term assignment as a multi-label multi-class classification problem. Rather than the commonly used bag-of-words approach, we used non-negative matrix factorization (NMF) for feature reduction and then performed the classification of genes. To address the multi-label aspect of the data, we used the binary-relevance method. We experimented with different classifiers and found that the combination of binary relevance and K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on the UniProtKB/Swiss-Prot dataset showed a best performance of 0.83 in terms of F-measure.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115529164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile E-Commerce Data Processing Using Relational Memory","authors":"P. Aarabi","doi":"10.1109/ICDMW.2017.125","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.125","url":null,"abstract":"In this paper, we propose a very simple method for learning relationships between events by accounting for the spatial or temporal sequence of occurrence of the events. The underlying idea behind our proposed method is that for certain data processing applications, such as data collected from retail shoppers, relational access to data is more useful and immediately informative than sequential access. We apply the proposed RElational Memory (REM) model to a large retail dataset consisting of 24,193 shoppers and 915 purchases using a popular mobile retailing iOS app. We illustrate how temporal relativity can play a role in determining the relationships between user actions.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114490477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Human Mobility to Quantify Performance Status","authors":"Minh Nguyen, Zaki Hasnain, Ming Li, T. Dorff, David Quinn, S. Purushotham, Luciano Nocera, P. Newton, P. Kuhn, J. Nieva, C. Shahabi","doi":"10.1109/ICDMW.2017.168","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.168","url":null,"abstract":"Human mobility has been studied extensively in various biomedical contexts with applications in clinical rehabilitation, disease diagnosis, health risk prognosis, and general performance assessments. In this paper, we present ATOM-HP (Analytical Technologies to Objectively Measure Human Performance) Kinect: a system to objectively quantify human performance using the Microsoft Kinect as a single camera sensor to capture human mobility. We explore the viability of this noninvasive performance assessment system by studying a cohort of cancer patients undergoing various therapy regimens who are assigned a performance score based on a qualitative clinical test. The ATOM-HP Kinect is a clinically usable system which consists of tools for Kinect, clinical data collection, data quality validation, and mobility feature extraction, which can be used for downstream analysis of performance. Preliminary results based on the clinical case study indicate that ATOM-HP Kinect can quantify changes in kinematic parameters, and that these features are correlated with clinically measured risk factors which could be used for early prediction of diseases, or making decisions on treatment modification.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125771519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning-Based Method with Valence Shifters for Sentiment Analysis","authors":"Ruihua Cheng, J. Loh","doi":"10.1109/ICDMW.2017.52","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.52","url":null,"abstract":"Automatic sentiment classification is becoming a popular and effective way to help online users or companies process and make sense of customer reviews. In this article, a learning-based method for classification of online reviews that achieves better classification accuracy is obtained by (a) combining valence shifters and opinion words into bigrams for use as features in an ordinal margin classifier and (b) using relational information between unigrams/bigrams in the form of a graph to constrain the parameters of the classifier. By using these two components, it is possible to extract more information present in the unstructured data than other methods such as support vector machines and random forest, hence gaining the potential of better classification performance. Indeed, our simulation results show a higher classification accuracy on empirical real data with ground truth and on simulated data.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126583369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}