S. Kaushik, Abhinav Choudhury, Nataraj Dasgupta, Sayee Natarajan, Larry A. Pickett, V. Dutt
{"title":"Using LSTMs for Predicting Patient's Expenditure on Medications","authors":"S. Kaushik, Abhinav Choudhury, Nataraj Dasgupta, Sayee Natarajan, Larry A. Pickett, V. Dutt","doi":"10.1109/MLDS.2017.9","DOIUrl":"https://doi.org/10.1109/MLDS.2017.9","url":null,"abstract":"Managing expenditure on medications is a serious challenge faced by patients, in particular those who cannot afford costly health care. Predicting patients' spending on medications is therefore crucial for efficient planning, budgeting, and decision-making. However, little attention has been given to predicting patient expenditure using deep time-series forecasting methods. The primary objective of this paper is the time-series forecasting of patient expenditures on medications using both traditional and deep time-series forecasting methods. A traditional Auto Regressive Integrated Moving Average (ARIMA) model and two deep models, a standard Long Short-Term Memory (LSTM) model and a stacked LSTM model, were calibrated to predict the monthly expenditure on medication for 50,000+ patients in the US between 2011 and 2015. The first 48 months were used for training the models and the remaining 12 months were used for testing the models. Results revealed that the stacked LSTM model performed better than both the standard LSTM and ARIMA models during testing. Overall, both deep time-series models performed better than the traditional ARIMA model. We highlight the implications of our results for forecasting time-series data involving patient journeys.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128341221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yongsoo Choi, Dongshik Kang, Jae-Jeong Hwang, K. Rhee
{"title":"JPEG Compression Detection Based on Edge-Corner Features Using SVM","authors":"Yongsoo Choi, Dongshik Kang, Jae-Jeong Hwang, K. Rhee","doi":"10.1109/MLDS.2017.25","DOIUrl":"https://doi.org/10.1109/MLDS.2017.25","url":null,"abstract":"This paper focuses on the detection of JPEG compression (JC) in image forensics and extracts a feature vector composed of Hough line, Hough peak, and Harris-Stephens corner features to distinguish JC images from other image types. The longest Hough line is computed by applying the Hough transform to the Canny edges, and the coordinates of the line’s endpoints form the first feature set. The coordinates of the deep Hough peaks form the second feature set, and the coordinates of the Harris-Stephens corners form the third. These sets are combined into the feature vector for JC detection. The feature vector is used to train an SVM (Support Vector Machine) classifier for JC detection in forged images. The performance of the proposed JC detector is measured on four chosen types of forged images: unaltered, median filtering (3 × 3), averaging filter (3 × 3), and downscaling (0.9). The detector is evaluated using the AUC (Area Under Curve) of sensitivity versus 1-specificity, PTP at PFP = 0.01, Pe (the minimal average decision error), and the classification results. The evaluation confirms that the proposed algorithm achieves a grade of 'Excellent (A)'.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114580321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validation of Inspection Reviews over Variable Features Set Threshold","authors":"Maninder Singh, G. Walia, Anurag Goswami","doi":"10.1109/MLDS.2017.16","DOIUrl":"https://doi.org/10.1109/MLDS.2017.16","url":null,"abstract":"Background: Mining software requirement reviews involves natural language processing (NLP) to efficiently validate a true fault as useful and a false positive as non-useful. Aim: The aim of this paper is to evaluate our proposed mining approach to automate the validation of requirement reviews generated during an inspection of an NL requirements document. Method: Our approach utilized two training models, one built from requirement reviews and the other from online movie reviews. We conducted an empirical study to test our approach using part of speech (POS) against these two trained models and observed trends w.r.t. F-measure and G-mean along with the percentage of features used to train the two models. Results: The results showed that training reviews from two different domains yield similar trends across evaluation metrics. The most stable and promising validation results for F-measure and G-mean are obtained when the models over inspection and movie reviews are trained using feature set threshold values of 65% and 45%, respectively.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115650798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"INGC: Graph Clustering & Outlier Detection Algorithm Using Label Propagation","authors":"Vandana Bhatia, Bharti Saneja, Rinkle Rani","doi":"10.1109/MLDS.2017.14","DOIUrl":"https://doi.org/10.1109/MLDS.2017.14","url":null,"abstract":"In the last decade, the size of data has increased at a tremendous rate. Extracting useful insights from this huge amount of data requires data mining. For such insights, the connections between data items are often of high interest. These connections can be efficiently represented as graphs, which provide a powerful representation for many applications, ranging from biological and social networks to web networks. Graph mining techniques such as clustering and outlier detection can be beneficial in gathering useful information. In this paper, an efficient influence-based graph clustering and outlier detection algorithm (INGC) is proposed based on label propagation. The proposed algorithm improves the performance of the traditional Label Propagation algorithm by making it more robust. INGC saves time by labeling only the highly influential vertices of the network. The labels are then propagated among the remaining nodes of the network, and nodes with the same vertex label are gathered to form a cluster. The vertices to which no label has been assigned during clustering are identified as outliers. Experiments were carried out on three real-life graph datasets. It is shown that the proposed INGC outperforms the state-of-the-art clustering algorithms in terms of F-Measure and Modularity. INGC also proved to be efficient in terms of detection rate of outliers.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133887253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Internet of Things and Decision Support System for eHealth - Applied to Cardiometabolic Diseases","authors":"Parag Chatterjee, R. Armentano, L. Cymberknop","doi":"10.1109/MLDS.2017.22","DOIUrl":"https://doi.org/10.1109/MLDS.2017.22","url":null,"abstract":"Recent years have seen a phenomenal change in healthcare paradigms, and the Internet of Things (IoT) combined with data analytics has been a key player in this field. IoT enables a common platform for seamless exchange between healthcare devices and stakeholders, followed by advanced analysis of the shared pool of data. It also lays the foundation of Clinical Decision Support Systems, which act as an assistive tool for medical personnel in gaining deeper insight into patients' health data and designing more efficient and personalized treatment strategies. This work discusses a specific aspect of this emerging field of IoT-based eHealth related to remote patient monitoring systems applied to cardiometabolic diseases. Such a system contributes significantly to decision support systems thanks to its efficient data analytics, enabling medical personnel to have a holistic visualization of the healthcare scenario.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123992656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spotted Hyena Optimizer for Solving Engineering Design Problems","authors":"Gaurav Dhiman, A. Kaur","doi":"10.1109/MLDS.2017.5","DOIUrl":"https://doi.org/10.1109/MLDS.2017.5","url":null,"abstract":"This paper presents a recently developed metaheuristic optimization algorithm named Spotted Hyena Optimizer (SHO), which is inspired by the social behaviors of spotted hyenas. The three basic steps of SHO are searching for prey, encircling prey, and attacking prey, which are mathematically modeled and discussed. The main concept of this work is to apply the SHO algorithm to two very challenging real-life constrained engineering design problems (i.e., 25-bar truss design and multiple disk clutch brake design) and compare it with various other metaheuristic algorithms. The experimental results on the engineering design problems reveal that the SHO algorithm performs better than the competing metaheuristic algorithms.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121970256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Driven Decision Support to Fund Graduate Studies in Abroad Universities","authors":"Shahriar Yazdipour, Nahid Taherian","doi":"10.1109/MLDS.2017.17","DOIUrl":"https://doi.org/10.1109/MLDS.2017.17","url":null,"abstract":"Each year, many undergraduate students from developing countries try to continue their graduate studies in foreign universities. Admission is not easy, especially for those who seek positions with full funding and scholarship. The chance of acceptance and funding depends on many factors such as GPA, GRE and IELTS scores, field of study, university name, and number of papers. Since the application process is costly and time-consuming, students should apply only to universities with a high chance of acceptance and funding. Students usually reach out to those who applied before and use their experience to make a smarter choice. In some countries like Iran, there are portals and websites in which previously admitted students share their experience and information. In this paper, we use the data provided by these students to build models for predicting a student's chance of getting funding from different universities. After cleaning and preprocessing, we build decision trees which take the applicant's data as input and calculate the probability of getting financial support from various universities. This model also helps us to find the most important factors in securing funding. In addition, admission seekers can follow the provided rules to estimate their chance of getting funding and obtain ideas about how to improve their profiles to increase their chances. Additionally, we use a k-nearest neighbor algorithm to find the k most similar records to the user's profile. These similar records are used to predict the chance of acceptance and funding. These models are beneficial for students with a strong desire to study abroad, as well as for those already trying to pursue higher studies abroad with financial support.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132705733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carlos Alejandro Aguirre, Sneha Gullapalli, María F. De la Torre, Alice Lam, J. Weese, W. Hsu
{"title":"Learning to Filter Documents for Information Extraction Using Rapid Annotation","authors":"Carlos Alejandro Aguirre, Sneha Gullapalli, María F. De la Torre, Alice Lam, J. Weese, W. Hsu","doi":"10.1109/MLDS.2017.24","DOIUrl":"https://doi.org/10.1109/MLDS.2017.24","url":null,"abstract":"Corpus-driven approaches to information extraction from documents face problems of relevance determination, namely determining which documents are of requisite type, structure, and content for a specified query and context. In this paper, we discuss the problem of learning to filter documents crawled from the web with respect to such relevance criteria, and in particular how to annotate document corpora for supervised classification learning approaches to this problem. For context, we describe a system aimed at extracting experimental data from scientific publications, with the long-term goal of extracting procedural information from relevant sections on experimental methodology. We consider motivating use cases for our learning filter, using the documents passed by the filter: marking up sections (or passages); capturing entities and relationships; and explaining to a domain expert why a document is relevant. These distinct use cases make the annotation task multi-faceted. Our approach focuses on speeding up annotation in learning to filter while minimizing loss of precision or recall on the learning task, using a reconfigurable user interface. We develop such an interface, report on its use in tandem with classification on a real extraction task, and discuss extensions of this work to visual scene filtering and annotation.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115415724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Cancerous Profiles Using Machine Learning","authors":"Aman Sharma, Rinkle Rani","doi":"10.1109/mlds.2017.6","DOIUrl":"https://doi.org/10.1109/mlds.2017.6","url":null,"abstract":"There are a variety of options available for cancer treatment. The type of treatment recommended for an individual is influenced by various factors such as the cancer type, the severity (stage) of the cancer, and, most importantly, genetic heterogeneity. In such a complex environment, targeted drug treatments are likely to be ineffective or to elicit different responses. To study anti-cancer drug response, we need to understand cancerous profiles. These cancerous profiles carry information which can reveal the underlying factors responsible for cancer growth. Hence, there is a need to analyze cancer data for predicting optimal treatment options. Analysis of such profiles can help to predict and discover potential drug targets and drugs. The main aim of this paper is to provide a machine learning based classification technique for cancerous profiles.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117065890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensive Review On Supervised Machine Learning Algorithms","authors":"Rishabh Choudhary, Hemant Kumar Gianey","doi":"10.1109/MLDS.2017.11","DOIUrl":"https://doi.org/10.1109/MLDS.2017.11","url":null,"abstract":"Machine learning is an area of computer science in which the computer predicts the next task to perform by analyzing the data provided to it. The data accessed by the computer can be in the form of digitized training sets or can be obtained via interaction with the environment. Machine learning algorithms are constructed so as to learn and make predictions from the data, unlike static programming algorithms that need explicit human instruction. Different supervised and unsupervised techniques have been proposed to solve problems, such as rule-based, logic-based, instance-based, and stochastic techniques. The primary objective of our paper is to provide a general comparison among various state-of-the-art supervised machine learning algorithms.","PeriodicalId":248656,"journal":{"name":"2017 International Conference on Machine Learning and Data Science (MLDS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129317422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}