{"title":"Multi-query Optimization in Federated Databases Using Evolutionary Algorithm","authors":"Sameen Mansha, F. Kamiran","doi":"10.1109/ICMLA.2015.125","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.125","url":null,"abstract":"Multi-query optimization in federated database systems is a well-studied area. Studies have shown that a similar problem arises in a wide range of applications, e.g., distributed stream processing systems and wireless sensor networks. In this paper, a general distributed multi-query processing problem, motivated by the need to speed up data acquisition in federated databases, is studied using an evolutionary algorithm. We set up a simple framework in which each individual in the population is evolved in terms of cost, uniform labeling of hyperedges, and validity of resource constraints through a number of generations. Variations of our general problem can be shown to be NP-hard. Our extensive empirical evaluation over five different synthetic datasets shows a significant improvement of 8 percent in results as compared to the state-of-the-art methods.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127831019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Population Migration Using Dominance in Multi-population Cultural Algorithms","authors":"Santosh Upadhyayula, Ziad Kobti","doi":"10.1109/ICMLA.2015.102","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.102","url":null,"abstract":"In this study we introduce a new method to enable the migration of individuals from one population to another using the concept of dominance in Multi-Population Cultural Algorithms (MPCA's). The MPCA's artificial population comprises of agents that belong to a certain sub-population. Multiple sub-populations are generated, each running its own Cultural Algorithm (CA). In this work we create a dominance-MPCA (D-MPCA) with a network of populations that implements a dominance strategy. We hypothesize that the evolutionary advantage of dominance can help improve the performance of MPCA in general optimization problems. The Sphere function from the CEC 2013 benchmark optimization functions is used to calculate the fitness value of the individuals. We observe how the populations adapt to the changes. Preliminary results show improved performance in our proposed D-MPCA over traditional MPCA.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"411 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127599696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic Graphical Models and Deep Belief Networks for Prognosis of Breast Cancer","authors":"M. Khademi, N. Nedialkov","doi":"10.1109/ICMLA.2015.196","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.196","url":null,"abstract":"We propose a probabilistic graphical model (PGM) for prognosis and diagnosis of breast cancer. PGMs are suitable for building predictive models in medical applications, as they are powerful tools for making decisions under uncertainty from big data with missing attributes and noisy evidence. Previous work relied mostly on clinical data to create a predictive model. Moreover, practical knowledge of an expert was needed to build the structure of a model, which may not be accurate. In our opinion, since cancer is basically a genetic disease, the integration of microarray and clinical data can improve the accuracy of a predictive model. However, since microarray data is high-dimensional, including genomic variables may lead to poor results for structure and parameter learning due to the curse of dimensionality and small sample size problems. We address these problems by applying manifold learning and a deep belief network (DBN) to microarray data. First, we construct a PGM and a DBN using clinical and microarray data, and extract the structure of the clinical model automatically by applying a structure learning algorithm to the clinical data. Then, we integrate these two models using softmax nodes. Extensive experiments using real-world databases, such as METABRIC and NKI, show promising results in comparison to Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN) classifiers, for classifying tumors and predicting events like recurrence and metastasis.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127400397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Method for Intrusion Detection","authors":"Yavuz Canbay, Ş. Sağiroğlu","doi":"10.1109/ICMLA.2015.197","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.197","url":null,"abstract":"Intrusion Detection Systems (IDSs) are used to detect malicious actions on information systems such as computing and networking systems. Abnormal behaviors or activities on network systems can be detected by security systems. However, conventional security systems such as anti-virus software and firewalls fail against many malicious actions. To overcome this problem, better and more intelligent IDS solutions are required. In this study, a hybrid approach is proposed to detect network attacks. Genetic Algorithm (GA) and K-Nearest Neighbor (KNN) methods were combined to model and detect the attacks. KNN was employed to classify the attacks and GA was used to select k neighbors of an attack sample. This hybrid system was applied in the intrusion detection field for the first time. The system provides advantages such as decreasing the dependency on the full training data set and providing a plausible solution for intrusion detection. The results showed that the proposed system provides better results than a single system.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"76 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114942266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Effect of Dataset Size on Training Tweet Sentiment Classifiers","authors":"Joseph D. Prusa, T. Khoshgoftaar, Naeem Seliya","doi":"10.1109/ICMLA.2015.22","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.22","url":null,"abstract":"Using automated methods of labeling tweet sentiment, large volumes of tweets can be labeled and used to train classifiers. Millions of tweets could be used to train a classifier, however, doing so is computationally expensive. Thus, it is valuable to establish how many tweets should be utilized to train a classifier, since using additional instances with no gain in performance is a waste of resources. In this study, we seek to find out how many tweets are needed before no significant improvements are observed for sentiment analysis when adding additional instances. We train and evaluate classifiers using C4.5 decision tree, Naïve Bayes, 5 Nearest Neighbor and Radial Basis Function Network, with seven datasets varying from 1000 to 243,000 instances. Models are trained using four runs of 5-fold cross validation. Additionally, we conduct statistical tests to verify our observations and examine the impact of limiting features using frequency. All learners were found to improve with dataset size, with Naïve Bayes being the best performing learner. We found that Naïve Bayes did not significantly benefit from using more than 81,000 instances. To the best of our knowledge, this is the first study to investigate how learners scale in respect to dataset size with results verified using statistical tests and multiple models trained for each learner and dataset size. Additionally, we investigated using feature frequency to greatly reduce data grid size with either a small increase or decrease in classifier performance depending on choice of learner.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133588966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BreakFast: Analyzing Celerity of News","authors":"Shuguang Wang, Eui-Hong Han","doi":"10.1109/ICMLA.2015.25","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.25","url":null,"abstract":"In the hypercompetitive news market, news outlets race to break news first. In order to provide better breaking news service and improve the reader experience, news agencies need to understand how to identify bottlenecks and streamline their reporting and delivery processes. With that in mind, we built a system, BreakFast, to measure and compare the speed of delivery of breaking news from various news sources to readers. One of the primary challenges of this comparison is how to identify which breaking news items are about the same emerging event but reported by different news agencies with different headlines and content. To tackle this problem, we extracted keywords automatically from the content, identified important topics, and then developed a classification model. The model identifies the same breaking stories from multiple news sources with an accuracy of approximately 90%. We also proposed new metrics to evaluate the speed of breaking news services and built real-time dashboards to monitor performance over time. We deployed BreakFast into the breaking news service at The Washington Post. This integrated system narrowed in on bottlenecks in its breaking news generation and delivery process, and improved its breaking news service in terms of time by more than 50%.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116575222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven Kernels via Semi-supervised Clustering on the Manifold","authors":"Jared Lundell, Charles DuHadway, D. Ventura","doi":"10.1109/ICMLA.2015.135","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.135","url":null,"abstract":"We present an approach to transductive learning that employs semi-supervised clustering of all available data (both labeled and unlabeled) to produce a data-dependent SVM kernel. In the general case where the domain includes irrelevant or redundant attributes, we constrain the clustering to occur on the manifold prescribed by the data (both labeled and unlabeled). Empirical results show that the approach performs comparably to more traditional kernels while providing significant reduction in the number of support vectors used. Further, the kernel construction technique provides some of the benefits that would normally be provided by dimensionality reduction preprocessing step.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131735096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Family of Chisini Mean Based Jensen-Shannon Divergence Kernels","authors":"P. Sharma, Gary Holness, Y. Markushin, N. Melikechi","doi":"10.1109/ICMLA.2015.86","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.86","url":null,"abstract":"Jensen-Shannon divergence is an effective method for measuring the distance between two probability distributions. When the difference between these two distributions is subtle, Jensen-Shannon divergence does not provide adequate separation to draw distinctions between subtly different distributions. We extend Jensen-Shannon divergence by reformulating it using alternate operators that provide different properties concerning robustness. Furthermore, we prove a number of important properties for this extension: the lower limits of its range, and its relationship to Shannon Entropy and Kullback-Leibler divergence. Finally, we propose a family of new kernels, based on Chisini mean Jensen-Shannon divergence, and demonstrate its utility in providing better SVM classification accuracy over RBF kernels for amino acid spectra. Because spectral methods capture phenomena at subatomic levels, differences between complex compounds can often be subtle. While the impetus behind this work began with spectral data, the methods are generally applicable to domains where subtle differences are important.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134301803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patient Identification for Telehealth Programs","authors":"Martha Ganser, Sauptik Dhar, Unmesh Kurup, Carlos Cunha, Aca Gacic","doi":"10.1109/ICMLA.2015.100","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.100","url":null,"abstract":"Telehealth provides an opportunity to reduce healthcare costs through remote patient monitoring, but is not appropriate for all individuals. Our goal was to identify the patients for whom telehealth has the greatest impact. Challenges included the high variability of medical costs and the effect of selection bias on the cost difference between intervention patients and controls. Using Medicare claims data, we computed cost savings by comparing each telehealth patient to a group of control patients who had similar healthcare resource utilization. These estimates were then used to train a predictive model using logistic regression. Filtering the patients based on the model resulted in an average cost savings of $10K, an improvement over the current expected loss of $2K (without filtering).","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133214603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NewsCubeSum: A Personalized Multidimensional News Update Summarization System","authors":"Dingding Wang, Lei Li, Tao Li","doi":"10.1109/ICMLA.2015.129","DOIUrl":"https://doi.org/10.1109/ICMLA.2015.129","url":null,"abstract":"Popular online publishers produce a huge number of news articles every day, so it is important to summarize the most up-to-the-minute information to help users quickly follow the progress of the news events they are interested in. In this paper, we develop NewsCubeSum, a novel personalized news summarization system utilizing OLAP and supervised sentence selection techniques to generate brief summaries delivering news updates in multiple dimensions (such as time, entity, and topic). An illustrative case study and experimental results on summarization performance comparisons are provided to show the effectiveness of NewsCubeSum.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"466 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113982412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}