2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA): Latest Publications

Multi-query Optimization in Federated Databases Using Evolutionary Algorithm

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.125

Sameen Mansha, F. Kamiran

Citations: 6

Population Migration Using Dominance in Multi-population Cultural Algorithms

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.102

Santosh Upadhyayula, Ziad Kobti

Citations: 1

Probabilistic Graphical Models and Deep Belief Networks for Prognosis of Breast Cancer

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.196

M. Khademi, N. Nedialkov

Abstract: We propose a probabilistic graphical model (PGM) for prognosis and diagnosis of breast cancer. PGMs are suitable for building predictive models in medical applications, as they are powerful tools for making decisions under uncertainty from big data with missing attributes and noisy evidence. Previous work relied mostly on clinical data to create a predictive model. Moreover, practical knowledge of an expert was needed to build the structure of a model, which may not be accurate. In our opinion, since cancer is basically a genetic disease, the integration of microarray and clinical data can improve the accuracy of a predictive model. However, since microarray data is high-dimensional, including genomic variables may lead to poor results for structure and parameter learning due to the curse of dimensionality and small sample size problems. We address these problems by applying manifold learning and a deep belief network (DBN) to microarray data. First, we construct a PGM and a DBN using clinical and microarray data, and extract the structure of the clinical model automatically by applying a structure learning algorithm to the clinical data. Then, we integrate these two models using softmax nodes. Extensive experiments using real-world databases, such as METABRIC and NKI, show promising results in comparison to Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN) classifiers for classifying tumors and predicting events like recurrence and metastasis.

Citations: 47
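The abstract above says only that the clinical PGM and the microarray DBN are integrated "using softmax nodes" and gives no further detail. As a minimal sketch, assuming a softmax node takes the concatenated outputs of the two models as input and applies a linear map followed by a softmax, the combination step could look like this. The function names, the two-class prognosis output, and all weights and inputs are hypothetical illustrations, not the authors' model or values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_node(inputs: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> List[float]:
    """Hypothetical softmax node: one weight row and one bias per output class."""
    logits = [sum(w * x for w, x in zip(row, inputs)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical inputs: class posteriors from the clinical PGM (good, poor prognosis)
# concatenated with two features produced by the DBN from microarray data.
pgm_posterior = [0.7, 0.3]
dbn_features = [0.2, 0.9]
combined_input = pgm_posterior + dbn_features

# Illustrative, untrained parameters for a two-class output.
W = [[1.0, -1.0, 0.5, -0.5],
     [-1.0, 1.0, -0.5, 0.5]]
b = [0.0, 0.0]

probs = softmax_node(combined_input, W, b)
print(probs)  # two class probabilities that sum to 1
```

In practice the weights of such a node would be learned from labeled outcomes; the fixed values here only show the data flow.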

A Hybrid Method for Intrusion Detection

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.197

Yavuz Canbay, Ş. Sağiroğlu

Citations: 30

The Effect of Dataset Size on Training Tweet Sentiment Classifiers

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.22

Joseph D. Prusa, T. Khoshgoftaar, Naeem Seliya

Abstract: Using automated methods of labeling tweet sentiment, large volumes of tweets can be labeled and used to train classifiers. Millions of tweets could be used to train a classifier; however, doing so is computationally expensive. Thus, it is valuable to establish how many tweets should be used to train a classifier, since using additional instances with no gain in performance is a waste of resources. In this study, we seek to find out how many tweets are needed before adding more instances yields no significant improvement in sentiment analysis. We train and evaluate classifiers using C4.5 decision tree, Naïve Bayes, 5-Nearest Neighbor and Radial Basis Function Network learners, with seven datasets ranging from 1,000 to 243,000 instances. Models are trained using four runs of 5-fold cross-validation. Additionally, we conduct statistical tests to verify our observations and examine the impact of limiting features using frequency. All learners were found to improve with dataset size, with Naïve Bayes being the best-performing learner. We found that Naïve Bayes did not significantly benefit from using more than 81,000 instances. To the best of our knowledge, this is the first study to investigate how learners scale with respect to dataset size with results verified using statistical tests and multiple models trained for each learner and dataset size. Additionally, we investigated using feature frequency to greatly reduce data grid size, with either a small increase or decrease in classifier performance depending on the choice of learner.

Citations: 42
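The evaluation protocol in the abstract above ("four runs of 5-fold cross-validation") and the idea of limiting features by frequency can be sketched as follows. This is a minimal illustration, assuming plain (non-stratified) folds with a different shuffle seed per run and a document-frequency cutoff; the abstract does not say whether folds were stratified or exactly how frequency was counted, and `repeated_kfold`, `frequent_features`, and `min_count` are hypothetical names.

```python
import random
from collections import Counter
from typing import Iterable, List, Set, Tuple


def repeated_kfold(n: int, k: int = 5, runs: int = 4, seed: int = 0
                   ) -> List[List[Tuple[List[int], List[int]]]]:
    """Return `runs` independent k-fold partitions of the indices 0..n-1.

    Each run shuffles the indices with its own seed and cuts them into k folds;
    every fold serves once as the test set while the other k-1 folds are used
    for training.
    """
    all_runs = []
    for r in range(runs):
        idx = list(range(n))
        random.Random(seed + r).shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # round-robin keeps fold sizes within 1
        splits = []
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            splits.append((train, test))
        all_runs.append(splits)
    return all_runs


def frequent_features(docs: Iterable[Set[str]], min_count: int) -> Set[str]:
    """Keep only features (e.g. tokens) that occur in at least `min_count` documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    return {tok for tok, c in counts.items() if c >= min_count}


runs = repeated_kfold(n=1000)
print(len(runs), len(runs[0]))  # 4 5

tweets = [{"good", "day"}, {"bad", "day"}, {"good", "food"}]
print(sorted(frequent_features(tweets, min_count=2)))  # ['day', 'good']
```

Each of the 4 × 5 = 20 train/test splits would yield one trained model per learner and dataset size, which gives the multiple performance estimates that statistical tests can then compare.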

BreakFast: Analyzing Celerity of News

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.25

Shuguang Wang, Eui-Hong Han

Abstract: In the hypercompetitive news market, news outlets race to break news first. In order to provide better breaking news service and improve the reader experience, news agencies need to understand how to identify bottlenecks and streamline their reporting and delivery processes. With that in mind, we built a system, BreakFast, to measure and compare the speed of delivery of breaking news from various news sources to readers. One of the primary challenges of this comparison is identifying which breaking news items are about the same emerging event but were reported by different news agencies with different headlines and content. To tackle this problem, we extracted keywords automatically from the content, identified important topics, and then developed a classification model. The model identifies the same breaking stories from multiple news sources with an accuracy of approximately 90%. We also proposed new metrics to evaluate the speed of breaking news services and built real-time dashboards to monitor performance over time. We deployed BreakFast into the breaking news service at The Washington Post. The integrated system pinpointed bottlenecks in the Post's breaking news generation and delivery process and improved the speed of its breaking news service by more than 50%.

Citations: 1
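The abstract above describes two steps: deciding whether alerts from different outlets cover the same story (from automatically extracted keywords), and measuring how fast each outlet delivered it. The paper's classification model and its proposed speed metrics are not described, so the sketch below swaps in a much simpler technique: a keyword-set Jaccard similarity with a fixed threshold for matching, and "minutes behind the first publisher" as the speed measure. Both choices, the `Alert` tuple layout, the threshold of 0.5, and the sample outlets and keywords are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Set, Tuple

# Hypothetical alert record: (source name, publication time, extracted keyword set).
Alert = Tuple[str, datetime, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword overlap |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_story(x: Alert, y: Alert, threshold: float = 0.5) -> bool:
    """Simple similarity threshold used here in place of a trained classifier."""
    return jaccard(x[2], y[2]) >= threshold


def lag_behind_first(alerts: List[Alert]) -> Dict[str, float]:
    """Minutes each source trailed the earliest alert for one matched story.

    If a source sent several alerts for the story, its earliest one counts.
    """
    first = min(t for _, t, _ in alerts)
    lags: Dict[str, float] = {}
    for src, t, _ in alerts:
        minutes = (t - first).total_seconds() / 60
        lags[src] = min(minutes, lags.get(src, minutes))
    return lags


# Hypothetical alerts from two outlets about the same event.
a = ("Outlet A", datetime(2015, 6, 1, 12, 0), {"storm", "coast", "evacuation"})
b = ("Outlet B", datetime(2015, 6, 1, 12, 7), {"storm", "coast", "warning"})

print(same_story(a, b))          # True (Jaccard = 2/4 = 0.5)
print(lag_behind_first([a, b]))  # {'Outlet A': 0.0, 'Outlet B': 7.0}
```

A trained classifier can weigh keywords, topics, and timing together, which a single overlap threshold cannot; the threshold rule is only meant to make the matching-then-timing pipeline concrete.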

Data-Driven Kernels via Semi-supervised Clustering on the Manifold

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.135

Jared Lundell, Charles DuHadway, D. Ventura

Citations: 0

A Family of Chisini Mean Based Jensen-Shannon Divergence Kernels

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.86

P. Sharma, Gary Holness, Y. Markushin, N. Melikechi

Citations: 12

Patient Identification for Telehealth Programs

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.100

Martha Ganser, Sauptik Dhar, Unmesh Kurup, Carlos Cunha, Aca Gacic

Citations: 2

NewsCubeSum: A Personalized Multidimensional News Update Summarization System

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.129

Dingding Wang, Lei Li, Tao Li

Citations: 0