{"title":"A Formal Design for the Lexical and Syntax Analyzer of a Pedagogically Effective Subset of C++","authors":"M. Farooq, A. Abid, R. Fox","doi":"10.1109/ICMLA.2016.0074","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0074","url":null,"abstract":"In this article, we have argued that a programming language can be improved for both teaching and learning by extracting its simpler subset, and by enforcing some useful constraints. We have further chosen a well known first programming language C++, and have defined its pedagogically effective subset, named Eazy, for teaching a first course in computer programming, generally known as CS1. In order to enforce the usage of the defined subset and to apply the constraints we need to modify the preprocessor of the language. To this end, we present a formal design for the lexical analyzer, and syntax analyzer for Eazy.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116732281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying the Meta-heuristic Prediction Algorithm for Modeling Power Density in Wind Power Plant","authors":"H. Kahraman, M. Ayaz, I. Colak, R. Bayindir","doi":"10.1109/ICMLA.2016.0079","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0079","url":null,"abstract":"In this paper, a robust artificial intelligence (AI) algorithm is applied to overcome challenges in power density prediction, especially during the installation process of a wind power plant. This algorithm also explores relationships between the meteorological parameters and power density. The importance degree of each parameter on power density is converted into numerical weighting values independently of the others. Thus, the effects of the wind speed, the wind direction, the temperature, the damp, and the pressure on power density can be modelled. In addition, an experimental study shows that the prediction accuracy and stability of the applied method are superior to those of traditional AI-based techniques.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116224861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consensus Clustering: A Resampling-Based Method for Building Radiation Hybrid Maps","authors":"Raed I. Seetan, J. Bible, Michael Karavias, Wael Seitan, S. Thangiah","doi":"10.1109/ICMLA.2016.0047","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0047","url":null,"abstract":"Building Radiation Hybrid (RH) maps is a challenging process. Traditional RH mapping techniques are very time consuming, and do not work well on noisy datasets. In this presented research, we propose a new approach that uses resampling technique with consensus clustering technique to filter out unreliable markers, and build robust RH maps in a short time. The main aims of using the proposed approach is: first to reduce the mapping computational complexity, thus speeding up the mapping process. And second, to filter out unreliable markers, and map the remaining reliable markers to build robust maps. The proposed approach maps RH datasets in four steps, as follows: 1) uses Jackknife resampling technique to resample the RH dataset, and groups all resampled datasets into clusters. 2) Builds consensus clusters and filters out unreliable markers. 3) Maps the consensus clusters. 4) Connects the consensus clusters' maps to form the final map. To demonstrate the performance of our proposed approach, we compare the accuracy of the constructed maps with the corresponding physical maps. Also, we compare the running time of our constructed maps with the Carthagene tool maps running time. 
The results show that the proposed approach can construct robust maps in a comparatively very short time.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125328813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Anomaly Detection as Support Fraud Investigation in Brazilian Exports and Anti-Money Laundering","authors":"Ebberth L. Paula, M. Ladeira, Rommel N. Carvalho, Thiago Marzagão","doi":"10.1109/ICMLA.2016.0172","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0172","url":null,"abstract":"Normally exports of goods and products are transactions encouraged by the governments of countries. Typically these incentives are promoted by tax exemptions or lower tax collections. However, exports fraud may occur with objectives not related to tax evasion, for example money laundering. This article presents the results obtained in implementing the unsupervised Deep Learning model to classify Brazilian exporters regarding the possibility of committing fraud in exports. Assuming that the vast majority of exporters have explanatory features of their export volume which interrelate in a standard way, we used the AutoEncoder to detect anomalous situations with regards to the data pattern. The databases used in this work come from exports of goods and products that occurred in Brazil in 2014, provided by the Secretariat of Federal Revenue of Brazil. From attributes that characterize export companies, the model was able to detect anomalies in at least twenty exporters.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125805336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hourly Solar Irradiance Forecasting Based on Machine Learning Models","authors":"F. Melzi, Taieb Touati, A. Samé, L. Oukhellou","doi":"10.1109/ICMLA.2016.0078","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0078","url":null,"abstract":"In recent years, many research studies are conducted into the use of smart meters data for developing decision-making tools including both analytical, forecasting and display purposes. Forecasting energy generation or forecasting energy consumption demand are indeed central problems for urban stakeholders (electricity companies and urban planners). These issues are helpful to allow them ensuring an efficient planning and optimization of energy resources. This paper investigates the problem for forecasting the hourly solar irradiance within a Machine Learning (ML) framework using Similarity method (SIM), Support Vector Machine (SVM) and Neural Network (NN). These approaches rely on a methodology which takes into account the previous hours of the predicting day and also the days having the same number of sunshine hours in the history. The study is conducted on a real data set collected on the Paris suburb of Alfortville. A comparison with two time series approaches namely Naive method and Autoregressive Moving Average Model (ARMA) is performed. This study is the first step towards the development of the hourly solar irradiance forecasting hybrid models.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126125679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Algorithm Selection in Computational Software Using Machine Learning","authors":"M. Simpson, Qing Yi, J. Kalita","doi":"10.1109/ICMLA.2016.0064","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0064","url":null,"abstract":"Computational software programs, such as Maple and Mathematica, heavily rely on superfunctions and meta-algorithms to select the optimal algorithm for a given task. These meta-algorithms may require intensive mathematical proof to formulate, incur large computational overhead, or fail to consistently select the best algorithm. Machine learning demonstrates a promising alternative for automatic algorithm selection by easing the design process and overhead while also attaining high accuracy in selection. In a case study on the resultant superfunction, a trained neural network is able to select the best algorithm out of the four available 86% of the time in Maple and 78% of the time in Mathematica. When used as a replacement for pre-existing meta-algorithms, the neural network brings about a 68% runtime improvement in Maple and 49% improvement in Mathematica. Random forests, k-nearest neighbors, and both linear and RBF kernel SVMs are also compared to the neural network model, the latter of which offers the best performance out of the tested machine learning methods.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122007653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bag of Bags: Nested Multi Instance Classification for Prostate Cancer Detection","authors":"F. Khalvati, Junjie Zhang, A. Wong, M. Haider","doi":"10.1109/ICMLA.2016.0032","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0032","url":null,"abstract":"Computer-aided detection (CAD) algorithms have been proposed for auto-detection of different types of cancer. CAD algorithms rely on machine learning methods to classify regions of interest in images into cancerous and healthy regions. In cancer screening, the foremost problem to solve is whether a patient has cancer, regardless of the location of cancerous regions in the organ. This allows early detection of the disease leading to a right course of action in terms of treatment to be taken. In machine learning, this problem has been formulated as multi-instance learning (MIL) where bags of instances are classified rather than the individual instances. In this paper, we propose a bag of bags (BoB) nested MIL algorithm where high-level bags (or parent bags), each contains multiple smaller bags of instances. We applied the proposed BoB MIL algorithm to prostate cancer detection problem using magnetic resonance imaging data to first detect which patients have cancer and consequently, to detect which slices in the 3D volume imaging data of the detected patients contain cancerous regions. 
Experimental results obtained from the imaging data of 30 patients with ground-truth data based on biopsy results show that the proposed algorithm is not only capable of detecting prostate cancer at patient level, it is also able to detect the cancerous regions at slice level of imaging data with high accuracy.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124322360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Dependency and Hedonic Housing Regression Model","authors":"T. Oladunni, Sharad Sharma","doi":"10.1109/ICMLA.2016.0097","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0097","url":null,"abstract":"The location of a real estate property has a considerable impact on its appraised value. Accounting for geographical information eliminates some reducible errors in the accuracy of a hedonic housing regression model. An improved performance will benefit home buyers, sellers, government and real estate professionals. This paper investigates the spatial dependency and substitutability of submarket and geospatial attributes in a hedonic housing regression model using mutual information (MI) and variance inflation factor (VIF). Best subset linear regression and regression tree predictive models were built as learning algorithms. Bayesian Information Criterion (BIC) and Residual Mean Deviance (RDM) measured the performance of the linear regression and regression trees respectively. The BIC of the linear regression model indicated a best fit at 14 and 11 variables for submarket and geospatial models respectively. Optimization of the submarket tree was attained with 9 parameters comprising of 15 terminal nodes, while 7 parameters comprising of 13 terminal nodes achieved optimization in the geospatial tree. While geospatial models have a slight edge over the submarket model, the experiment suggested the substitutability of the models. The dataset consisted of single family's homes in 8 counties between January and December 2006 extracted from the Multiple Listing Service repository.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121816234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Fairness under Constraints: A Decentralized Resource Allocation Game","authors":"Qinyun Zhu, J. Oh","doi":"10.1109/ICMLA.2016.0043","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0043","url":null,"abstract":"We study multi-type resource allocation in multi-agent system, where some constraints are enforced upon resource providers and users. These constraints are limitations of resource types and connection availabilities, which may make the collaboration between agents infeasible. We discuss the notion of distributed resource fairness under these constraints. Then we propose a game theory and reinforcement learning based solution for collaborative resource allocation, so that resources are assigned to users fairly and tasks are assigned to resource agents efficiently. We utilize data from Google data center as our input to simulations. Results show that our learning approach outperforms a greedy and random explorations in terms of resource utilization and fairness.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121551218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relational Synthesis of Text and Numeric Data for Anomaly Detection on Computing System Logs","authors":"Elisabeth Baseman, S. Blanchard, Zongze Li, Song Fu","doi":"10.1109/ICMLA.2016.0158","DOIUrl":"https://doi.org/10.1109/ICMLA.2016.0158","url":null,"abstract":"Monitoring high performance computing systems has become increasingly difficult as researchers and system analysts face the challenge of synthesizing a wide range of monitoring information in order to detect system problems on ever larger machines. We present a method for anomaly detection on syslog data, one of the most important data streams for determining system health. Syslog messages pose a difficult question for analysis because they include a mix of structured natural language text as well as numeric values. We present an anomaly detection framework that combines graph analysis, relational learning, and kernel density estimation to detect unusual syslog messages. We design an event block detector, which finds groups of related syslog messages, to retrieve the entire section of syslog messages associated with a single anomalous line. Our novel approach successfully retrieves anomalous behaviors inserted into syslog files from a virtual machine, including messages indicating serious system problems. 
We also test our approach on syslog messages from the Trinity supercomputer and find that our methods do not generate significant false positives.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"383 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131562027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}