{"title":"A Long-Short Term Memory Network for Detecting CRISPR Arrays","authors":"Shantanu Deshmukh, P. Heller, Natalia Khuri","doi":"10.1109/ICMLA.2019.00114","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00114","url":null,"abstract":"Clustered Regularly Interspaced Short Palindromic Repeat is a pattern found in the DNA sequences of some archaeal and bacterial organisms. Together with CRISPR associated genes, CRISPR arrays provide immunity against phages and other mobile exogenous elements. The CRISPR-based immunity mechanism can be manipulated to perform genome editing at low cost. To improve the specificity of CRISPR-based genome editing, better software and experimental tools are needed, and accurate detection of CRISPR arrays in DNA sequences is the first step toward this goal. In this work, a CRISPR array detection pipeline, CRISPRLstm, is presented that leverages the power of artificial intelligence. More specifically, Long-Short Term Memory models are used to discriminate between valid and invalid arrays. The predictions by CRISPRLstm are better than or in good agreement with those of other freely available tools, and CRISPRLstm outperforms a Random Forest classifier in identifying valid repeat sequences. The CRISPRLstm predictor is publicly available as a web-based application with an interactive user interface.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116020922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Radar Gesture Recognition System in Presence of Interference using Self-Attention Neural Network","authors":"Souvik Hazra, Avik Santra","doi":"10.1109/ICMLA.2019.00230","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00230","url":null,"abstract":"Gesture recognition provides an easy, convenient and intuitive way of remotely controlling several consumer electronics devices such as audio devices, television sets, projectors or gaming consoles. In recent years, radar sensors have been shown to be an effective sensing modality to sense and recognize fine-grained dynamic finger-gestures in watches or smartphones and thus offer a user-friendly human-computer interface in ultrashort range applications. However, hand-gesture recognition from a farther distance, such as to control consumer devices like a TV or projector, poses challenges, particularly due to interference from multiple humans in the field of view. In this paper, we present a novel unguided spatio-Doppler attention mechanism to enable hand-gesture recognition in the presence of multiple humans using a low power, compact 60-GHz FMCW radar operated in 500MHz ISM frequency band. The spatio-Doppler mechanism in a 2D deep convolutional neural network with long short term memory (2D CNN-LSTM) makes use of the range-Doppler images and range-angle images. We experimentally demonstrate a classification accuracy of 94.75% for our proposed system on the test dataset using eight gestures, namely wave, push forward, pull, left swipe, right swipe, clockwise rotate, anti-clockwise rotate, cross, in the presence of interfering people performing walking or arbitrary movements.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116330452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anyone here? Smart Embedded Low-Resolution Omnidirectional Video Sensor to Measure Room Occupancy","authors":"T. Callemein, Kristof Van Beeck, T. Goedemé","doi":"10.1109/ICMLA.2019.00319","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00319","url":null,"abstract":"In this paper, we present a room occupancy sensing solution with unique properties: (i) It is based on an omnidirectional vision camera, capturing rich scene info over a wide angle, making it possible to count the people in a room and even determine their positions. (ii) Although it uses a camera input, no privacy issues arise because of its extremely low image resolution, rendering people unrecognisable. (iii) The neural network inference runs entirely on a low-cost processing platform embedded in the sensor, reducing the privacy risk even further. (iv) Limited manual data annotation is needed, because of the self-training scheme we propose. Such a smart room occupancy rate sensor can be used in e.g. meeting rooms and flex-desks. Indeed, by encouraging flex-desking, the required office space can be reduced significantly. In some cases, however, a flex-desk that has been reserved remains unoccupied without an update in the reservation system. A similar problem occurs with meeting rooms, which are often under-occupied. By optimising the occupancy rate a huge reduction in costs can be achieved. Therefore, in this paper, we develop such a system, which determines the number of people present in office flex-desks and meeting rooms. Using an omnidirectional camera mounted in the ceiling, combined with a person detector, the company can intelligently update the reservation system based on the measured occupancy. Next to the optimisation and embedded implementation of such a self-training omnidirectional people detection algorithm, in this work we propose a novel approach that combines spatial and temporal image data, improving performance of our system on extremely low-resolution images.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114307277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-Based Analysis of Similarities between Word Frequency Distributions of Various Corpora for Complex Word Identification","authors":"Yo Ehara","doi":"10.1109/ICMLA.2019.00317","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00317","url":null,"abstract":"Complex word identification (CWI) is a fundamental task in educational NLP and applied linguistics which involves the identification of complex words in a text for various applications, including text simplification. Recent studies have independently reported that when word-frequency features from some uncommon corpora are used in combination with those from a general corpus, they improve the CWI accuracy; this suggests that they can be used as adjustments for a general corpus. However, although previous studies have analyzed similarity values between each pair of corpora, the significance of the similarity in the entire set of corpora is unclear. This complicates the analysis of the combination of general and uncommon corpora aimed at improving CWI accuracy; thus, the search for effective types of corpora would have to be exhaustive. To contribute to a better understanding and a non-exhaustive search, this paper proposes a novel graph-based analysis method. We first calculate various similarities among the word frequency distributions of various corpora in an unsupervised manner. Subsequently, we regard each similarity as a weighted graph and analyze the importance of a pair of corpora, or an edge, within the entire graph structure. Through our experiments, it was found that our analysis method can successfully explain why the previously reported combinations of corpora were effective; furthermore, it can find effective corpus combinations.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114502385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Data Parallel Approaches to Anomaly Detection in Climate Data using Gaussian Processes","authors":"K. Gadiraju, B. Ramachandra, Ashwin Shashidharan, B. Dutton, Ranga Raju Vatsavai","doi":"10.1109/ICMLA.2019.00090","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00090","url":null,"abstract":"Anomaly detection on large scale spatio-temporal data such as climate data is a challenging task depending on the spatial and temporal resolution and autocorrelation of the data. When considering global gridded daily temperature data, the number of locations and the length of time period considered make anomaly detection a big data problem. Gaussian Process (GP) Learning is a method that is well-suited to identify the complex spatial and temporal autocorrelation properties of spatio-temporal data. One of the primary challenges with using GP is the computational complexity associated with inverting a covariance matrix. This is further compounded when considering data on a national/global scale, and performing anomaly detection using such methods often requires dedicated high performance computing platforms. In this paper, we describe a purely temporal scalable anomaly detection technique for gridded temperature data based on GP Learning that ignores the spatial autocorrelation between neighboring grids and performs anomaly detection on each of the grids in parallel, thereby reducing the execution time. We introduce three methods: a standalone data parallel approach using a single GPU, a distributed memory version on multi-node clusters using MPI, and a mixed parallel approach using multiple GPUs. In comparison to a sequential approach, they are 17.2x, 47.1x, and 88.9x faster, respectively.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116059618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Claims Handling Processes with Insurance Based Language Models","authors":"Anuj Dimri, Suraj Yerramilli, Peng Lee, Sardar Afra, Andrew Jakubowski","doi":"10.1109/ICMLA.2019.00284","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00284","url":null,"abstract":"Insurance companies manage a large number of claims on a daily basis as new claims are reported and existing claims are serviced. A key component for servicing a claim is the ability for Claims personnel to enter in raw text, aka claims notes. Claims notes contain invaluable information often beyond that of structured data; capturing this information in a machine learning setting offers remarkable benefits to many downstream tasks in a Claims department. The ability to leverage claims notes enables an insurance company not only to make data-driven and insightful decisions while handling claims, but also to create value by working more efficiently and serving its customers more effectively. To best leverage the information contained in claims notes, we develop insurance-based language models (IBLMs) by further pre-training existing general domain language models (ULMFiT and BERT) on a large number of claim notes with enhanced vocabulary. Furthermore, we tested these IBLMs against three downstream binary classification tasks: (1) identification of auto claims with attorney retention, (2) bodily injury prediction, and (3) auto claims fraud investigation detection. We train different classifiers based on claims notes available on day 1 and through day 10 from when the claim was reported. We found that IBLMs show a significant improvement over the traditional classification approaches. Further, we provide practical insight into how an insurance company might use these models through the analysis of volume (capacity) thresholds.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126620661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Stereo Sound Channels to Boost Performance of Neural Network-Based Music Transcription","authors":"Xian Wang, Lingqiao Liu, Javen Qinfeng Shi","doi":"10.1109/ICMLA.2019.00220","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00220","url":null,"abstract":"In recent years deep learning has begun to show great potential for automatic music transcription that reproduces MIDI-like music composition information, such as note pitches and onset and offset times, from music recordings. In the literature, without exception, the two stereo sound channels coming with music recordings were averaged into a single channel to alleviate the computation overhead, which, from an entropy standpoint, definitely sacrifices information. In this paper we propose a method to properly combine the two sound channels for deep learning-based pitch detection. In particular, through modifying the loss function the network is forced to focus on the worse performing sound channel. This method achieves state-of-the-art frame-wise pitch detection performance on the MAPS dataset.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126669994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"T-REC: Towards Accurate Bug Triage for Technical Groups","authors":"Cícero A. L. Pahins, Fabrício D'Morison, Thiago M. Rocha, Larissa M. Almeida, Arthur F. Batista, Diego F. Souza","doi":"10.1109/ICMLA.2019.00154","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00154","url":null,"abstract":"With ever-larger software development systems involving more people with different skills, it is necessary to think about the process of bug assignment to groups of developers and not just to a developer alone. This work aims to support the Bug Triage Process by suggesting a list of specialized groups of developers, or Technical Groups (TG's), to be attributed to a new bug report, based on other bugs that are similar and have been resolved by these TG's in the past. In the dataset used in our experiments, the mean time to correctly assign bug reports to their proper TG is 14 days, and only then does the bug-fixing process start. This is a critical problem for software development and management since issues tend to accumulate long resolution times, which compromises developer performance and deliveries. In order to enhance the Bug Triage Process, we propose T-REC, an auxiliary SW Project Management system that accurately and efficiently analyzes similar issues to provide personalized TG recommendations. T-REC is a method that ensembles Machine Learning (ML) and Information Retrieval (IR) algorithms to recommend a list of TG's to handle an issue. Our experiments show that T-REC's recommendations reach an overall Acc@1 of 50.9%, Acc@2 of 63.2%, Acc@5 of 76.1%, Acc@10 of 83.6%, and Acc@20 of 89.7%. To the best of our knowledge, our work is the first to associate multiple machine learning strategies (classifiers, attributes, and training history) on the prediction of specialized groups of developers. We validate our approach on a real-world dataset from a large company that comprises 9.5M mobile-related bug reports from January 2001 to January 2019.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126391365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Mobile Application Usage - A Deep Learning Approach","authors":"Jingyi Shen, M. O. Shafiq","doi":"10.1109/ICMLA.2019.00054","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00054","url":null,"abstract":"With more sensors embedded and functions added, mobile phones tend to be more critical to daily life. Researchers have been using the sensor data to recognize human activity these days; meanwhile, mobile application usage prediction is also gradually being brought into the spotlight. In this paper, we leveraged a state-of-the-art technique, LSTM, to model the mobile application usage data, and introduced a data fusion technique that ultimately achieved a prediction accuracy of over 90%. To validate the generality of our proposed solution, we applied the model to a public dataset. Our proposed solution treated mobile application usage as a time series problem, which is novel in the related field; it has the advantages of low resource consumption, short training time, as well as generality. With the growth of users' reliance on mobile phones, mobile application usage prediction will be more useful in the future.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124771567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Fault Prediction Based on Fault Probability and Impact","authors":"Salim Moudache, M. Badri","doi":"10.1109/ICMLA.2019.00195","DOIUrl":"https://doi.org/10.1109/ICMLA.2019.00195","url":null,"abstract":"Nowadays, software test prioritization is a crucial task. Indeed, exhaustively testing the whole software system can be very difficult and highly time- and resource-consuming. Using machine learning algorithms to predict which parts of a software system are fault-prone can help testers to focus on high-risk parts of the code and improve resource allocation. This paper aims to investigate the potential of a risk-based model to predict fault-prone classes. The risk of classes is evaluated based on two factors: the probability that a class is fault-prone and its impact on the rest of the system. We used object-oriented metrics to capture the two risk factors. The risk of a class is modeled using the Euclidean distance. We built various variants of the risk-based model using a dataset from five versions of the ANT system. We used different machine learning algorithms (Naive Bayes, J48, Random Forest, Support Vector Machines, Multilayer Perceptron and Logistic Regression) to construct various models for fault and level of severity prediction. The objective was to distinguish between classes containing trivial and high severity faults. The considered model achieves good results for binary fault prediction. In addition, the overall multi-classification of severity levels is more than acceptable.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"30 16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130450550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}