{"title":"Robust Collaborative Fraudulent Transaction Detection using Federated Learning","authors":"Delton Myalil, M. Rajan, Manoj M. Apte, S. Lodha","doi":"10.1109/ICMLA52953.2021.00064","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00064","url":null,"abstract":"Fraudulent transaction detection is a difficult problem for an individual bank, since the number of fraudulent transactions within a single bank’s records is far smaller than the number of regular day-to-day transactions it processes. This extreme data imbalance makes training a classifier difficult. In addition, a model trained on one bank’s data cannot learn from the different types of fraudulent transactions that a single bank’s database lacks. Collaboration between banks is the only way to achieve a generalized model, but banks will not share their data with each other due to competition and regulatory restrictions. Federated Learning can be leveraged here to solve this problem. However, in a cross-silo setting like this, the data held by different banks will differ in distribution, so the participants’ datasets are non-IID. Moreover, we assume that a minority of the banks could be malicious and will try to disrupt the federated learning process. The problem is therefore to perform federated learning in a non-IID setting with active adversaries involved, which is a new research area in fraud detection. We perform non-IID partitioning of the transaction dataset to simulate 10 banks or silos. Then, as a benchmark, we perform federated averaging with a subset of the banks set as malicious. Furthermore, we propose a novel algorithm, Epsilon Cluster Selection, a filter-based aggregation technique that recognizes malicious nodes and prevents them from contributing to the global model being trained. 
We apply this algorithm to the same setting with malicious banks and compare the results.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"49 1","pages":"373-378"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88113969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoder Transformer for Temporally-Embedded Health Outcome Predictions","authors":"O. Boursalie, Reza Samavi, T. Doyle","doi":"10.1109/ICMLA52953.2021.00235","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00235","url":null,"abstract":"Deep learning models are increasingly being used to predict patients’ diagnoses by analyzing electronic health records. Medical records represent observations of a patient’s health over time. A commonly used approach to analyze health records is to encode them as a sequence of ordered diagnoses (diagnostic-level encoding). Transformer models then analyze the sequence of diagnoses to learn disease patterns. However, the elapsed time between medical visits is not considered when transformers are used to analyze health records. In this paper, we present DT-THRE: Decoder Transformer for Temporally-Embedded Health Records Encoding that predicts patients’ diagnoses by analyzing their medical histories. In DTTHRE, instead of diagnostic-level encoding, we propose an encoding representation for health records called THRE: Temporally-Embedded Health Records Encoding. THRE encodes patient histories as a sequence of medical events such as age, sex, and diagnostic embedding while incorporating the elapsed time between visits. We evaluate a proof-of-concept DTTHRE on a real-world medical dataset and compare our model’s performance to an existing diagnostic transformer model in the literature. 
DTTHRE was successful on a medical dataset to predict patients’ final diagnosis with improved predictive performance (78.54± 0.22%) compared to the existing model in the literature (40.51± 0.13%).","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"6 1","pages":"1461-1467"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88279220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BuiltNet: Graph based Spatio-Temporal Indoor Thermal Variation Detection","authors":"Naima Khan, Nirmalya Roy","doi":"10.1109/ICMLA52953.2021.00270","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00270","url":null,"abstract":"Monitoring thermal conditions with thermal cameras is a potential non-intrusive way to supervise the structural well-being of buildings. Thermal variation can indicate various structural damages or construction deficiencies, including air leakage through the inside and outside surfaces of buildings. Frequent monitoring with thermal images can track the thermal characteristics of different places in built environments, which helps to prevent damage beforehand. Previous studies of thermal conditions in buildings using thermal images are limited to specific regions with constrained environmental settings. In this work, we propose BuiltNet, an automated, scalable framework for analyzing spatial and temporal temperature variation over various building elements (e.g., walls, windows, and doors) using longitudinal thermal images. We collected thermal images from a residential apartment home for 10 minutes in consecutive 4-5 hours on different days. The spatial and temporal relations among different spots in a region, derived from sequential thermal images of that region, are represented as a graph. We propose an unsupervised deep clustering algorithm based on a graph neural network that considers both spatial and temporal features from longitudinal thermal images. 
Our analysis on the spatial and temporal features of regions in the collected thermal images (from both day and night of different weather conditions) identifies the thermal variation and characterizes the spatiotemporal dynamics over different places in the built environment.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"67 1","pages":"1696-1703"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86116886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MLCHECK– Property-Driven Testing of Machine Learning Classifiers","authors":"Arnab Sharma, Caglar Demir, A. N. Ngomo, H. Wehrheim","doi":"10.1109/ICMLA52953.2021.00123","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00123","url":null,"abstract":"An increasing amount of software with machine learning components is being deployed. This poses the question of quality assurance for such components: how can we validate whether specified requirements are fulfilled by a machine learned software? Current testing and verification approaches either focus on a single requirement (e.g., fairness) or specialize in a single type of machine learning model (e.g., neural networks). We propose the property-driven testing of machine learning models. Our approach MLCHECK encompasses (1) a language for property specification, and (2) a technique for systematic test case generation. The specification language is comparable to property-based testing languages. The test case generation employs an elaborate verification method for a systematic, property-dependent construction of test suites, without additional user-supplied generator functions. We evaluate MLCHECK using requirements and data sets from three different application areas (software discrimination, learning on knowledge graphs and security). 
Our evaluation shows that in addition to its generality, MLCHECK can outperform specialised testing approaches while having a comparable runtime.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"82 1","pages":"738-745"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83749346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification and validation of a radiomic signature for predicting survival outcomes in non-small-cell lung cancer treated with radiation therapy","authors":"Jin Li, Yixin Liu, Jingquan Wu","doi":"10.1109/ICMLA52953.2021.00095","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00095","url":null,"abstract":"Radiomics is a novel tool which extracts quantitative features from medical imaging, and combines key features into an image-based radiomic signature for cancer diagnostics. We aimed to develop a quantitative radiomic signature for predicting survival outcomes in non-small-cell lung cancer (NSCLC) patients treated with radiation therapy. Based on computed tomography (CT) imaging of NSCLC, we applied a forward selection procedure for the establishment of a radiomic signature in a cohort with 107 NSCLC patients treated with radiation therapy, and validated it in a dataset with 88 patients. The radiomics signatures were significantly associated with NSCLC patients’ survival time. In a Testing dataset, the predicted high risk patients had significantly shorter overall survival than the predicted low risk patients (log-rank $P=$ 0.0004, HR $=$ 2.75, 95% CIs: 1.58–4.80, C-index $=$ 0.64). Further, the novel proposed radiomic nomogram combining the radiomic signature and clinicopathological factors improved the prognostic performance. The CT-based radiomic signature exhibited a good performance for noninvasively identifying patients with NSCLC who should receive postoperative radiation therapy. 
These results provide a more precise reference for the accurate diagnosis and treatment of NSCLC in clinical practice.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"2 1","pages":"570-574"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82884290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Unsupervised Learning Methodology for Increasing Human Productivity via VR Training","authors":"Sérgio Viademonte, B. Gomes, A. Siravenha, W. Gomes, Caio Rodrigues, R. A. Tourinho","doi":"10.1109/ICMLA52953.2021.00210","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00210","url":null,"abstract":"In recent years, the mining industry has witnessed a steady drop in productivity. This decline has been driven by a number of factors, such as an inefficient workforce. Workforce concerns stem in part from inexperienced workers combined with inadequate training protocols for developing task-specific human abilities. In this study, we propose an unsupervised machine learning (ML) methodology for increasing human productivity in the mining industry via Virtual Reality (VR) training sessions. Our results show an increase in average productivity for operators who are below the desired production level, which could lead to significant profit margins as well as a safer working environment.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"5 1","pages":"1294-1298"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82829337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Machine Learning Strategies to Damage Identification of Neurofibromatosis Mutations","authors":"A. Orjuela-Cañón, Juan Carlos Figueroa–García, Roman Neruda","doi":"10.1109/ICMLA52953.2021.00217","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00217","url":null,"abstract":"Machine learning tools have been employed to solve problems in bioinformatics. However, tuning the parameters of these models can add difficulties beyond those of the specific classification technique used. In this work, data from protein sequences were applied to three automated machine learning strategies to determine the type of mutation in neurofibromatosis. Results show that the parameters of the machine learning models were found automatically. In addition, these tools helped determine relations between the amino acids in the protein sequence.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"1 1","pages":"1341-1344"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90221331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning for Range Localization via Over-Water Electromagnetic Signals","authors":"Evan Witz, M. Barger, R. Paffenroth","doi":"10.1109/ICMLA52953.2021.00247","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00247","url":null,"abstract":"Neural networks are widely applied in domains such as image processing, natural language processing, and time series forecasting. However, neural networks have seen less use in problems arising in the physical sciences. This is unfortunate, since the physical domain has a wealth of problems that can benefit from the application of neural networks. These problems are significant for many areas, such as manufacturing and materials science. In this work, we demonstrate that knowledge of the physical systems of interest can be combined with effective data preprocessing and neural network training to achieve prediction effectiveness that is greater than the sum of its parts. In particular, we study the challenging problem of range estimation from the measurement of electromagnetic scattering of radio waves reflected off the surface of the ocean and the atmosphere. Our key finding is that good performance can only be achieved by combining physical principles with careful data preprocessing and network training.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"46 1","pages":"1537-1544"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79090937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensor-Based Obsessive-Compulsive Disorder Detection With Personalised Federated Learning","authors":"Kristina Kirsten, Bjarne Pfitzner, Lando Löper, B. Arnrich","doi":"10.1109/ICMLA52953.2021.00058","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00058","url":null,"abstract":"The mental illness Obsessive-Compulsive Disorder (OCD) is characterised by obsessive thoughts and compulsive actions. The latter can occur as repetitive activities to ensure that severe fears do not come true. The disease is usually diagnosed very late due to a lack of knowledge and the patient’s shame. Nevertheless, early detection can significantly increase the success of therapy. With the development of new wearable sensors, it is possible to recognise human activities. Accordingly, wearables can also be used to identify recurring activities that indicate OCD. With such an automatic detection system, a diagnosis can be made earlier and thus therapy can be started sooner. Since compulsive behaviour is very individual and varies from patient to patient, this paper deals with personalised federated machine learning models. We first adapt the publicly available OPPORTUNITY dataset to simulate OCD behaviour. Secondly, we evaluate two existing personalised federated learning algorithms against baseline approaches. 
Finally, we propose a hybrid approach that merges the two evaluated algorithms and reaches a mean area under the precision-recall curve (AUPRC) of 0.954 across clients.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"33 1","pages":"333-339"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79207123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Convolutional Networks for Categorizing Online Harassment on Twitter","authors":"M. Saeidi, E. Milios, N. Zeh","doi":"10.1109/ICMLA52953.2021.00156","DOIUrl":"https://doi.org/10.1109/ICMLA52953.2021.00156","url":null,"abstract":"Twitter is one of the social media platforms on which people express themselves freely. Harassment is one consequence of such platforms and is hard to prevent. Text categorization and classification aim to address this problem. Several studies have applied classical machine learning methods and recent deep neural networks to categorize such text. However, only a few studies have explored graph convolutional neural networks while using classical approaches to categorize harassment tweets. In this work, we first propose using graph convolutional networks (GCN) for tweet categorization. Second, we explore this categorization task using classical machine learning approaches and compare the results with the GCN model. Third, we show the effectiveness of the GCN model on this problem by additionally evaluating it on datasets with fewer samples. In addition, we use different embedding approaches to find the best representation of the dataset for each model and identify the best embedding approach for this problem.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"65 1","pages":"946-951"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84759721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}