{"title":"A Survey on GAN Techniques for Data Augmentation to Address the Imbalanced Data Issues in Credit Card Fraud Detection","authors":"Emilija Strelcenia, S. Prakoonwit","doi":"10.3390/make5010019","DOIUrl":"https://doi.org/10.3390/make5010019","url":null,"abstract":"Data augmentation is an important procedure in deep learning. GAN-based data augmentation can be utilized in many domains. For instance, in the credit card fraud domain, the imbalanced dataset problem is a major one as the number of credit card fraud cases is in the minority compared to legal payments. On the other hand, generative techniques are considered effective ways to rebalance the imbalanced class issue, as these techniques balance both minority and majority classes before the training. In a more recent period, Generative Adversarial Networks (GANs) are considered one of the most popular data generative techniques as they are used in big data settings. This research aims to present a survey on data augmentation using various GAN variants in the credit card fraud detection domain. In this survey, we offer a comprehensive summary of several peer-reviewed research papers on GAN synthetic generation techniques for fraud detection in the financial sector. In addition, this survey includes various solutions proposed by different researchers to balance imbalanced classes. In the end, this work concludes by pointing out the limitations of the most recent research articles and future research issues, and proposes solutions to address these problems.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"49 6 1","pages":"304-329"},"PeriodicalIF":0.0,"publicationDate":"2023-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77605747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Painting the Black Box White: Experimental Findings from Applying XAI to an ECG Reading Setting","authors":"Federico Cabitza, Andrea Campagner, Chiara Natali, Enea Parimbelli, Luca Ronzio, Matteo Cameli","doi":"10.3390/make5010017","DOIUrl":"https://doi.org/10.3390/make5010017","url":null,"abstract":"The emergence of black-box, subsymbolic, and statistical AI systems has motivated a rapid increase in the interest regarding explainable AI (XAI), which encompasses both inherently explainable techniques, as well as approaches to make black-box AI systems explainable to human decision makers. Rather than always making black boxes transparent, these approaches are at risk of painting the black boxes white, thus failing to provide a level of transparency that would increase the system’s usability and comprehensibility, or even at risk of generating new errors (i.e., white-box paradox). To address these usability-related issues, in this work we focus on the cognitive dimension of users’ perception of explanations and XAI systems. We investigated these perceptions in light of their relationship with users’ characteristics (e.g., expertise) through a questionnaire-based user study involving 44 cardiology residents and specialists in an AI-supported ECG reading task. Our results point to the relevance and correlation of the dimensions of trust, perceived quality of explanations, and tendency to defer the decision process to automation (i.e., technology dominance). This contribution calls for the evaluation of AI-based support systems from a human–AI interaction-oriented perspective, laying the ground for further investigation of XAI and its effects on decision making and user experience.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136179174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Pipeline Age Evaluation: Considering Overall Condition Index and Neural Network Based on Measured Data","authors":"Hassan Noroznia, M. Gandomkar, J. Nikoukar, A. Aranizadeh, Mirpouya Mirmozaffari","doi":"10.3390/make5010016","DOIUrl":"https://doi.org/10.3390/make5010016","url":null,"abstract":"Today, the chemical corrosion of metals is one of the main problems of large productions, especially in the oil and gas industries. Due to massive downtime connected to corrosion failures, pipeline corrosion is a central issue in many oil and gas industries. Therefore, the determination of the corrosion progress of oil and gas pipelines is crucial for monitoring the reliability and alleviation of failures that can positively impact health, safety, and the environment. Gas transmission and distribution pipes and other structures buried (or immersed) in an electrolyte, by the existing conditions and due to the metallurgical structure, are corroded. After some time, this disrupts an active system and process by causing damage. The worst corrosion for metals implanted in the soil is in areas where electrical currents are lost. Therefore, cathodic protection (CP) is the most effective method to prevent the corrosion of structures buried in the soil. Our aim in this paper is first to investigate the effect of stray currents on failure rate using the condition index, and then to estimate the remaining useful life of CP gas pipelines using an artificial neural network (ANN). Predicting future values using previous data based on the time series feature is also possible. Therefore, this paper first uses the general equipment condition monitoring method to detect failures. The time series model of data is then measured and operated by neural networks. Finally, the amount of failure over time is determined.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"15 1","pages":"252-268"},"PeriodicalIF":0.0,"publicationDate":"2023-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80413021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can Principal Component Analysis Be Used to Explore the Relationship of Rowing Kinematics and Force Production in Elite Rowers during a Step Test? A Pilot Study","authors":"M. Jensen, T. Stellingwerff, C. Pollock, J. Wakeling, M. Klimstra","doi":"10.3390/make5010015","DOIUrl":"https://doi.org/10.3390/make5010015","url":null,"abstract":"Investigating the relationship between the movement patterns of multiple limb segments during the rowing stroke on the resulting force production in elite rowers can provide foundational insight into optimal technique. It can also highlight potential mechanisms of injury and performance improvement. The purpose of this study was to conduct a kinematic analysis of the rowing stroke together with force production during a step test in elite national-team heavyweight men to evaluate the fundamental patterns that contribute to expert performance. Twelve elite heavyweight male rowers performed a step test on a row-perfect sliding ergometer [5 × 1 min with 1 min rest at set stroke rates (20, 24, 28, 32, 36)]. Joint angle displacement and velocity of the hip, knee and elbow were measured with electrogoniometers, and force was measured with a tension/compression force transducer in line with the handle. To explore interactions between kinematic patterns and stroke performance variables, joint angular velocities of the hip, knee and elbow were entered into principal component analysis (PCA) and separate ANCOVAs were run for each performance variable (peak force, impulse, split time) with dependent variables, and the kinematic loading scores (Kpc,ls) as covariates with athlete/stroke rate as fixed factors. The results suggested that rowers’ kinematic patterns respond differently across varying stroke rates. The first seven PCs accounted for 79.5% (PC1 [26.4%], PC2 [14.6%], PC3 [11.3%], PC4 [8.4%], PC5 [7.5%], PC6 [6.5%], PC7 [4.8%]) of the variances in the signal. The PCs contributing significantly (p ≤ 0.05) to performance metrics based on PC loading scores from an ANCOVA were (PC1, PC2, PC6) for split time, (PC3, PC4, PC5, PC6) for impulse, and (PC1, PC6, PC7) for peak force. The significant PCs for each performance measure were used to reconstruct the kinematic patterns for split time, impulse and peak force separately. Overall, PCA was able to differentiate between rowers and stroke rates, and revealed features of the rowing-stroke technique correlated with measures of performance that may highlight meaningful technique-optimization strategies. PCA could be used to provide insight into differences in kinematic strategies that could result in suboptimal performance, potential asymmetries or to determine how well a desired technique change has been accomplished by group and/or individual athletes.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"51 1","pages":"237-251"},"PeriodicalIF":0.0,"publicationDate":"2023-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85798624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InvMap and Witness Simplicial Variational Auto-Encoders","authors":"Aniss Aiman Medbouhi, Vladislav Polianskii, Anastasia Varava, Danica Kragic","doi":"10.3390/make5010014","DOIUrl":"https://doi.org/10.3390/make5010014","url":null,"abstract":"Variational auto-encoders (VAEs) are deep generative models used for unsupervised learning, however their standard version is not topology-aware in practice since the data topology may not be taken into consideration. In this paper, we propose two different approaches with the aim to preserve the topological structure between the input space and the latent representation of a VAE. Firstly, we introduce InvMap-VAE as a way to turn any dimensionality reduction technique, given an embedding it produces, into a generative model within a VAE framework providing an inverse mapping into original space. Secondly, we propose the Witness Simplicial VAE as an extension of the simplicial auto-encoder to the variational setup using a witness complex for computing the simplicial regularization, and we motivate this method theoretically using tools from algebraic topology. The Witness Simplicial VAE is independent of any dimensionality reduction technique and together with its extension, Isolandmarks Witness Simplicial VAE, preserves the persistent Betti numbers of a dataset better than a standard VAE.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"101 1","pages":"199-236"},"PeriodicalIF":0.0,"publicationDate":"2023-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79033237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning and Prediction of Infectious Diseases: A Systematic Review","authors":"O. E. Santangelo, V. Gentile, Stefano Pizzo, D. Giordano, F. Cedrone","doi":"10.3390/make5010013","DOIUrl":"https://doi.org/10.3390/make5010013","url":null,"abstract":"The aim of the study is to show whether it is possible to predict infectious disease outbreaks early, by using machine learning. This study was carried out following the guidelines of the Cochrane Collaboration and the meta-analysis of observational studies in epidemiology and the preferred reporting items for systematic reviews and meta-analyses. The suitable bibliography on PubMed/Medline and Scopus was searched by combining text, words, and titles on medical topics. At the end of the search, this systematic review contained 75 records. The studies analyzed in this systematic review demonstrate that it is possible to predict the incidence and trends of some infectious diseases; by combining several techniques and types of machine learning, it is possible to obtain accurate and plausible results.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"3 1","pages":"175-198"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75159337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Issue \"Selected Papers from CD-MAKE 2020 and ARES 2020\"","authors":"E. Weippl, A. Holzinger, Peter Kieseberg","doi":"10.3390/make5010012","DOIUrl":"https://doi.org/10.3390/make5010012","url":null,"abstract":"In the current era of rapid technological advancement, machine learning (ML) is quickly becoming a dominant force in the development of smart environments [...]","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"69 1","pages":"173-174"},"PeriodicalIF":0.0,"publicationDate":"2023-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81389327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acknowledgment to the Reviewers of Machine Learning and Knowledge Extraction in 2022","authors":"","doi":"10.3390/make5010011","DOIUrl":"https://doi.org/10.3390/make5010011","url":null,"abstract":"High-quality academic publishing is built on rigorous peer review [...]","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135436114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable Machine Learning","authors":"J. Garcke, R. Roscher","doi":"10.3390/make5010010","DOIUrl":"https://doi.org/10.3390/make5010010","url":null,"abstract":"Machine learning methods are widely used in commercial applications and in many scientific areas [...]","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"88 1","pages":"169-170"},"PeriodicalIF":0.0,"publicationDate":"2023-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83818173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Deceiving Malware Classification with Section Injection","authors":"Adeilson Antonio da Silva, Maurício Pamplona Segundo","doi":"10.3390/make5010009","DOIUrl":"https://doi.org/10.3390/make5010009","url":null,"abstract":"We investigate how to modify executable files to deceive malware classification systems. This work’s main contribution is a methodology to inject bytes across a malware file randomly and use it both as an attack to decrease classification accuracy and as a defensive method, augmenting the data available for training. It respects the operating system file format to make sure the malware will still execute after our injection and will not change its behavior. We reproduced five state-of-the-art malware classification approaches to evaluate our injection scheme: one based on Global Image Descriptor (GIST) + K-Nearest-Neighbors (KNN), three Convolutional Neural Network (CNN) variations and one Gated CNN. We performed our experiments on a public dataset with 9339 malware samples from 25 different families. Our results show that a mere increase of 7% in the malware size causes an accuracy drop between 25% and 40% for malware family classification. They show that an automatic malware classification system may not be as trustworthy as initially reported in the literature. We also evaluate using modified malware alongside the original ones to increase networks’ robustness against the mentioned attacks. The results show that a combination of reordering malware sections and injecting random data can improve the overall performance of the classification. All the code is publicly available.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47668184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}