{"title":"DeepCKID: A Multi-Head Attention-Based Deep Neural Network Model Leveraging Classwise Knowledge to Handle Imbalanced Textual Data","authors":"Amit Kumar Sah, Muhammad Abulaish","doi":"10.1016/j.mlwa.2024.100575","DOIUrl":"10.1016/j.mlwa.2024.100575","url":null,"abstract":"<div><p>This paper presents DeepCKID, a Multi-Head Attention (MHA)-based deep learning model that exploits statistical and semantic knowledge corresponding to documents across different classes in the datasets to improve the model’s ability to detect minority class instances in imbalanced text classification. In this process, corresponding to each document, DeepCKID extracts: (i) word-level statistical and semantic knowledge, namely, class correlation and class similarity corresponding to each word, based on its association with different classes in the dataset, and (ii) class-level knowledge from the document using <span><math><mi>n</mi></math></span>-grams and relation triplets corresponding to the classwise keywords present, identified using cosine similarity utilizing Transformers-based Pre-trained Language Models (PLMs). DeepCKID encodes the word-level and class-level features using deep convolutional networks, which can learn meaningful patterns from them. First, DeepCKID combines the semantically meaningful Sentence-BERT document embeddings and the word-level feature matrix to give the final document representation, which it then fuses with the different classwise encoded representations to strengthen feature propagation. DeepCKID then passes the encoded document representation and its different classwise representations through an MHA layer to identify the important features at different positions of the feature subspaces, resulting in a latent dense vector accentuating its association with a particular class. Finally, DeepCKID passes the latent vector to the softmax layer to learn the corresponding class label. We evaluate DeepCKID over six publicly available Amazon reviews datasets using four Transformers-based PLMs. We compare DeepCKID with three approaches and four ablation-like baselines. Our study suggests that in most cases, DeepCKID outperforms all the comparison approaches, including baselines.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100575"},"PeriodicalIF":0.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000513/pdfft?md5=8efb9f85f258bdd00899e0b78ef5e189&pid=1-s2.0-S2666827024000513-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141716561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial instability of crash prediction models: A case of scooter crashes","authors":"Tumlumbe Juliana Chengula, Boniphace Kutela, Norris Novat, Hellen Shita, Abdallah Kinero, Reuben Tamakloe, Sarah Kasomi","doi":"10.1016/j.mlwa.2024.100574","DOIUrl":"10.1016/j.mlwa.2024.100574","url":null,"abstract":"<div><p>Scooters have gained widespread popularity in recent years due to their accessibility and affordability, but safety concerns persist due to the vulnerability of riders. Researchers are actively investigating the safety implications associated with scooters, given their relatively new status as transportation options. However, analyzing scooter safety presents a unique challenge due to the complexity of determining safe riding environments. This study presents a comprehensive analysis of scooter crash risk within various buffer zones, utilizing the Extreme Gradient Boosting (XGBoost) machine learning algorithm. The core objective was to unravel the multifaceted factors influencing scooter crashes and assess the predictive model’s performance across different buffers, or spatial proximities to crash sites. After evaluating the model’s accuracy, sensitivity, and specificity across buffer distances ranging from 5 ft to 250 ft with the scooter crash as a reference point, a discernible trend emerged: as the buffer distance decreases, the model’s sensitivity increases, although at the expense of accuracy and specificity, which exhibit a gradual decline. Notably, at the widest buffer of 250 ft, the model achieved a high accuracy of 97% and specificity of 99%, but with a lower sensitivity of 31%. By contrast, at the closest buffer of 5 ft, sensitivity peaked at 95%, albeit with slightly reduced accuracy and specificity. Feature importance analysis highlighted the most significant predictor across all buffer distances, emphasizing the impact of vehicle interactions on scooter crash likelihood. Explainable Artificial Intelligence through SHAP value analysis provided deeper insights into each feature’s contribution to the predictive model, revealing that certain passenger vehicle types significantly escalate crash risk. Intriguingly, specific vehicular maneuvers, notably stopping in traffic lanes, alongside the absence of Traffic Control Devices (TCDs), were identified as the major contributors to increased crash occurrences. Road conditions, particularly wet and dry, also emerged as substantial risk factors. Furthermore, the study highlights the significance of road design, where elements like junction types and horizontal alignments (specifically 4- and 5-legged intersections and curves) are closely associated with heightened crash risks. These findings articulate a complex and spatially detailed framework of factors impacting scooter crashes, offering vital insights for urban planning and policymaking.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100574"},"PeriodicalIF":0.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000501/pdfft?md5=c3afb02a60606c22b0434ac053d3571a&pid=1-s2.0-S2666827024000501-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141630842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An algorithm for two-dimensional pattern detection by combining Echo State Network-based weak classifiers","authors":"Hiroshi Kage","doi":"10.1016/j.mlwa.2024.100571","DOIUrl":"10.1016/j.mlwa.2024.100571","url":null,"abstract":"<div><p>Pattern detection is one of the essential technologies in computer vision. Solving pattern detection problems typically requires vast computational resources. To train a multilayer perceptron or convolutional neural network, the gradient descent method is commonly used, which consumes substantial computational resources. To reduce the amount of computation, we propose a two-dimensional pattern detection algorithm based on the Echo State Network (ESN). The training rule of an ESN is based on one-shot ridge regression, which enables us to avoid gradient descent. The ESN is a kind of recurrent neural network (RNN) that is often used to embed temporal signals inside the network but is rarely used to embed static patterns. In our prior work (Kage, 2023), we found that static patterns can be embedded in an ESN network by associating the training patterns with its stable states, or attractors. Using the same training procedure as in our prior work, we confirmed that each training patch image can be associated with the desired output vector. The resulting performance of a single ESN classifier is, however, relatively poor. To overcome this poor performance, we introduced an ensemble learning framework that combines multiple ESN weak classifiers. To evaluate the performance, we used CMU-MIT frontal face images (CMU DB). We trained eleven ESN-based classifiers using six CMU DB training images and evaluated the performance on a CMU DB test image. We succeeded in reducing false positives in the CMU DB test image down to 0.0515%.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100571"},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000471/pdfft?md5=e2ec9590ba19c5c866410152f0f80ebb&pid=1-s2.0-S2666827024000471-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141637634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive study of auto-encoders for anomaly detection: Efficiency and trade-offs","authors":"Asif Ahmed Neloy, Maxime Turgeon","doi":"10.1016/j.mlwa.2024.100572","DOIUrl":"10.1016/j.mlwa.2024.100572","url":null,"abstract":"<div><p>Unsupervised anomaly detection (UAD) is a diverse research area explored across various application domains. Over time, numerous anomaly detection techniques, including clustering, generative, and variational inference-based methods, have been developed to address specific drawbacks and advance the state of the art. Deep learning and generative models have recently played a significant role in identifying unique challenges and devising advanced approaches. Auto-encoders (AEs) represent one such powerful technique that combines generative and probabilistic variational modeling with deep architectures. An auto-encoder aims to learn the underlying data distribution in order to generate meaningful sample data. This concept of data generation and the adoption of generative modeling have spurred extensive research and variations in auto-encoder design, particularly in unsupervised representation learning. This study systematically reviews 11 auto-encoder architectures categorized into three groups, aiming to differentiate their reconstruction ability, sample generation, latent space visualization, and accuracy in classifying anomalous data using the Fashion-MNIST (FMNIST) and MNIST datasets. Additionally, we closely examined reproducibility under different training parameters. We conducted reproducibility experiments utilizing similar model setups and hyperparameters and generated comparative results to identify the scope of improvement for each auto-encoder. We conclude this study by analyzing the experimental results, which guide us in identifying the efficiency and trade-offs among auto-encoders, providing valuable insights into their performance and applicability in unsupervised anomaly detection.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100572"},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000483/pdfft?md5=deffaabf165a48bed93f11897aaeeb38&pid=1-s2.0-S2666827024000483-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141623474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLRiuS: Contrastive Learning for intrinsically unordered Steel Scrap","authors":"Michael Schäfer, Ulrike Faltings, Björn Glaser","doi":"10.1016/j.mlwa.2024.100573","DOIUrl":"https://doi.org/10.1016/j.mlwa.2024.100573","url":null,"abstract":"<div><p>There has been remarkable progress in the fields of Deep Learning and Computer Vision, but there is a lack of freely available labeled data, especially when it comes to data for specific industrial applications. However, large volumes of structured, semi-structured, and unstructured data are generated in industrial environments, from which meaningful representations can be learned. The effort required for manual labeling is extremely high and can often only be carried out by domain experts. Self-supervised methods have proven their effectiveness in recent years in a wide variety of areas, such as natural language processing and computer vision. In contrast to supervised methods, however, self-supervised techniques are rarely used in real industrial applications. In this paper, we present a self-supervised contrastive learning approach that outperforms existing supervised approaches on the scrap dataset used. We apply different types of augmentations to extract the fine-grained structures that are typical of such images of intrinsically unordered items, which captures a wider range of features and encodes more aspects of the input image. This approach makes it possible to learn characteristics from images that are common in industrial applications, such as quality control. In addition, we show that this self-supervised learning approach can be successfully applied to scene-like images for classification.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100573"},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000495/pdfft?md5=18eb4b138c0ed688f7c6e0a6f8c6b4a3&pid=1-s2.0-S2666827024000495-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141606620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing Linux servers with LLM-based AI agents: An empirical evaluation with GPT4","authors":"Charles Cao, Feiyi Wang, Lisa Lindley, Zejiang Wang","doi":"10.1016/j.mlwa.2024.100570","DOIUrl":"https://doi.org/10.1016/j.mlwa.2024.100570","url":null,"abstract":"<div><p>This paper presents an empirical study on the application of Large Language Model (LLM)-based AI agents for automating server management tasks in Linux environments. We aim to evaluate the effectiveness, efficiency, and adaptability of LLM-based AI agents in handling a wide range of server management tasks, and to identify the potential benefits and challenges of employing such agents in real-world scenarios. We present an empirical study in which a GPT-based AI agent autonomously executes 150 unique tasks across 9 categories, ranging from file management and editing to program compilation. The agent operates in a Dockerized Linux sandbox, interpreting task descriptions and generating appropriate commands or scripts. Our findings reveal the agent’s proficiency in executing tasks autonomously and adapting to feedback, demonstrating the potential of LLMs to simplify complex server management for users with varying technical expertise. This study contributes to the understanding of LLM applications in server management scenarios and lays the foundation for future research in this domain.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100570"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S266682702400046X/pdfft?md5=c84038ecf9feef782cbf788c56d506da&pid=1-s2.0-S266682702400046X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141606619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-class AUC maximization for imbalanced ordinal multi-stage tropical cyclone intensity change forecast","authors":"Hirotaka Hachiya, Hiroki Yoshida, Udai Shimada, Naonori Ueda","doi":"10.1016/j.mlwa.2024.100569","DOIUrl":"https://doi.org/10.1016/j.mlwa.2024.100569","url":null,"abstract":"<div><p>Intense tropical cyclones (TCs) cause significant damage to human societies. Forecasting the multiple stages of TC intensity changes is crucial yet challenging. This difficulty arises from imbalanced data distribution and the need for ordinal multi-class classification. While existing classification methods, such as linear discriminant analysis, have been utilized to predict rare rapidly intensifying (RI) stages based on features related to TC intensity changes, they are limited to binary classification distinguishing between RI and non-RI stages. In this paper, we introduce a novel methodology to tackle the challenges of imbalanced ordinal multi-class classification. We extend the Area Under the Curve maximization technique with inter-instance/class cross-hinge losses and inter-class distance-based slack variables. The proposed loss function, implemented within a deep learning framework, demonstrates its effectiveness on real sequence data of multi-stage TC intensity changes, including satellite infrared images and environmental variables observed in the western North Pacific.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100569"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000458/pdfft?md5=92b286b0e461b132b43d67cb754aad34&pid=1-s2.0-S2666827024000458-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141594269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VashaNet: An automated system for recognizing handwritten Bangla basic characters using deep convolutional neural network","authors":"Mirza Raquib, Mohammad Amzad Hossain, Md Khairul Islam, Md Sipon Miah","doi":"10.1016/j.mlwa.2024.100568","DOIUrl":"https://doi.org/10.1016/j.mlwa.2024.100568","url":null,"abstract":"<div><p>Automated character recognition is currently highly popular due to its wide range of applications. Bengali handwritten character recognition (BHCR) is an extremely difficult problem because of the nature of the script. Very few handwritten character recognition (HCR) models are capable of accurately classifying all the different sorts of Bangla characters. Recently, image recognition, video analytics, and natural language processing have all found great success using convolutional neural networks (CNNs) due to their ability to extract and classify features in novel ways. In this paper, we introduce a VashaNet model for recognizing Bangla handwritten basic characters. The suggested VashaNet model employs a 26-layer deep convolutional neural network (DCNN) architecture consisting of nine convolutional layers, six max pooling layers, two dropout layers, five batch normalization layers, one flattening layer, two dense layers, and one output layer. The experiments were performed on two datasets, a primary dataset of 5750 images and CMATERdb 3.1.2, for training and evaluating the model. The suggested character recognition model performed very well, with test accuracy rates of 94.60% for the primary dataset and 94.43% for the CMATERdb 3.1.2 dataset. These remarkable outcomes demonstrate that the proposed VashaNet outperforms other existing methods and offers improved suitability across different character recognition tasks. The proposed approach is a strong candidate for the development of a highly efficient automatic BHCR system for use in practical settings.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100568"},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000446/pdfft?md5=5c72d6c025c7e6abd41097207c352a7c&pid=1-s2.0-S2666827024000446-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141487633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word embedding and classification methods and their effects on fake news detection","authors":"Jessica Hauschild, Kent Eskridge","doi":"10.1016/j.mlwa.2024.100566","DOIUrl":"https://doi.org/10.1016/j.mlwa.2024.100566","url":null,"abstract":"<div><p>Natural language processing contains multiple methods of translating written text or spoken words into numerical information called word embeddings. Some of these embedding methods, such as Bag of Words, assume words are independent of one another. Other embedding methods, such as Bidirectional Encoder Representations from Transformers and Word2Vec, capture the relationship between words in various ways. In this paper, we are interested in comparing methods treating words as independent and methods capturing the relationship between words by looking at the effect these methods have on the classification of fake news. Using various classification methods, we compare the word embedding processes based on their effects on accuracy, precision, sensitivity, and specificity.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100566"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000422/pdfft?md5=ca2f2864023899f08c1f4e9adba5d1ef&pid=1-s2.0-S2666827024000422-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141487632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explaining customer churn prediction in telecom industry using tabular machine learning models","authors":"Sumana Sharma Poudel, Suresh Pokharel, Mohan Timilsina","doi":"10.1016/j.mlwa.2024.100567","DOIUrl":"https://doi.org/10.1016/j.mlwa.2024.100567","url":null,"abstract":"<div><p>The study addresses customer churn, a major issue in service-oriented sectors like telecommunications, where it refers to the discontinuation of subscriptions. The research emphasizes the importance of recognizing customer satisfaction for retaining clients, focusing specifically on early churn prediction as a key strategy. Previous approaches mainly used generalized classification techniques for churn prediction but often neglected the aspect of interpretability, vital for decision-making. This study introduces explainer models to address this gap, providing both local and global explanations of churn predictions. Various classification models, including the standout Gradient Boosting Machine (GBM), were used alongside visualization techniques like Shapley Additive Explanations plots and scatter plots for enhanced interpretability. The GBM model demonstrated superior performance with an 81% accuracy rate. A Wilcoxon signed rank test confirmed GBM’s effectiveness over other models, with the <span><math><mi>p</mi></math></span>-value indicating significant performance differences. The study concludes that GBM is notably better for churn prediction, and the employed visualization techniques effectively elucidate key churn factors in the telecommunications sector.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"17 ","pages":"Article 100567"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000434/pdfft?md5=18da470f5a20f71eeb29e96078ff9ca6&pid=1-s2.0-S2666827024000434-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141487634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}