Information | Pub Date: 2024-07-23 | DOI: 10.3390/info15080425
Anthony J. Coscia, Andrea Iannacone, Antonio Maci, Alessandro Stamerra
{"title":"SINNER: A Reward-Sensitive Algorithm for Imbalanced Malware Classification Using Neural Networks with Experience Replay","authors":"Anthony J. Coscia, Andrea Iannacone, Antonio Maci, Alessandro Stamerra","doi":"10.3390/info15080425","DOIUrl":"https://doi.org/10.3390/info15080425","url":null,"abstract":"Reports produced by popular malware analysis services showed a disparity in samples available for different malware families. The unequal distribution between such classes can be attributed to several factors, such as technological advances and the application domain that seeks to infect a computer virus. Recent studies have demonstrated the effectiveness of deep learning (DL) algorithms when learning multi-class classification tasks using imbalanced datasets. This can be achieved by updating the learning function such that correct and incorrect predictions performed on the minority class are more rewarded or penalized, respectively. This procedure can be logically implemented by leveraging the deep reinforcement learning (DRL) paradigm through a proper formulation of the Markov decision process (MDP). This paper proposes SINNER, i.e., a DRL-based multi-class classifier that approaches the data imbalance problem at the algorithmic level by exploiting a redesigned reward function, which modifies the traditional MDP model used to learn this task. Based on the experimental results, the proposed formula appears to be successful. In addition, SINNER has been compared to several DL-based models that can handle class skew without relying on data-level techniques. 
Using three out of four datasets sourced from the existing literature, the proposed model achieved state-of-the-art classification performance.","PeriodicalId":510156,"journal":{"name":"Information","volume":"137 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141811140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-23 | DOI: 10.3390/info15080426
Elias Dritsas, M. Trigka
{"title":"Utilizing Multi-Class Classification Methods for Automated Sleep Disorder Prediction","authors":"Elias Dritsas, M. Trigka","doi":"10.3390/info15080426","DOIUrl":"https://doi.org/10.3390/info15080426","url":null,"abstract":"Even from infancy, a human’s day-life alternates from a period of wakefulness to a period of sleep at night, during the 24-hour cycle. Sleep is a normal process necessary for human physical and mental health. A lack of sleep makes it difficult to control emotions and behaviour, reduces productivity at work, and can even increase stress or depression. In addition, poor sleep affects health; when sleep is insufficient, the chances of developing serious diseases greatly increase. Researchers in sleep medicine have identified an extensive list of sleep disorders, and thus leveraged Artificial Intelligence (AI) to automate their analysis and gain a deeper understanding of sleep patterns and related disorders. In this research, we seek a Machine Learning (ML) solution that will allow for efficient classification of unlabeled instances as being Sleep Apnea, Insomnia or Normal (subjects without a specific sleep disorder) by assessing the performance of two well-established strategies for multi-class classification tasks: the One-Vs-All (OVA) and One-Vs-One (OVO). In the context of the specific strategies, two well-known binary classification models were assumed, Logistic Regression (LR) and Support Vector Machines (SVMs). Both strategies’ validity was verified upon a dataset of diverse information related to the profiles (anthropometric data, sleep metrics, lifestyle and cardiovascular health factors) of potential patients or individuals not exhibiting any specific sleep disorder. 
Performance evaluation was carried out by comparing the weighted average results in all involved classes that represent these two specific sleep disorders and no-disorder occurrence; accuracy, kappa score, precision, recall, f-measure, and Area Under the ROC curve (AUC) were recorded and compared to identify an effective and robust model and strategy, both class-wise and on average. The experimental evaluation unveiled that after feature selection, 2-degree polynomial SVM under both strategies was the least complex and most efficient, recording an accuracy of 91.44%, a kappa score of 84.97%, precision, recall and f-measure equal to 0.914, and an AUC of 0.927.","PeriodicalId":510156,"journal":{"name":"Information","volume":"16 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141810657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-23 | DOI: 10.3390/info15080427
L. Macken, Vanessa De Wilde, Arda Tezcan
{"title":"Machine Translation for Open Scholarly Communication: Examining the Relationship between Translation Quality and Reading Effort","authors":"L. Macken, Vanessa De Wilde, Arda Tezcan","doi":"10.3390/info15080427","DOIUrl":"https://doi.org/10.3390/info15080427","url":null,"abstract":"This study assesses the usability of machine-translated texts in scholarly communication, using self-paced reading experiments with texts from three scientific disciplines, translated from French into English and vice versa. Thirty-two participants, proficient in the target language, participated. This study uses three machine translation engines (DeepL, ModernMT, OpenNMT), which vary in translation quality. The experiments aim to determine the relationship between translation quality and readers’ reception effort, measured by reading times. The results show that for two disciplines, manual and automatic translation quality measures are significant predictors of reading time. For the most technical discipline, this study could not build models that outperformed the baseline models, which only included participant and text ID as random factors. This study acknowledges the need to include reader-specific features, such as prior knowledge, in future research.","PeriodicalId":510156,"journal":{"name":"Information","volume":"42 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141813243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-23 | DOI: 10.3390/info15080424
Basma Al-Sabah, Gholomreza Anbarjafari
{"title":"Anomaly Detection in Kuwait Construction Market Data Using Autoencoder Neural Networks","authors":"Basma Al-Sabah, Gholomreza Anbarjafari","doi":"10.3390/info15080424","DOIUrl":"https://doi.org/10.3390/info15080424","url":null,"abstract":"In the ambitiously evolving construction industry of Kuwait, characterised by its vision 2035 and rapid technological integration, there exists a pressing need for advanced analytical frameworks. The pressing need for advanced analytical frameworks in the Kuwait Construction Market arises from the necessity to identify inefficiencies, predict market trends, and enhance decision-making processes. For instance, these frameworks can be used to detect anomalies in investment patterns, forecast the impact of economic changes on project timelines, and optimise resource allocation by analysing labour and material supply data. By leveraging deep learning techniques, such as autoencoder neural networks, stakeholders can gain deeper insights into the market’s complexities and improve strategic planning and operational efficiency. This research paper introduces a deep learning approach utilising an autoencoder neural network to analyse the complexities of the Kuwait Construction Market and identify data irregularities. The construction sector’s significant investment influx and project expansion make it an ideal candidate for deploying sophisticated analytical techniques to detect anomalous patterns indicating inefficiencies or unveiling potential opportunities. Our approach leverages the capabilities of autoencoder architectures to delve into and understand the prevalent patterns in market behaviours. This analysis involves training the autoencoder on historical market data to learn the normal patterns and subsequently using it to identify deviations from these learned patterns. This allows for the detection of anomalies that may lead to operational or financial consequences. 
We elucidate the mathematical foundations of autoencoders, highlighting their proficiency in managing the complex, multidimensional data typical of the construction industry. Through training on an extensive dataset—comprising variables like market sizes, investment distributions, and project completions—our model demonstrates its ability to pinpoint subtle yet significant anomalies. The outcomes of this study enhance our understanding of deep learning’s pivotal role in construction and building management. Empirically, the model detected anomalies in transaction volumes of lands and houses, highlighting unusual spikes that correlate with specific market activities. These findings demonstrate the autoencoder’s effectiveness in anomaly detection, emphasising its importance in enhancing operational efficiency and strategic planning in the construction industry.","PeriodicalId":510156,"journal":{"name":"Information","volume":"20 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141813885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-23 | DOI: 10.3390/info15080423
M. Mobilio, O. Riganelli, D. Micucci, Leonardo Mariani
{"title":"FILO: Automated FIx-LOcus Identification for Android Framework Compatibility Issues","authors":"M. Mobilio, O. Riganelli, D. Micucci, Leonardo Mariani","doi":"10.3390/info15080423","DOIUrl":"https://doi.org/10.3390/info15080423","url":null,"abstract":"Keeping up with the fast evolution of mobile operating systems is challenging for developers, who have to frequently adapt their apps to the upgrades and behavioral changes of the underlying API framework. Those changes often break backward compatibility. The consequence is that apps, if not updated, may misbehave and suffer unexpected crashes if executed within an evolved environment. Being able to quickly identify the portion of the app that should be modified to provide compatibility with new API versions can be challenging. To facilitate the debugging activities of problems caused by backward incompatible upgrades of the operating system, this paper presents FILO, a technique that is able to recommend the method that should be modified to implement the fix by analyzing a single failing execution. FILO can also provide additional information and key symptomatic anomalous events that can help developers understand the reason for the failure, therefore facilitating the implementation of the fix. We evaluated FILO against 18 real compatibility problems related to Android upgrades and compared it with Spectrum-Based Localization approaches. 
Results show that FILO is able to efficiently and effectively identify the fix-locus in the apps.","PeriodicalId":510156,"journal":{"name":"Information","volume":"4 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141814081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-22 | DOI: 10.3390/info15070421
Najmeh Ziraki, A. Bosaghzadeh, Fadi Dornaika
{"title":"Semi-Supervised Learning for Multi-View Data Classification and Visualization","authors":"Najmeh Ziraki, A. Bosaghzadeh, Fadi Dornaika","doi":"10.3390/info15070421","DOIUrl":"https://doi.org/10.3390/info15070421","url":null,"abstract":"Data visualization has several advantages, such as representing vast amounts of data and visually demonstrating patterns within it. Manifold learning methods help us estimate lower-dimensional representations of data, thereby enabling more effective visualizations. In data analysis, relying on a single view can often lead to misleading conclusions due to its limited perspective. Hence, leveraging multiple views simultaneously and interactively can mitigate this risk and enhance performance by exploiting diverse information sources. Additionally, incorporating different views concurrently during the graph construction process using interactive visualization approach has improved overall performance. In this paper, we introduce a novel algorithm for joint consistent graph construction and label estimation. Our method simultaneously constructs a unified graph and predicts the labels of unlabeled samples. Furthermore, the proposed approach estimates a projection matrix that enables the prediction of labels for unseen samples. Moreover, it incorporates the information in the label space to further enhance the accuracy. In addition, it merges the information in different views along with the labels to construct a consensus graph. Experimental results conducted on various image databases demonstrate the superiority of our fusion approach compared to using a single view or other fusion algorithms. 
This highlights the effectiveness of leveraging multiple views and simultaneously constructing a unified graph for improved performance in data classification and visualization tasks in semi-supervised contexts.","PeriodicalId":510156,"journal":{"name":"Information","volume":"80 13","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141817774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-22 | DOI: 10.3390/info15070422
Aritz Gorostiza-Cerviño, Álvaro Serna-Ortega, Andrea Moreno-Cabanillas, A. Almansa-Martínez, Antonio Castillo-Esparcia
{"title":"Examining the Roles, Sentiments, and Discourse of European Interest Groups in the Ukrainian War through X (Twitter)","authors":"Aritz Gorostiza-Cerviño, Álvaro Serna-Ortega, Andrea Moreno-Cabanillas, A. Almansa-Martínez, Antonio Castillo-Esparcia","doi":"10.3390/info15070422","DOIUrl":"https://doi.org/10.3390/info15070422","url":null,"abstract":"This research focuses on examining the responses of interest groups listed in the European Transparency Register to the ongoing Russia–Ukraine war. Its aim is to investigate the nuanced reactions of 2579 commercial and business associations and 2957 companies and groups to the recent conflict, as expressed through their X (Twitter) activities. Utilizing advanced text mining and NLP and LDA techniques, this study conducts a comprehensive analysis encompassing language dynamics, thematic shifts, sentiment variations, and activity levels exhibited by these entities both before and after the outbreak of the war. The results obtained reflect a gradual decrease in negative emotions regarding the conflict over time. Likewise, multiple forms of outside lobbying are identified in the communication strategies of interest groups. 
All in all, this empirical inquiry into how interest groups adapt their messaging in response to complex geopolitical events holds the potential to provide invaluable insights into the multifaceted role of lobbying in shaping public policies.","PeriodicalId":510156,"journal":{"name":"Information","volume":"29 13","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141815595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-20 | DOI: 10.3390/info15070420
Rahmah Alhamyani, Majid Alshammari
{"title":"Machine Learning-Driven Detection of Cross-Site Scripting Attacks","authors":"Rahmah Alhamyani, Majid Alshammari","doi":"10.3390/info15070420","DOIUrl":"https://doi.org/10.3390/info15070420","url":null,"abstract":"The ever-growing web application landscape, fueled by technological advancements, introduces new vulnerabilities to cyberattacks. Cross-site scripting (XSS) attacks pose a significant threat, exploiting the difficulty of distinguishing between benign and malicious scripts within web applications. Traditional detection methods struggle with high false-positive (FP) and false-negative (FN) rates. This research proposes a novel machine learning (ML)-based approach for robust XSS attack detection. We evaluate various models including Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVMs), Decision Trees (DTs), Extreme Gradient Boosting (XGBoost), Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and ensemble learning. The models are trained on a real-world dataset categorized into benign and malicious traffic, incorporating feature selection methods like Information Gain (IG) and Analysis of Variance (ANOVA) for optimal performance. Our findings reveal exceptional accuracy, with the RF model achieving 99.78% and ensemble models exceeding 99.64%. These results surpass existing methods, demonstrating the effectiveness of the proposed approach in securing web applications while minimizing FPs and FNs. 
This research offers a significant contribution to the field of web application security by providing a highly accurate and robust ML-based solution for XSS attack detection.","PeriodicalId":510156,"journal":{"name":"Information","volume":"119 50","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141820104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-07-19 | DOI: 10.3390/info15070419
Hamed Alshammari, Khaled Elleithy
{"title":"Toward Robust Arabic AI-Generated Text Detection: Tackling Diacritics Challenges","authors":"Hamed Alshammari, Khaled Elleithy","doi":"10.3390/info15070419","DOIUrl":"https://doi.org/10.3390/info15070419","url":null,"abstract":"Current AI detection systems often struggle to distinguish between Arabic human-written text (HWT) and AI-generated text (AIGT) due to the small marks present above and below the Arabic text called diacritics. This study introduces robust Arabic text detection models using Transformer-based pre-trained models, specifically AraELECTRA, AraBERT, XLM-R, and mBERT. Our primary goal is to detect AIGTs in essays and overcome the challenges posed by the diacritics that usually appear in Arabic religious texts. We created several novel datasets with diacritized and non-diacritized texts comprising up to 9666 HWT and AIGT training examples. We aimed to assess the robustness and effectiveness of the detection models on out-of-domain (OOD) datasets to assess their generalizability. Our detection models trained on diacritized examples achieved up to 98.4% accuracy compared to GPTZero’s 62.7% on the AIRABIC benchmark dataset. Our experiments reveal that, while including diacritics in training enhances the recognition of the diacritized HWTs, duplicating examples with and without diacritics is inefficient despite the high accuracy achieved. Applying a dediacritization filter during evaluation significantly improved model performance, achieving optimal performance compared to both GPTZero and the detection models trained on diacritized examples but evaluated without dediacritization. 
Although our focus was on Arabic due to its writing challenges, our detector architecture is adaptable to any language.","PeriodicalId":510156,"journal":{"name":"Information","volume":" 428","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141823814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SiamSMN: Siamese Cross-Modality Fusion Network for Object Tracking","authors":"Shuo Han, Lisha Gao, Yue Wu, Tian Wei, Manyu Wang, Xu Cheng","doi":"10.3390/info15070418","DOIUrl":"https://doi.org/10.3390/info15070418","url":null,"abstract":"The existing Siamese trackers have achieved increasingly successful results in visual object tracking. However, the interactive fusion among multi-layer similarity maps after cross-correlation has not been fully studied in previous Siamese network-based methods. To address this issue, we propose a novel Siamese network for visual object tracking, named SiamSMN, which consists of a feature extraction network, a multi-scale fusion module, and a prediction head. First, the feature extraction network is used to extract the features of the template image and the search image, which is calculated by a depth-wise cross-correlation operation to produce multiple similarity feature maps. Second, we propose an effective multi-scale fusion module that can extract global context information for object search and learn the interdependencies between multi-level similarity maps. In addition, to further improve tracking accuracy, we design a learnable prediction head module to generate a boundary point for each side based on the coarse bounding box, which can solve the problem of inconsistent classification and regression during the tracking. 
Extensive experiments on four public benchmarks demonstrate that the proposed tracker has a competitive performance among other state-of-the-art trackers.","PeriodicalId":510156,"journal":{"name":"Information","volume":"121 35","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141822056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}