{"title":"OpenClinicalAI: An open and dynamic model for Alzheimer’s Disease diagnosis","authors":"","doi":"10.1016/j.eswa.2024.125528","DOIUrl":"10.1016/j.eswa.2024.125528","url":null,"abstract":"<div><div>Although Alzheimer’s disease (AD) cannot be reversed or cured, timely diagnosis can significantly reduce the burden of treatment and care. Current research on AD diagnosis models usually regards the diagnosis task as a typical classification task with two primary assumptions: (1) All target categories are known a priori; (2) The diagnostic strategy for each patient is consistent, that is, the number and type of model input data for each patient are the same. However, real-world clinical settings are open, with complexity and uncertainty in terms of both subjects and the resources of the medical institutions. This means that diagnostic models may encounter unseen disease categories and need to dynamically develop diagnostic strategies based on the subject’s specific circumstances and available medical resources. Thus, the AD diagnosis task is tangled and coupled with the diagnosis strategy formulation. To promote the application of diagnostic systems in real-world clinical settings, we propose OpenClinicalAI for direct AD diagnosis in complex and uncertain clinical settings. This is the first end-to-end model to dynamically formulate diagnostic strategies and provide diagnostic results based on the subject’s conditions and available medical resources. OpenClinicalAI combines reciprocally coupled deep multi-action reinforcement learning (DMARL) for diagnostic strategy formulation and multicenter meta-learning (MCML) for open-set recognition. The experimental results show that OpenClinicalAI achieves better performance and fewer clinical examinations than the state-of-the-art model. 
Our method provides an opportunity to embed the AD diagnostic system into the current healthcare system to cooperate with clinicians to improve current healthcare.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142538237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Plagiarism detection of anime character portraits","authors":"","doi":"10.1016/j.eswa.2024.125566","DOIUrl":"10.1016/j.eswa.2024.125566","url":null,"abstract":"<div><div>With the expansion of the animation industry, animation plagiarism has become a recurring issue. Currently, the technology for copyright protection and plagiarism detection in animation is limited. This paper proposes a plagiarism detection framework for animation character portraits. A Seq2Seq-based model is proposed for extracting the morphological features of animation portraits. By studying portrait morphological fitting simulation, significant progress is made in learning the facial characteristics of animation portraits. With the fitting mechanism, the proposed model can effectively learn the representative essential features of animation portraits. We also propose several loss functions to enhance recognizability and feature mapping ability between original and pirated animation portraits. Plagiarism detection is carried out by comparing the similarity between the pirated and original versions. Experimental results demonstrate that the proposed model has a good fitting effect and effectively detects the authenticity and piracy of animation portraits.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142538293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cascaded capsule twin attentional dilated convolutional network for malicious URL detection","authors":"","doi":"10.1016/j.eswa.2024.125507","DOIUrl":"10.1016/j.eswa.2024.125507","url":null,"abstract":"<div><div>Malware is one of the most prevalent forms of cyber-attack, and it is becoming more common on the network every day. In contrast to benign transmission, which typically exhibits symmetrical patterns, malware communication often shows asymmetrical behaviours, making detection a complex challenge. Fortunately, malware can be distinguished and identified in practice using a variety of artificial intelligence methods. However, insufficient work has been devoted to the problem of handling high-dimensional, large-scale data. This paper proposes a novel deep learning-based approach for identifying malicious Uniform Resource Locators (URLs) that is specifically designed to handle the challenges posed by large-scale and complex data. Initially, input data is sourced from a comprehensive Kaggle dataset, which includes diverse and large-scale URL samples. The URLs are then transformed into vector representations using a Vector Embedding Module, which employs a character-level word embedding technique to capture intricate patterns within the URLs. To further refine the data, the Chaotic Kookaburra Efficient-Bo Network (CKEBO-Net) is applied to extract the most significant features from these vectors, effectively reducing the dimensionality and computational burden. Subsequently, the Cascaded Capsule Twin Attentional Dilated Convolutional Network (C<sup>2</sup>TA_DiCN) model is introduced to classify and identify malicious URLs with high precision. This model leverages the unique strengths of capsule networks and attentional mechanisms, enhancing its capability to capture subtle dependencies within the data. 
Furthermore, the Lyrebird Meta-heuristic Optimization (LMO) algorithm is used to fine-tune the model parameters appropriately, ensuring that the training process is efficient and robust. The proposed approach is implemented using Python and rigorously evaluated on the Kaggle dataset. Simulation results demonstrate that the proposed method significantly outperforms existing models, achieving a malicious URL detection accuracy of 99.7%.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artistic-style text detector and a new Movie-Poster dataset","authors":"","doi":"10.1016/j.eswa.2024.125544","DOIUrl":"10.1016/j.eswa.2024.125544","url":null,"abstract":"<div><div>Although current text detection algorithms demonstrate effectiveness in general scenarios, their performance declines when confronted with artistic-style text featuring complex structures. This paper proposes a method that utilizes Criss-Cross Attention and the residual dense block to address the incomplete detections and misdetections that current algorithms produce on artistic-style text. Specifically, our method mainly consists of a feature extraction backbone, a Recycle Criss-Cross Attention module, a Residual Feature Pyramid Network, and a Boundary Discrimination Module. The Recycle Criss-Cross Attention module significantly enhances the model’s perceptual capabilities in complex environments by fusing horizontal and vertical contextual information, allowing it to capture detailed features overlooked in artistic-style text. We incorporate the residual dense block into the feature pyramid network to suppress the effect of background noise during feature fusion. To avoid complex post-processing, we explore a Boundary Discrimination Module that guides the correct generation of boundary proposals. Furthermore, given that movie poster titles often use stylized art fonts, we collected a Movie-Poster dataset to address the scarcity of artistic-style text data. Extensive experiments demonstrate that our proposed method achieves superior performance on the Movie-Poster dataset and produces excellent results on multiple benchmark datasets. 
The code and the Movie-Poster dataset will be available at: <span><span>https://github.com/AXNing/Artistic-style-text-detection</span></span>.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142538235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APT: Alarm Prediction Transformer","authors":"","doi":"10.1016/j.eswa.2024.125521","DOIUrl":"10.1016/j.eswa.2024.125521","url":null,"abstract":"<div><div>Distributed control systems (DCS) are essential to operate complex industrial processes. A major part of a DCS is the alarm system, which helps plant operators to keep the processes stable and safe. Alarms are defined as threshold values on individual signals, taking into account the minimum reaction time of the human operator. In reality, however, alarms are often noisy and overwhelming, and thus can be easily overlooked by the operators. Early alarm prediction can give the operator more time to react and introduce corrective actions to avoid downtime and negative impact on human safety and the environment. In this context, we introduce Alarm Prediction Transformer (APT), a multimodal Transformer-based machine learning model for early alarm prediction based on the combination of recent events and signal data. Specifically, we propose two novel fusion strategies and three methods of label encoding with various levels of granularity. Given a window of several minutes of event logs and signal data, our model predicts whether an alarm is going to be triggered after a few minutes and, if yes, it also predicts its location. Our experiments on two novel real industrial plant data sets and a simulated data set show that the model is capable of predicting alarms with the given horizon and that our proposed fusion technique combining inputs from different modalities, i.e., 
events and signals, yields more accurate results than any of the modalities alone or conventional fusion techniques.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142538240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient data fusion model based on Bayesian model averaging for robust water quality prediction using deep learning strategies","authors":"","doi":"10.1016/j.eswa.2024.125499","DOIUrl":"10.1016/j.eswa.2024.125499","url":null,"abstract":"<div><div>Accurate monitoring of dissolved oxygen (DO) levels is critical for stakeholders to effectively safeguard water resources and aquatic ecosystem health. This research presents an innovative data fusion framework based on Bayesian model averaging (BMA) that combines several neuroscience models (deep learning methodologies) including multilayer perceptron neural network (MLPNN), recurrent neural network (RNN), convolutional neural network (CNN), gated recurrent unit (GRU), long short-term memory (LSTM), and seasonal autoregressive integrated moving average with exogenous variables (SARIMAX). BMA has the capacity to greatly enhance the results attained by standalone approaches. In this study, two feature selection methods, mutual information (MI) and recursive feature elimination (RFE), were applied to select effective predictors and investigate the importance of each input parameter. The techniques were evaluated based on four different metrics and, finally, to demonstrate the usefulness and effectiveness of the newly implemented strategy, it was applied at two USGS stations in the USA, 01427510 and 02336152. The findings from analyzing data from the two stations confirmed that the suggested approach (BMA) outperformed other methods such as MLPNN, RNN, CNN, LSTM, and SARIMAX in predicting daily DO levels. In terms of RMSE, MAE and R<sup>2</sup>, BMA yielded 0.272 mg/L, 0.216 mg/L, and 0.975 using the MI technique and 0.320 mg/L, 0.261 mg/L, and 0.965 using the RFE method at USGS station 01427510, respectively. 
Similarly, in terms of RMSE, MAE and R<sup>2</sup>, BMA yielded 0.352 mg/L, 0.264 mg/L, and 0.968 using the MI approach and 0.378 mg/L, 0.282 mg/L, and 0.963 using the RFE method at USGS station 02336152, respectively. After analyzing different combinations of inputs obtained from the feature selection methods, it was observed that the variables with the greatest impact on daily dissolved oxygen levels are the dissolved oxygen levels in the previous time period and the water temperature. On the other hand, the pH and turbidity have minimal influence on the daily DO. Finally, the results of this study confirmed that the BMA tool can be efficiently applied to predict DO concentration based on deep learning model outputs.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multiple-input fluid queue model for performance evaluation of fog server in an intelligent vehicular network","authors":"","doi":"10.1016/j.eswa.2024.125538","DOIUrl":"10.1016/j.eswa.2024.125538","url":null,"abstract":"<div><div>Nowadays, there is a growing trend towards fog computing, a relatively new concept that has been introduced as an extension of cloud computing. It is considered a promising paradigm for intelligent vehicular networks due to its ability to reduce delays and enhance network efficiency. The utilization of fog computing plays a crucial role in tackling the distinct obstacles encountered in vehicular computing, such as real-time data processing and limited bandwidth. By relocating computational resources closer to the network’s edge, it enhances the effectiveness, dependability, and safety of vehicular applications while simultaneously boosting privacy and security measures. In addition, fog offloading plays a pivotal role in fog computing and edge computing structures. Its objective is to efficiently allocate data and assign task processing among edge devices, fog nodes, and cloud resources. Efficient fog offloading strategies are adaptable, scalable, and fault-tolerant, making them indispensable for optimizing these architectures. Although efficiency and quality of service are crucial objectives in a fog computing based intelligent vehicular network environment, performance remains a significant concern that cannot be overlooked. In light of this, to evaluate the efficacy of the probabilistic offloading approach on a fog server, a thorough performance evaluation is necessary. This research proposes a fluid queue approach that accounts for the constant flow of data packets while evaluating a fog server’s performance. 
For a fog computing-based intelligent vehicular network (FCIVN) with numerous heterogeneous smart vehicles (SVs), we construct a multiple-input fluid queue to model the tasks handed over to the fog server in order to evaluate its performance. In an FCIVN, the arrivals at the fog server are drawn from various sources. Accordingly, in the proposed model the fluid queue is modulated by several independent and distinct finite-state birth–death processes (BDPs), which control the variable inflow, and another BDP, which controls the variable outflow. We present the details concerning an intermediate fog server’s buffer occupancy distribution within an FCIVN. Further, we assess the performance measures in terms of the offered load, expected buffer level, average throughput and average latency of the tasks in an FCIVN. Finally, quantitative illustrations are presented to demonstrate the appropriateness of the fluid queue model developed in this study. The results are found to be consistent with the expected behaviour of these performance indicators.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142538295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty quantification driven machine learning for improving model accuracy in imbalanced regression tasks","authors":"","doi":"10.1016/j.eswa.2024.125526","DOIUrl":"10.1016/j.eswa.2024.125526","url":null,"abstract":"<div><div>Several factors are known to determine the quality of machine learning models, one of which is the dataset quality. One problem related to the quality of a dataset is the imbalance issue. An imbalanced dataset contains significantly more data points for certain values of the output variable which increases the overfitting risk and negatively affects the prediction accuracy. In this article, we propose using epistemic uncertainty quantification (UQ) of machine learning models to identify rare samples in imbalanced regression problems for balancing the dataset. The developed algorithm, uncertainty quantification-driven imbalanced regression (UQDIR), is guided by UQ to restructure the training set with an adequate weight function using existent samples, eliminating the need for new data collection. After identifying rare samples with UQ, the algorithm selects a sample from the training set, assigns a resampling weight using the new weight function, and finally resamples the selected sample according to its assigned weight. We test UQDIR on several benchmark datasets and different machine learning algorithms, then compare its performance with similar imbalanced regression methods. A metamaterial design problem application is also provided for demonstrating the effectiveness of the algorithm in real-world scenarios. 
We show that improving the quality of UQ metrics results in improved model accuracy.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stream ETL framework for twitter-based sentiment analysis: Leveraging big data technologies","authors":"","doi":"10.1016/j.eswa.2024.125523","DOIUrl":"10.1016/j.eswa.2024.125523","url":null,"abstract":"<div><div>Twitter has emerged as a rich source of real-time data, providing valuable insights across various domains such as healthcare and business. Sentiment analysis is crucial in understanding public reactions and sentiments expressed on Twitter, empowering organizations to make informed decisions. However, efficiently analyzing sentiment in social media data presents a challenge for real-time streaming. Conventional Extract-Transform-Load (ETL) methods are inadequate for this challenge, which limits their applicability to processing vast volumes of Twitter data. Big data technologies like Kafka, Spark, Hadoop HDFS, Hive, and HBase have become indispensable in addressing this challenge. We therefore propose a stream ETL framework for Twitter-based sentiment analysis, leveraging Kafka, Spark, Cassandra, HBase, Hive, and HDFS. Our framework enables data stream processing, bias detection and correction, sentiment-based analysis, and visualization of tweets’ geospatial distribution. We present a set of use case studies, comprising sentiment classification and bias detection and correction, to illustrate the applicability of the proposed framework. We also present a comparative study to demonstrate the performance of data stream processing, analysis, and visualization implemented using multiple big data technologies under different parameter settings. 
Experimental results demonstrate the framework's scalability and the trade-off factors of the data stream processing pipeline in the execution of big data processing tasks.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142538138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-region hierarchical surrogate-assisted quantum-behaved particle swarm optimization for expensive optimization problems","authors":"","doi":"10.1016/j.eswa.2024.125496","DOIUrl":"10.1016/j.eswa.2024.125496","url":null,"abstract":"<div><div>Surrogate-assisted evolutionary algorithms (SAEAs) have been successfully applied to solve computationally expensive optimization problems. However, most SAEAs struggle to achieve good results in solving complex multimodal problems, especially high-dimensional ones. Moreover, for problems with complex landscapes, SAEAs typically require constructing complex global surrogates to model the landscape and performing many iterations to identify the surrogate’s optimum, thereby reducing the efficiency of SAEAs. To deal with these issues, this paper proposes a multi-region hierarchical surrogate-assisted quantum-behaved particle swarm optimization (MHS-QPSO) algorithm for expensive optimization problems. To better balance exploration and exploitation, a search behavior selection strategy is proposed, enabling MHS-QPSO to appropriately switch between global and local searches. For the global search, the search space is divided into multiple regions whose sizes are adjusted adaptively. A surrogate is constructed in each region, requiring only a small number of QPSO iterations to find the optimum of each surrogate. Furthermore, a novel reliability-based criterion is proposed to screen candidate solutions in different regions for exact evaluations, which reduces the number of exact function evaluations and rapidly improves the fitting accuracy of the surrogates in regions with superior fitness. During local searches, a dynamic boundary adjustment strategy is introduced to guide QPSO toward the potentially optimal region faster. 
Experimental results on seven benchmark functions with dimensions from 10 to 100, and on a complex real application, demonstrate that MHS-QPSO significantly outperforms several state-of-the-art algorithms within a limited computational budget. Code for MHS-QPSO is available at <span><span>https://github.com/quanshuzhang/MHS-QPSO.git</span></span>.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}