Der Sheng Tan , Humaira Nisar , Kim Ho Yeap , Veerendra Dakulagi , Muhammad Amin
{"title":"Lumbar intervertebral disc detection and classification with novel deep learning models","authors":"Der Sheng Tan , Humaira Nisar , Kim Ho Yeap , Veerendra Dakulagi , Muhammad Amin","doi":"10.1016/j.jksuci.2024.102148","DOIUrl":"10.1016/j.jksuci.2024.102148","url":null,"abstract":"<div><p>Low back pain (LBP) is a prevalent spinal issue, affecting eight out of ten individuals. Notably, lumbar intervertebral disc (IVD) abnormalities frequently contribute to LBP. To diagnose LBP, Magnetic Resonance Imaging (MRI) is crucial for obtaining detailed spinal images. This paper employs deep learning (DL) to detect and locate lumbar IVD in sagittal MR images. It further classifies lumbar IVDs as healthy or herniated, utilizing both novel convolutional neural network (CNN) and conventional CNN models. The dataset utilized comprises MR images from 32 patients, with 10 exhibiting healthy discs and the remaining 22 posing a mix of healthy and herniated discs, totaling 160 lumbar discs, incorporating 112 healthy and 48 herniated discs. In this study, ResNet-50 architecture in the Novel Lumbar IVD detection (NLID) model served as the feature extractor to segment the five lumbar IVDs from MR images. The features extracted from ResNet-50 were input into YOLOv2 for the identification of the region of interest (ROI). The findings indicate that optimal performance was achieved at the 22nd Rectified Linear Unit (ReLU) activation layer, boasting a remarkable 99.59% average precision, 97.22% F1-score, 94.59% precision, and a perfect 100% recall. This commendable performance consistently held above the 85% threshold until the 22nd ReLU activation layer. Regarding imbalanced dataset classification, AlexNet emerged as the frontrunner among other pre-trained networks, boasting the highest test accuracy of 90.63%, and an impressive F1 score of 88.77%. Meanwhile, the Novel Lumbar IVD Classification (NLIC) model achieved superior results with 93.75% test accuracy, and 92.27% F1-score. 
In the setting of the balanced dataset, NLIC achieved 96.88% test accuracy, and 96.46% F1-score with fewer epochs compared to AlexNet, affirming the robustness of the novel trained-from-scratch network. These findings distinctly underscore the effectiveness of CNNs in both medical image segmentation and classification.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102148"},"PeriodicalIF":5.2,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002374/pdfft?md5=242b16e1864b249a5d6a3f20dfd70a71&pid=1-s2.0-S1319157824002374-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141961047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
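The detection figures quoted in the abstract above can be cross-checked against each other, since the F1-score is the harmonic mean of precision and recall. The snippet below is only a sanity check on the reported numbers (94.59% precision, 100% recall), not the authors' evaluation code; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the NLID detector.
reported_precision = 0.9459
reported_recall = 1.0

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {100 * f1:.2f}%")
```

The result agrees with the 97.22% F1-score stated in the abstract, so the three detection metrics are mutually consistent.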
{"title":"Planning the development of text-to-speech synthesis models and datasets with dynamic deep learning","authors":"Hawraz A. Ahmad , Tarik A. Rashid","doi":"10.1016/j.jksuci.2024.102131","DOIUrl":"10.1016/j.jksuci.2024.102131","url":null,"abstract":"<div><p>Text-to-speech (TTS) synthesis is the process of converting natural-language text into speech. Speech synthesisers face a major challenge when recognizing the prosodic elements of written text, such as intonation (the rise and fall of the voice in speaking), and length. In contrast, continuous speech features are influenced by the personality and emotions of the artist. A database is maintained to store the synthesized speech pieces. Its output is determined by how similar the person utters the words and how capable they are of being implied. In the past few years, the field of text-to-speech synthesis has been heavily impacted by the emergence of deep learning, an AI technology that has gained widespread popularity. This review paper presents a taxonomy of models and architectures that are based on deep learning and discusses the various datasets that are utilised in the TTS process. It also covers the evaluation metrics that are commonly used. 
The paper ends with a look at future directions for the field and identifies some deep learning models that give promising results.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102131"},"PeriodicalIF":5.2,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002209/pdfft?md5=73c94f11cbc25ec7eb6841c1af93654a&pid=1-s2.0-S1319157824002209-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141844302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongkui Jiang , Qiupu Chen , Rujing Wang , Jianming Du , Tianjiao Chen
{"title":"SWFormer: A scale-wise hybrid CNN-Transformer network for multi-classes weed segmentation","authors":"Hongkui Jiang , Qiupu Chen , Rujing Wang , Jianming Du , Tianjiao Chen","doi":"10.1016/j.jksuci.2024.102144","DOIUrl":"10.1016/j.jksuci.2024.102144","url":null,"abstract":"<div><p>Weeds in rapeseed fields are an important factor in crop yield reduction and economic loss. Thus, Precision Agriculture is an important task for sustainable agriculture and weed management. At present, deep learning techniques have shown great potential for image-based detection and classification in various crops and weeds. However, the inherent limitations of traditional convolutional neural networks pose significant challenges due to the local similarity of weeds and crops in color, shape and texture. To address this issue, we introduce SWFormer, a scale-wise hybrid CNN-Transformer network. SWFormer leverages the distinct strengths of both convolutional and transformer architectures. Convolutional structures excel at extracting short-range dependency information among pixels, whereas transformer structures are adept at capturing global dependency relationships. Additionally, we propose two innovative modules. Firstly, the Scale-wise Cascade Convolution (SWCC) module is designed to capture multiscale features and expand the receptive field. Secondly, the Adaptive Semantic Aggregation (ASA) module facilitates adaptive and effective information fusion across two distinct feature maps. Our experiments were conducted on the publicly available cropandweed dataset and SB20 dataset. SWFormer yields improved performance over other mainstream segmentation models. Specifically, SWFormer with 52.33M/527.51GFLOPs achieves an mAP of 76.54% and an accuracy of 83.95% on the cropandweed dataset. For the SB20 dataset, it attains an mAP of 61.24% and an accuracy of 79.47%. 
Overall, the evaluation clearly demonstrates our proposed SWFormer is conducive to promoting further research in the area of Precision Agriculture.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102144"},"PeriodicalIF":5.2,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002337/pdfft?md5=279cbd7e6876b807bb7098b77b2e40a6&pid=1-s2.0-S1319157824002337-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141846032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diverse representation-guided graph learning for multi-view metric clustering","authors":"Xiaoshuang Sang, Yang Zou, Feng Li, Ranran He","doi":"10.1016/j.jksuci.2024.102129","DOIUrl":"10.1016/j.jksuci.2024.102129","url":null,"abstract":"<div><p>Multi-view graph clustering has garnered tremendous interest for its capability to effectively segregate data by harnessing information from multiple graphs representing distinct views. Despite the advances, conventional methods commonly construct similarity graphs directly from raw features, leading to suboptimal outcomes due to noise or outliers. To address this, latent representation-based graph clustering has emerged. However, it often hypothesizes that multiple views share a fixed-dimensional coefficient matrix, potentially resulting in useful information loss and limited representation capabilities. Additionally, many methods exploit Euclidean distance as a similarity metric, which may inaccurately measure linear relationships between samples. To tackle these challenges, we develop a novel diverse representation-guided graph learning for multi-view metric clustering (DRGMMC). Concretely, the raw sample matrix from each view is first projected into diverse latent spaces to capture comprehensive knowledge. Subsequently, a popular metric is leveraged to adaptively learn linearity-aware similarity graphs based on the attained coefficient matrices. Furthermore, a self-weighted fusion strategy and Laplacian rank constraint are introduced to output clustering results directly. Consequently, our model merges diverse representation learning, metric learning, consensus graph learning, and data clustering into a joint model, reinforcing each other for holistic optimization. 
Substantial experimental findings substantiate that DRGMMC outperforms most advanced graph clustering techniques.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102129"},"PeriodicalIF":5.2,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002180/pdfft?md5=7f0dd8a20b2ca00d3561c9fb487ffc79&pid=1-s2.0-S1319157824002180-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141838749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-efficient routing protocols for UWSNs: A comprehensive review of taxonomy, challenges, opportunities, future research directions, and machine learning perspectives","authors":"Sajid Ullah Khan , Zahid Ulalh Khan , Mohammed Alkhowaiter , Javed Khan , Shahid Ullah","doi":"10.1016/j.jksuci.2024.102128","DOIUrl":"10.1016/j.jksuci.2024.102128","url":null,"abstract":"<div><p>Underwater Wireless Sensor Networks (UWSNs) are essential for a number of environmental and oceanographic monitoring applications. However, they face different and more complex challenges than terrestrial wireless sensor networks (TWSNs). The main challenges faced by UWSNs include high propagation delays, poor bandwidth, low throughput, and high energy consumption. Replacing sensor batteries in such networks becomes extremely difficult as they are usually deployed in remote areas where limited human interaction is possible. The unbalanced and inefficient usage of energy by various network nodes poses another issue, as it may reduce the applicability and feasibility of the network. Therefore, proposing Energy-Efficient Routing Protocols (E-ER-Ps) is crucial to improve the performance and lifespan of these networks. Due to the challenges mentioned earlier, this research presents an extensive analysis of several different E-ER-Ps intended for UWSNs. We compare contemporary approaches that use machine learning (ML) with conventional protocols, as ML-based approaches have shown significant potential in resolving the intricate challenges faced by UWSNs. This paper aims to present a critical review of different E-ER-Ps for UWSNs from various perspectives. To better comprehend the structure and uses of these protocols, we provide an innovative taxonomy for their classification. 
While ML-based protocols are evaluated for their flexibility, predictive power, and overall efficiency advancements, traditional protocols are evaluated based on their routing tactics and energy-efficiency improvements. A thorough comparative analysis highlights the advantages, disadvantages, and possible uses for different protocols. Furthermore, a critical analysis of ML’s function, incorporating intelligent and adaptive routing approaches, is presented, highlighting the technology’s potential to completely alter UWSN management. To formulate and implement E-ER-Ps for UWSNs, the article concludes by highlighting the present obstacles, including the need for real-time flexibility, resilience to environmental changes, and interaction with pre-existing network infrastructures. The development of ML-based approaches, hybrid approaches that combine conventional and ML-based methodologies, and the design of protocols that can adapt dynamically to the changing circumstances of underwater habitats are highlighted as future research objectives. 
This research provides the foundation for future advancements in this crucial field by presenting a comprehensive overview of the state-of-the-art UWSN E-ER-Ps.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102128"},"PeriodicalIF":5.2,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002179/pdfft?md5=0ca24269fca8e21ff16074a33686ceaa&pid=1-s2.0-S1319157824002179-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141846029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yameng Tu, Jianbin Wu, Liang Lu, Shuaikang Gao, MingHao Li
{"title":"Face forgery video detection based on expression key sequences","authors":"Yameng Tu, Jianbin Wu, Liang Lu, Shuaikang Gao, MingHao Li","doi":"10.1016/j.jksuci.2024.102142","DOIUrl":"10.1016/j.jksuci.2024.102142","url":null,"abstract":"<div><p>In order to minimize additional computational costs in detecting forged videos, and enhance detection accuracy, this paper employs dynamic facial expression sequences as key sequences, replacing original video sequences as inputs for the detection model. A spatio-temporal dual-branch detection network is designed based on the visual Transformer architecture. Specifically, this process involves three steps. Firstly, dynamic facial expression sequences are localized as key sequences using optical flow difference algorithms. Subsequently, the spatial branch network employs the focal self-attention mechanism to focus on dynamic features of expression-relevant regions and uses Factorization Machines to facilitate feature interaction among multiple key sequences. Meanwhile, the temporal branch network concentrates on learning the temporal inconsistency of optical flow differences between adjacent frames. Finally, a binary classification linear SVM combines the Softmax values from the two branch networks to provide the ultimate detection outcome. 
Experimental results on the FaceForensics++ dataset demonstrate: (a) replacing whole video sequences with facial expression key sequences effectively reduces training and detection time by nearly 80% and 90%, respectively; (b) compared to state-of-the-art methods involving random sequence/frame extraction and key frame extraction based on video compression techniques, the proposed approach achieves more competitive detection accuracy.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102142"},"PeriodicalIF":5.2,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002313/pdfft?md5=d3161c3d47c3e55bf622551f8213c551&pid=1-s2.0-S1319157824002313-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141845621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight citrus leaf disease detection model based on ARMS and cross-domain dynamic attention","authors":"Henghui Mo, Linjing Wei","doi":"10.1016/j.jksuci.2024.102133","DOIUrl":"10.1016/j.jksuci.2024.102133","url":null,"abstract":"<div><p>In citrus cultivation, Anthracnose, Scab, and Greasy Spot significantly impact yield and quality. Facing challenges in detecting small targets against complex orchard backgrounds with uneven lighting and obstructions, existing models suffer from low detection accuracy. This study introduces the YOLOv8n-CDDA citrus leaf disease detection model. The Cross-Domain Dynamic Attention (CDDA) mechanism deconstructs the backbone network’s input feature maps into sections, dynamically assigning spatial and channel attention weights to reconstruct critical information and capture the variations and weak semantic features of disease textures. The proposed Adaptive Random Mix-Cut Splicing (ARMS) image augmentation technique blends diseased leaf images with healthy citrus backgrounds, enhancing the diversity and number of background targets. To reduce computational and memory consumption, the network is streamlined through channel pruning; to compensate for the loss in accuracy from pruning, a teacher–assistant–student network format is used for knowledge distillation, where the student network learns from soft knowledge to improve disease recognition accuracy. Finally, Grad-CAM++ technology generates heatmaps of the detections, facilitating the visualization of effective features and deepening understanding of the model’s focus areas. Experimental results demonstrate that the YOLOv8n-CDDA model achieves an average accuracy of 90.89% in disease detection, with an average recall rate of 81.12%, and a mean Average Precision (mAP50) of 88.36%. 
Compared to the original YOLOv8n and current mainstream detection models such as YOLOv5s, SSD, and Faster-RCNN, the improvements in average accuracy are respectively 2.95%, 4.78%, 14.22%, and 21.01%; in average recall, 2.36%, 3.09%, 15.74%, and 23.27%; and in mAP50, 2.38%, 3.13%, 13.45%, and 20.91%. After pruning and distillation for lightweight adaptation, the YOLOv8n-CDDA model has a parameter size of 0.8M, requires 4.2 GFLOPs, weighs 2.0 MB, and operates at 45 fps. Compared to YOLOv8n, this represents a reduction of 2.2M in parameters, 3.9 GFLOPs, and 4 MB in model weight, with an increase of 7 fps in speed. This model exhibits exceptional performance in the complex environment of citrus leaf disease detection, providing robust technical support for citrus growth monitoring studies, and offering insights for disease detection in other crops as well.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102133"},"PeriodicalIF":5.2,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002222/pdfft?md5=d186db7e088c79129786ff12b138ee08&pid=1-s2.0-S1319157824002222-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141841316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Sheng Kong , Muhammed Basheer Jasser , Samuel-Soma M. Ajibade , Ali Wagdy Mohamed
{"title":"A systematic review on software reliability prediction via swarm intelligence algorithms","authors":"Li Sheng Kong , Muhammed Basheer Jasser , Samuel-Soma M. Ajibade , Ali Wagdy Mohamed","doi":"10.1016/j.jksuci.2024.102132","DOIUrl":"10.1016/j.jksuci.2024.102132","url":null,"abstract":"<div><p>The widespread integration of software into all parts of our lives has led to the need for software of higher reliability. Ensuring reliable software usually necessitates some form of formal methods in the early stages of the development process, which requires strenuous effort. Hence, researchers in the field of software reliability introduced Software Reliability Growth Models (SRGMs) as a relatively inexpensive approach to software reliability prediction. Conventional parameter estimation methods of SRGMs were ineffective and left much to be desired. Consequently, researchers sought out swarm intelligence to combat their flaws, resulting in significant improvements. While similar surveys exist within the domain, the surveys are broader in scope and do not cover many swarm intelligence algorithms. Moreover, the broader scope has resulted in the occasional omission of information regarding the design for reliability predictions. A more comprehensive survey containing 38 studies and 18 different swarm intelligence algorithms in the domain is presented. Each design proposed by the studies was systematically analyzed, and relevant information, including the measures used, datasets used, SRGMs used, and the effectiveness of each proposed design, was extracted and organized into tables and taxonomies to identify the current trends within the domain. Some notable findings include the distance-based approach providing a high prediction accuracy and an increasing trend in hybridized variants of swarm intelligence algorithm designs to predict software reliability. 
Future researchers are encouraged to include Mean Square Error (MSE) or Root MSE as the measures offer the largest sample size for comparison.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102132"},"PeriodicalIF":5.2,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002210/pdfft?md5=65281963d468eb6753881c759697abc2&pid=1-s2.0-S1319157824002210-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141846519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Usman Mahmood Malik , Muhammad Awais Javed , Abdulaziz AlMohimeed , Mohammed Alkhathami , Deafallah Alsadie , Abeer Almujalli
{"title":"A many-to-many matching with externalities solution for parallel task offloading in IoT networks","authors":"Usman Mahmood Malik , Muhammad Awais Javed , Abdulaziz AlMohimeed , Mohammed Alkhathami , Deafallah Alsadie , Abeer Almujalli","doi":"10.1016/j.jksuci.2024.102134","DOIUrl":"10.1016/j.jksuci.2024.102134","url":null,"abstract":"<div><p>The efficient and timely execution of tasks is a fundamental challenge in the realm of future Internet of Things (IoT) networks. To address this challenge, fog devices are often deployed close to end devices to facilitate task processing on behalf of IoT nodes. One strategy for improving task computational delay is to employ parallel task offloading, in which tasks are subdivided into subtasks and sent to different fog devices for execution in parallel. However, allocating computational resources to fog nodes and mapping these resources to IoT subtasks is a key challenge in this area. This work models the parallel task offloading problem as a graph-matching problem and utilizes a many-to-many matching technique to achieve a stable mapping of IoT subtasks to fog node resources. Unfortunately, the proposed solution is subject to the problem of externalities due to the dynamic preference profiling of fog nodes. To address this issue, we employ an iterative algorithm to resolve any blocking pairs that may arise. 
Our results demonstrate that the proposed technique reduces the task latency by 29% as compared to other matching-based techniques available in the literature.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102134"},"PeriodicalIF":5.2,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002234/pdfft?md5=ca723de57705f68d89bad154b59605a4&pid=1-s2.0-S1319157824002234-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141960831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An optimized fusion of deep learning models for kidney stone detection from CT images","authors":"Sohaib Asif , Xiaolong Zheng , Yusen Zhu","doi":"10.1016/j.jksuci.2024.102130","DOIUrl":"10.1016/j.jksuci.2024.102130","url":null,"abstract":"<div><p>Accurate diagnosis of kidney disease is crucial, as it is a significant health concern that demands precise identification for effective and appropriate treatment. Deep learning methods are increasingly recognized as valuable tools for disease diagnosis in the biomedical field. However, current models utilizing deep networks often encounter challenges of overfitting and low accuracy, necessitating further refinement for optimal performance. To overcome these challenges, this paper proposes two ensemble models designed for kidney stone detection in CT images. The first model, called StackedEnsembleNet, is a two-level deep stack ensemble model that effectively integrates the predictions from four base models: InceptionV3, InceptionResNetV2, MobileNet, and Xception. By leveraging the collective knowledge of these models, StackedEnsembleNet improves the accuracy and reliability of kidney stone detection. The second model, PSOWeightedAvgNet, leverages the Particle Swarm Optimization (PSO) algorithm to determine the optimal weights for the weighted average ensemble. Through PSO, this ensemble approach assigns optimized weights to each model during the ensembling process, effectively enhancing the performance by optimizing the combination of their predictions. Experimental results conducted on a large dataset of 1799 CT images demonstrate that both StackedEnsembleNet and PSOWeightedAvgNet outperform the individual base models, achieving high accuracy rates in kidney stone detection. Furthermore, additional experiments on an unseen dataset validate the models’ ability to generalize. The comparison with previous methods confirms the superior performance of the proposed ensemble models. 
The paper also presents Grad-CAM visualizations and error case analysis to provide insights into the decision-making processes of the models. By overcoming the limitations of existing deep learning models, StackedEnsembleNet and PSOWeightedAvgNet offer a promising approach for accurate kidney stone detection, contributing to improved diagnosis and treatment outcomes in the field of nephrology.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 7","pages":"Article 102130"},"PeriodicalIF":5.2,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002192/pdfft?md5=49b54c2eb6fd0a154e0f96100151eede&pid=1-s2.0-S1319157824002192-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
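The PSO-weighted averaging idea described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' PSOWeightedAvgNet: the probabilities in `PROBS` and the labels in `LABELS` are made-up toy values standing in for four base models' outputs, the function names are hypothetical, and the PSO hyperparameters (inertia 0.7, acceleration coefficients 1.5) are common textbook defaults rather than values taken from the paper.

```python
import random


def normalize(weights):
    """Clip weights to be non-negative and rescale them to sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def ensemble_accuracy(weights, probs, labels):
    """Accuracy of the weighted average of per-model positive-class probabilities."""
    w = normalize(weights)
    correct = 0
    for i, label in enumerate(labels):
        p = sum(wk * model[i] for wk, model in zip(w, probs))
        correct += int((p >= 0.5) == bool(label))
    return correct / len(labels)


def pso_weights(probs, labels, n_particles=12, n_iters=30, seed=0):
    """Search for ensemble weights with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(probs)
    # The first particle is the plain average, so the result is never worse than it.
    pos = [[1.0 / dim] * dim]
    pos += [[rng.random() for _ in range(dim)] for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [ensemble_accuracy(p, probs, labels) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (0.7 * vel[k][d]
                             + 1.5 * r1 * (pbest[k][d] - pos[k][d])
                             + 1.5 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            fit = ensemble_accuracy(pos[k], probs, labels)
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return normalize(gbest), gbest_fit


# Toy validation data (hypothetical): one row of probabilities per base model.
LABELS = [1, 0, 1, 1, 0, 0, 1, 0]
PROBS = [
    [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3],
    [0.6, 0.4, 0.7, 0.3, 0.2, 0.3, 0.8, 0.6],
    [0.8, 0.1, 0.6, 0.7, 0.4, 0.6, 0.4, 0.2],
    [0.3, 0.7, 0.8, 0.6, 0.3, 0.2, 0.6, 0.4],
]

weights, accuracy = pso_weights(PROBS, LABELS)
print("weights:", [round(w, 3) for w in weights], "accuracy:", accuracy)
```

Seeding the swarm with the uniform-weight particle is a design choice made here so that the search can only match or improve on simple averaging; in practice the fitness would be computed on a held-out validation split rather than on the data used to report results.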