{"title":"Real-Time Nail-Biting Detection on a Smartwatch Using Three CNN Models Pipeline","authors":"Abdullah Alesmaeil, Eftal Şehirli","doi":"10.1111/coin.70020","DOIUrl":"https://doi.org/10.1111/coin.70020","url":null,"abstract":"<div><p>Nail-biting (NB), or onychophagia, is a compulsive disorder that affects millions of people, both children and adults. It has several health complications and negative social effects. Treatments include surgical interventions, pharmacological medications, and behavioral modification therapies that use positive reinforcement and periodic reminders. Although behavioral therapies are the least invasive option, they still depend on manual monitoring and tracking, which limits their success. In this work, we propose a novel approach for automatic real-time NB detection and alerting on a smartwatch that requires no surgical intervention, medication, or manual habit monitoring. It addresses two key challenges. First, NB actions generate subtle motion patterns at the wrist that lead to a high false-positive (FP) rate, even when the hand is not on the face. Second, running power-intensive applications on a power-constrained edge device such as a smartwatch is difficult. To overcome these challenges, our approach implements a pipeline of three convolutional neural network (CNN) models instead of a single model. The first two models are small and efficient, designed to detect face-touch (FT) actions and hand movement away (MA) from the face. The third model is a larger, deeper CNN dedicated to classifying hand actions on the face and detecting NB actions. This separation of tasks addresses both challenges: it decreases FPs by ensuring the NB model is activated only when the hand is on the face, and it optimizes power usage by ensuring the larger NB model runs only for short periods while the efficient FT model runs most of the time. In addition, this separation gives more freedom to design, configure, and optimize the three models according to each model's task. Lastly, for training the main NB model, this work presents further optimizations, including developing an NB dataset from scratch through a dedicated data collection application, applying data augmentation, and utilizing several CNN optimization techniques during training. Results show that the model pipeline approach minimizes FPs significantly compared with a single model for NB detection while improving overall efficiency.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"41 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143114730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Deep Learning Based Dual Watermarking System for Securing Healthcare Data","authors":"Kumari Suniti Singh, Harsh Vikram Singh","doi":"10.1111/coin.70011","DOIUrl":"https://doi.org/10.1111/coin.70011","url":null,"abstract":"<div><p>Sharing patient information over open networks has drawn attention to security in the healthcare system; security is the primary issue when sharing documents online. Thus, a dual watermarking technique has been developed to improve the security of shared data. Although classical watermarking schemes are resilient to many attacks, protecting the authenticity and copyright of medical images remains essential to prevent duplication, modification, or unauthorized distribution. This paper proposes a robust, novel dual watermarking system for securing healthcare data. Initially, watermarking is performed based on redundant lifting wavelet transform (LWT) and turbo code decomposition for COVID-19 patient images and patient text data. To achieve a high level of authenticity, watermarks in the form of encoded text data and decomposed watermark images are inserted together, and an inverse LWT is used to generate an initial watermarked image. Imperceptibility and robustness are improved by incorporating the cover image into the watermarked image. Cross-guided bilateral filtering (CG_BF) improves cover image quality, while the integrated Walsh–Hadamard transform (IWHT) extracts features. A novel adaptive coati optimization (ACO) technique identifies the ideal location for the watermarked image within the cover image. To improve security, the watermarked image is decomposed using the discrete wavelet transform (DWT) and encrypted with a chaotic extended logistic system. Finally, the encrypted watermarked image is embedded in the desired location using a novel deep-learning model based on the Hybrid Convolutional Cascaded Capsule Network (HC<sup>3</sup>Net). Thus, the secured watermarked image is obtained, and the watermark and text data are extracted using decryption and the inverse DWT procedure. The performance of the proposed method is evaluated using accuracy, peak signal-to-noise ratio (PSNR), normalized correlation (NC), and other metrics. The proposed method achieves an accuracy of 99.26%, which is greater than that of existing methods.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"41 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143112640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Automated Recommendation System for Crowdsourcing Data Using Improved Heuristic-Aided Residual Long Short-Term Memory","authors":"K. Dhinakaran, R. Nedunchelian","doi":"10.1111/coin.70017","DOIUrl":"https://doi.org/10.1111/coin.70017","url":null,"abstract":"<div><p>In recent years, crowdsourcing has developed into a business production paradigm and a distributed problem-solving platform. However, conventional machine learning models fail to help both requesters and workers find suitable jobs, which degrades output quality. Traditional large-scale crowdsourcing systems typically involve many microtasks, and a crowdworker needs considerable time to search for work on such platforms, so task recommendation methods are useful. Yet traditional approaches do not consider the cold-start issue. To tackle these issues, this paper implements a new recommendation system for crowdsourcing data using deep learning. Initially, crowdsourced data are accumulated from standard online sources. The novelty of the model is an adaptive residual long short-term memory (ARes-LSTM) network that learns a task's latent factors from task features rather than the task ID. The network's parameters are optimized by the fitness-based drawer algorithm (F-DA) to improve efficacy. Further, the ARes-LSTM detects a user's preference score based on the user's historical behaviors. According to users' historical behavior records and task features, the ARes-LSTM provides personalized task recommendations and rectifies the cold-start issue. In the experiments, the implemented model attains an accuracy rate of 91.43%, while traditional techniques such as AOA, TSA, BBRO, and DA attain 84.07%, 85.42%, 87.07%, and 90.07%, respectively. Finally, the implemented recommendation system is evaluated against various traditional techniques with standard efficiency metrics to show the supremacy of the designed system. Thus, the developed recommendation system for crowdsourcing data chooses intended tasks based on individual preferences, which can broaden opportunities to engage in crowdsourcing efforts across a wide range of platforms.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"41 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143112656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Single Channel Speech Enhancement Using Triple Attention and Stacked Squeeze-TCN","authors":"Chaitanya Jannu, Manaswini Burra, Sunny Dayal Vanambathina, Veeraswamy Parisae","doi":"10.1111/coin.70016","DOIUrl":"https://doi.org/10.1111/coin.70016","url":null,"abstract":"<div><p>Speech enhancement is crucial in many speech processing applications. Recently, researchers have explored ways to improve performance by effectively capturing the long-term contextual relationships within speech signals. Multi-stage learning, in which several deep learning modules are activated one after another, has been shown to be an effective approach. The attention mechanism has also been explored for improving speech quality, showing significant improvements, and attention modules have been developed to improve the performance of CNN backbone networks. However, these attention modules often use fully connected (FC) and convolution layers, which increase the model's parameter count and computational requirements. The present study employs multi-stage learning within the framework of speech enhancement. The proposed model uses a multi-stage structure in which, at each stage, a triple attention block (TAB) is followed by a sequence of squeeze temporal convolutional modules (STCM) with doubled dilation rates. An estimate is generated at each stage and refined in the subsequent stage. To reintroduce the original information, a feature fusion module (FFM) is inserted at the beginning of each following stage. In the proposed model, the intermediate output goes through several phases of step-by-step improvement by continually unfolding STCMs, which eventually leads to precise estimation of the spectrum. The TAB is crafted to enhance model performance, allowing it to concentrate concurrently on areas of interest in the channel, spatial, and time-frequency dimensions. More specifically, the channel-spatial attention (CSA) has two parallel branches combining channel with spatial attention, enabling the channel and spatial dimensions to be captured simultaneously. Next, the signal is emphasized as a function of time and frequency by aggregating the feature maps along these dimensions, which improves the model's capability to capture the temporal dependencies of speech signals. Using the VCTK and Librispeech datasets, the proposed speech enhancement system is assessed against state-of-the-art deep learning techniques and yields better results in terms of PESQ, STOI, CSIG, CBAK, and COVL.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"41 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143112659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Integration of Mel Spectrograms and Text Transcripts for Enhanced Automatic Speech Recognition: Leveraging Extractive Transformer-Based Approaches and Late Fusion Strategies","authors":"Sunakshi Mehra, Virender Ranga, Ritu Agarwal","doi":"10.1111/coin.70012","DOIUrl":"https://doi.org/10.1111/coin.70012","url":null,"abstract":"<div><p>This research aims to advance automatic speech recognition (ASR) by innovatively integrating multimodal data, specifically textual transcripts and Mel spectrograms (2D images) obtained from raw audio. The study explores the under-explored potential of spectrograms and linguistic information for enhancing spoken word recognition accuracy. To elevate ASR performance, we propose two distinct transformer-based approaches. First, for the audio-centric approach, we leverage RegNet and ConvNeXt architectures, initially trained on a massive dataset of 14 million annotated images from ImageNet, to process Mel spectrograms as image inputs. Second, we harness the Speech2Text transformer to decouple text transcript acquisition from raw audio. We pre-process the Mel spectrogram images, resizing them to 224 × 224 pixels to create two-dimensional audio representations, and the ImageNet-pretrained RegNet and ConvNeXt models individually categorize these images. The first channel generates the embeddings for the visual modalities (RegNet and ConvNeXt) on the 2D Mel spectrograms. Additionally, we employ Sentence-BERT embeddings via Siamese BERT networks to transform the Speech2Text transcripts into vectors. These image embeddings, along with the Sentence-BERT embeddings from speech transcription, are subsequently fine-tuned within a deep dense model with five layers and batch normalization for spoken word classification. Our experiments focus on the Google Speech Command Dataset (GSCD) version 2, encompassing 35 word categories. To gauge the impact of spectrograms and linguistic features, we conducted an ablation analysis. Our novel late fusion strategy unites word embeddings and image embeddings, resulting in remarkable test accuracy rates of 95.87% for ConvNeXt, 99.95% for RegNet, and 85.93% for text transcripts across the 35 word categories, as processed by the deep dense layered model with batch normalization. We obtained a test accuracy of 99.96% for the 35 word categories after late fusion of ConvNeXt + RegNet + SBERT, demonstrating superior results compared to other state-of-the-art methods.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 6","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHREA: A Systematic Hybrid Resampling Ensemble Approach Using One Class Classifier","authors":"Pranita Baro, Malaya Dutta Borah","doi":"10.1111/coin.70004","DOIUrl":"https://doi.org/10.1111/coin.70004","url":null,"abstract":"<div><p>Imbalanced classification and data incompleteness are two critical issues in machine learning that, despite significant research, remain difficult to solve. This paper presents the Systematic Hybrid Resampling Ensemble Approach (SHREA), which deals with class imbalance and data incompleteness in a given dataset and improves classification performance. We use an oscillator-guided Factor Based Multiple Imputation Oversampling technique to balance the minority and majority data samples while substituting missing values in the dataset. The resulting oversampled dataset then undergoes random undersampling to create majority- and minority-class subsets. These subsets are trained with classifiers using one of the one-class-classifier-based methods, that is, the One Class Support Vector Machine or the Local Outlier Factor. Lastly, bootstrap aggregation ensembles are built from the majority- and minority-class classifiers and combined to produce a score-based prediction. To mimic real-life scenarios where data could be missing, we introduce random missing values into each imbalanced dataset to create three new sets per dataset with different missing-value rates (10%, 20%, and 30%). The proposed method is evaluated on datasets taken from the KEEL website, and the results are compared against RBG, SBG, SBT, DTE, and EUS. Experimental analysis shows that the proposed approach gives better results, revealing its efficiency and significance compared to the existing methods. The Local Outlier Factor variant of the proposed approach improves Recall, AUC, f-measure, and g-mean by 3.46%, 5.30%, 10.51%, and 9.26%, respectively, and the One Class Support Vector Machine variant by 4.82%, 5.95%, 11.03%, and 8.80%, respectively.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 6","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Automated Histopathological Colorectal Cancer Multi-Class Classification System Based on Optimal Image Processing and Prominent Features","authors":"Tasnim Jahan Tonni, Shakil Rana, Kaniz Fatema, Asif Karim, Md. Awlad Hossen Rony, Md. Zahid Hasan, Md. Saddam Hossain Mukta, Sami Azam","doi":"10.1111/coin.70007","DOIUrl":"https://doi.org/10.1111/coin.70007","url":null,"abstract":"<div><p>Colorectal cancer (CRC) is characterized by the uncontrollable growth of cancerous cells within the rectal mucosa. Colon polyps, which are precancerous growths, can develop into colon cancer, causing symptoms such as rectal bleeding, abdominal pain, diarrhea, weight loss, and constipation. CRC is one of the leading causes of cancer-related death worldwide, and this potentially fatal cancer severely afflicts the elderly. Early diagnosis is crucial for effective treatment, but manual diagnosis is often time-consuming and laborious for experts. This study improved the accuracy of CRC multi-class classification compared to previous research by utilizing diverse datasets, namely NCT-CRC-HE-100K (100,000 images) and CRC-VAL-HE-7K (7,180 images). Initially, we applied various image processing techniques to the NCT-CRC-HE-100K dataset to improve image quality and remove noise, followed by multiple feature extraction and selection methods to identify prominent features from a large data hub, experimenting with different approaches to select the best classifiers for these critical features. The third ensemble model (XGB-LightGBM-RF) achieved an optimum accuracy of 99.63% with 40 prominent features selected by univariate feature selection methods. Moreover, the third ensemble model also achieved 99.73% accuracy on the CRC-VAL-HE-7K dataset, and 99.27% accuracy after combining the two datasets. In addition, we trained and tested our model across the two different datasets: using 80% of the data from NCT-CRC-HE-100K for training and 20% of the data from CRC-VAL-HE-7K for testing, the third ensemble model obtained 98.43% accuracy in multi-class classification. The results show that this new framework, built on the third ensemble model, can help experts identify CRC disease types at the very beginning of an investigation.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 6","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining User Study Data to Judge the Merit of a Model for Supporting User-Specific Explanations of AI Systems","authors":"Owen Chambers, Robin Cohen, Maura R. Grossman, Liam Hebert, Elias Awad","doi":"10.1111/coin.70015","DOIUrl":"https://doi.org/10.1111/coin.70015","url":null,"abstract":"<p>In this paper, we present a model for supporting user-specific explanations of AI systems. We then discuss a user study conducted to gauge whether the decisions to adjust output for users with certain characteristics were confirmed to be of value to participants. We focus on the merit of having explanations attuned to particular psychological profiles of users, and on the value of offering different levels of explanation (including allowing for no explanation, as one possibility). Following the description of the study, we present an approach for mining data from participant responses to determine whether the model developed for varying output to users was well founded. While our results in this respect are preliminary, we explain how using varied machine learning methods is of value as a concrete step toward validating specific approaches for AI explanation. We conclude with a discussion of related work and ideas for future research directions.</p>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 6","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/coin.70015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ResAN: A Residual Dual-Attention Network for Abnormal Cardiac Activity Detection","authors":"Xuhui Wang, Yuanyuan Zhu, Fei Wu, Long Gao, Datun Qi, Xiaoyuan Jing, Chong Luo","doi":"10.1111/coin.70005","DOIUrl":"https://doi.org/10.1111/coin.70005","url":null,"abstract":"<div><p>Cardiovascular disease is one of the leading causes of death worldwide. Early and accurate detection of abnormal cardiac activity is an effective way to prevent serious cardiovascular events. Electrocardiogram (ECG) and phonocardiogram (PCG) signals provide an objective evaluation of the heart's electrical and acoustic functions, enabling medical professionals to make an accurate diagnosis; cardiologists therefore often use them to make a preliminary diagnosis of abnormal cardiac activity in clinical practice. For this reason, many diagnostic models have been proposed. However, these models fail to utilize the interaction information within and between the signals to aid the diagnosis of disease. To address this issue, we designed a residual dual-attention network (ResAN) for the detection of abnormal cardiac activity using synchronized ECG and PCG signals. First, ResAN uses a feature learning module with two parallel residual networks, namely ECG-ResNet and PCG-ResNet, to automatically learn deep modal-specific features from the ECG and PCG sequences, respectively. Second, to fully utilize the available information in the different modal signals, ResAN uses a dual-attention fusion module to capture the salient features of the integrated ECG and PCG features learned by the feature learning module, as well as the alternating features between them, based on attention mechanisms. Finally, these fused features are merged and fed to the classification module to detect abnormal cardiac activity. Our model achieves an accuracy of 96.1%, surpassing the comparison models by 1.0% to 9.9% when using synchronized ECG and PCG signals. Furthermore, an ablation study confirmed the efficacy of the components of ResAN and showed that ResAN performs better with synchronized ECG and PCG signals than with single-modal signals. Overall, ResAN provides a valid solution for the early detection of abnormal cardiac activity using ECG and PCG signals.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 6","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A ViT-Based Adaptive Recurrent Mobilenet With Attention Network for Video Compression and Bit-Rate Reduction Using Improved Heuristic Approach Under Versatile Video Coding","authors":"D. Padmapriya, Ameelia Roseline A","doi":"10.1111/coin.70014","DOIUrl":"https://doi.org/10.1111/coin.70014","url":null,"abstract":"<div><p>Video compression has received attention from the video processing and deep learning communities. Modern learning-aided mechanisms use a hybrid coding approach to reduce redundancy in pixel space across time and space, improving motion compensation accuracy. Video compression research has seen important improvements in recent years. Versatile Video Coding (VVC), also referred to as H.266, is the leading current video compression standard. The VVC codec is a block-based hybrid codec, making it highly capable but complex. Video coding effectively compresses data while reducing compression artifacts, enhancing the quality and functionality of AI video technologies. However, traditional models suffer from inaccurate motion compression and ineffective motion-compensation frameworks, leading to compression faults and a poor rate-distortion trade-off. This work implements an automated and effective video compression approach under VVC using deep learning. Motion estimation is conducted using a motion vector (MV) encoder-decoder model to track movements in the video. Based on these MVs, frames are reconstructed to compensate for the motion. The residual images are obtained using a Vision Transformer-based Adaptive Recurrent MobileNet with Attention Network (ViT-ARMAN), whose parameters are optimized using the Opposition-based Golden Tortoise Beetle Optimizer (OGTBO). Entropy coding is used in the training phase to find the bit rate of the residual images. Extensive experiments demonstrate the effectiveness of the developed deep learning-based method for video compression and bit-rate reduction under VVC.</p></div>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 6","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}