{"title":"Boosting Depth Estimation for Self-Driving in a Self-Supervised Framework via Improved Pose Network","authors":"Yazan Dayoub;Andrey V. Savchenko;Ilya Makarov","doi":"10.1109/OJCS.2024.3505876","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3505876","url":null,"abstract":"Depth estimation is a critical component of self-driving vehicles, enabling accurate scene understanding, obstacle detection, and precise localization. Improving the performance of depth estimation networks without increasing computational cost is highly advantageous for autonomous driving systems. In this article, we propose to enhance depth estimation by improving the pose network in a self-supervised framework. Unlike conventional pose networks, our approach preserves more detailed spatial information by integrating multi-scale features and normalized coordinates. This improved spatial awareness allows for more accurate depth predictions. Comprehensive evaluations on the KITTI and Make3D datasets show that our method yields a 2-7% improvement in the absolute relative error (abs_rel) metric. 
Furthermore, on the KITTI odometry dataset, our approach demonstrates competitive performance, with relative translational error ($t_{rel}$) of $6.11$ and $7.21$, and relative rotational error ($r_{rel}$) of $1.12$ and $2.05$ for sequences 9 and 10, respectively.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"109-118"},"PeriodicalIF":0.0,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10767273","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiReDi: Distillation and Reverse Distillation for AIoT Applications","authors":"Chen Sun;Qiang Tong;Wenshuang Yang;Wenqi Zhang","doi":"10.1109/OJCS.2024.3505195","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3505195","url":null,"abstract":"Artificial Intelligence & Internet of Things (AIoT) have been widely utilized in various application scenarios. Significant efficiency can typically be achieved by deploying different edge-AI models in various real-world scenarios while a few large models manage those edge-AI models remotely from cloud servers. However, customizing edge-AI models for each user's specific application or extending current models to new application scenarios remains a challenge. Inappropriate local training or fine-tuning of edge-AI models by users can lead to model malfunction, potentially resulting in legal issues for the manufacturer. To address the aforementioned issues, this article proposes an innovative framework called “DiReDi”, which involves knowledge Distillation & Reverse Distillation. In the initial step, an edge-AI model is trained with presumed data and a knowledge distillation (KD) process using the cloud AI model in the upper management cloud server. This edge-AI model is then dispatched to edge-AI devices solely for inference in the user's application scenario. When the user needs to update the edge-AI model to better fit the actual scenario, two reverse distillation (RD) processes are employed to extract the knowledge (the difference between user preferences and the manufacturer's presumptions) from the edge-AI model using the user's exclusive data. Only the extracted knowledge is reported back to the upper management cloud server to update the cloud AI model, thus protecting user privacy by not using any exclusive data. The updated cloud AI can then update the edge-AI model with the extended knowledge. 
Simulation results demonstrate that the proposed DiReDi framework allows the manufacturer to update the user model by learning new knowledge from the user's actual scenario with private data. The initial redundant knowledge is reduced since the retraining emphasizes user private data. Furthermore, this model update approach via the cloud allows the manufacturer to check model updates, ensuring that all models are managed safely and effectively.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"748-760"},"PeriodicalIF":0.0,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10766444","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time Series Classification of Raw Voice Waveforms for Parkinson's Disease Detection Using Generative Adversarial Network-Driven Data Augmentation","authors":"Marta Rey-Paredes;Carlos J. Pérez;Alfonso Mateos-Caballero","doi":"10.1109/OJCS.2024.3504864","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3504864","url":null,"abstract":"Parkinson's disease (PD) is a neurodegenerative disorder that affects more than 10 million people worldwide. Despite its prevalence, the detection of PD remains a complicated task, as no gold standard test has yet been developed to provide an accurate diagnosis. In this context, many recent studies have focused on the automatic detection and progression tracking of PD from voice-related characteristics, with feature engineering being the most common approach. This work intends to address an existing research gap by introducing a novel strategy that analyzes raw voice waveforms. Despite recent advancements, one of the significant hurdles is still the lack of extensive and diverse datasets. This article also implements a data augmentation solution. Big Vocoder Slicing Adversarial Network (BigVSAN) is used to generate synthetic voice data that mimics the characteristics of real patients and healthy subjects. For the PD detection task, deep learning models such as ResNet, LSTM-FCN, InceptionTime, and CDIL-CNN are used. The experiments were performed using the speech task of sustained vowel /a/ in the PC-GITA database, which contains the recordings of healthy and PD subjects. CDIL-CNN achieves the best results, improving the accuracy by 15.87% (8.96%) compared to the model that does not use augmented data (from the best method found in the literature that uses voice waveforms). 
The results of this study indicate that models trained with raw waveforms showcase modest but promising performance, underscoring the potential of audio analysis to improve the early detection of PD, providing a non-invasive and potentially remotely applicable method.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"72-84"},"PeriodicalIF":0.0,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10764737","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal EEG-fNIRS Seizure Pattern Decoding Using Vision Transformer","authors":"Rafat Damseh;Abdelhadi Hireche;Parikshat Sirpal;Abdelkader Nasreddine Belkacem","doi":"10.1109/OJCS.2024.3500032","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3500032","url":null,"abstract":"Epilepsy has been analyzed through uni-modality non-invasive brain measurements such as electroencephalogram (EEG) signal, but identifying seizure patterns is more challenging due to the non-stationary nature of the brain activity and various non-brain artifacts. In this article, we leverage a vision transformer model (ViT) to classify three types of seizure patterns based on multimodal EEG and functional near-infrared spectroscopy (fNIRS) recordings. We used spectral encoding techniques to capture temporal and spatial relationships for brain signals as feature map inputs to the transformer architecture. We evaluated model performance using the receiver operating characteristic (ROC) curves and the area under the curve (AUC), demonstrating that multimodal EEG-fNIRS signals improved the classification accuracy of seizure patterns. Our work showed that power spectral density (PSD) features often led to better results than features derived from dynamic mode decomposition (DMD), particularly for seizures with high-frequency oscillations (HFO) and generalized spike-and-wave discharge (GSWD) patterns, with an accuracy of 93.14% and 91.69%, respectively. Low-voltage fast activity (LVFA) seizures achieved consistently high performance in EEG, fNIRS, and multimodal EEG-fNIRS setups. 
Overall, our findings suggest the effectiveness of using the ViT architecture with multimodal brain data accompanied by appropriate spectral features to classify the neural activity of epileptic seizure patterns.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"724-735"},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10755173","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142713874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GHOSTForge: A Scalable Consensus Mechanism for DAG-Based Blockchains","authors":"Misbah Khan;Shabnam Kasra Kermanshahi;Jiankun Hu","doi":"10.1109/OJCS.2024.3497892","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3497892","url":null,"abstract":"Blockchain scalability has long been a critical issue, and Directed Acyclic Graphs (DAGs) offer a promising solution by enabling higher throughput. However, despite their scalability, achieving global convergence or consensus in heterogeneous DAG networks remains a significant challenge. This work introduces GHOSTForge, building on the Greedy Heaviest-Observed Sub-tree (GHOST) protocol to address these challenges. GHOSTForge incorporates unique coloring and scoring mechanisms alongside stability thresholds and order-locking processes. This protocol addresses the inefficiencies found in existing systems, such as PHANTOM, by offering a more proficient two-level coloring and scoring method that eliminates circular dependencies and enhances scalability. The use of stability thresholds enables the early locking of block orders, reducing computational overhead while maintaining robust security. GHOSTForge's design adapts dynamically to varying network conditions, ensuring quick block order convergence and strong resistance to attacks, such as double-spending. 
Our experimental results demonstrate that GHOSTForge excels in achieving both computational efficiency and rapid consensus, positioning it as a powerful and scalable solution for decentralized, heterogeneous DAG networks.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"736-747"},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10753055","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142736362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Metaverse for Industry 5.0 in NextG Communications: Potential Applications and Future Challenges","authors":"Prabadevi Boopathy;Natarajan Deepa;Praveen Kumar Reddy Maddikunta;Nancy Victor;Thippa Reddy Gadekallu;Gokul Yenduri;Wei Wang;Quoc-Viet Pham;Thien Huynh-The;Madhusanka Liyanage","doi":"10.1109/OJCS.2024.3497335","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3497335","url":null,"abstract":"With the advent of new technologies and endeavours for automation in almost all day-to-day activities, recent discussions of life in the metaverse carry greater expectations. The metaverse enables people to communicate with each other by combining the physical world with the virtual world. However, realizing the metaverse requires symmetric content delivery, low latency, dynamic network control, etc. Industry 5.0 is expected to reform the manufacturing processes through human-robot collaboration and effective utilization of technologies like artificial intelligence for increased productivity and less maintenance. The metaverse with Industry 5.0 may have tremendous technological integration for a more immersive experience and enhanced productivity. In this review, we present an overview of the metaverse and Industry 5.0, focusing on key technologies that enable the industrial metaverse, including virtual and augmented reality, 3D modeling, artificial intelligence, edge computing, digital twins, blockchain, and 6G communication networks. The article then discusses the metaverse's diverse applications across various Industry 5.0 sectors, such as agriculture, supply chain management, healthcare, education, and transportation, illustrated through several research initiatives. 
Additionally, the article addresses the challenges of implementing the industrial metaverse, proposes potential solutions, and outlines directions for future research.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"4-24"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10752374","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Re-Identification Based on Federated Incremental Subgradient Proximal Optimization","authors":"Li Kang;Chuanghong Zhao;Jianjun Huang","doi":"10.1109/OJCS.2024.3489875","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3489875","url":null,"abstract":"Object Re-identification (Object ReID) is one of the key tasks in the field of computer vision. However, traditional centralized ReID methods face challenges related to privacy protection and data storage. Federated learning, as a distributed machine learning framework, can utilize dispersed data for model training without sharing raw data, thereby reducing communication costs and ensuring data privacy. However, the real statistical heterogeneity in federated object re-identification leads to domain shift issues, resulting in decreased performance and generalization ability of the ReID model. Therefore, to address the privacy constraints and real statistical heterogeneity in object re-identification, this article focuses on studying the object re-identification method based on the Federated Incremental Subgradient Proximal (FedISP) framework. FedISP effectively alleviates weight divergence and low communication efficiency issues through incremental subgradient proximal methods and ring topology, ensuring stable model convergence and efficient communication. Considering the complexity of ReID scenarios, this article adopts a ViT-based task model to cope with feature skew across clients. Additionally, it defines camera federated scenarios and dataset federated scenarios for problem modeling and analysis. Furthermore, due to the heterogeneous classifiers that clients may have, the approach integrates personalized layers. In the experiments, instance datasets of two federated scenarios were constructed for model training. 
The final test results show that FedISP can effectively address the privacy protection and statistical heterogeneity issues faced by ReID.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"60-71"},"PeriodicalIF":0.0,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10742512","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework to Design Efficent Blockchain-Based Decentralized Federated Learning Architectures","authors":"Yannis Formery;Leo Mendiboure;Jonathan Villain;Virginie Deniau;Christophe Gransart","doi":"10.1109/OJCS.2024.3488512","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3488512","url":null,"abstract":"Distributed machine learning, and Decentralized Federated Learning in particular, is emerging as an effective solution to cope with the ever-increasing amount of data and the need to process it faster and more reliably. It enables machine learning models to be trained without centralizing user data, which improves data confidentiality and optimizes performance compared with centralized approaches. However, scaling up such systems can have limitations in terms of data and model traceability and security. To address this limitation, the integration of Blockchain has been proposed, forming a global system leveraging Blockchain, called Blockchain Based Decentralized Federated Learning (BDFL), and taking advantage of the benefits of this technology, namely transparency, immutability and decentralization. For the time being, few studies have sought to characterize these BDFL systems, although it seems that they can be broken down into a set of layers (blockchain, interconnection of DFL nodes, client selection, data transmission, consensus management) that could have a major impact on the operation of the BDFL as a whole. The aim of this article is therefore to respond to this limitation by highlighting the different layers existing in the architecture of a BDFL system and the solutions proposed in the literature that can be integrated to optimise both the performance and the security of the system. 
This could ultimately lead to the design of more secure and efficient architectures with greater resilience to attacks and architectural changes.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"705-723"},"PeriodicalIF":0.0,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10738377","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142694673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers","authors":"Syed Aun Muhammad Zaidi;Siddique Latif;Junaid Qadir","doi":"10.1109/OJCS.2024.3486904","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3486904","url":null,"abstract":"Despite the recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms including graph attention and co-attention to capture complex dependencies across different modalities and languages to achieve improved cross-language multimodal emotion recognition. In addition, our model also exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target language data. 
We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"684-693"},"PeriodicalIF":0.0,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10736634","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142663505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Auditable, Privacy-Preserving, Transparent Unspent Transaction Output Model for Blockchain-Based Central Bank Digital Currency","authors":"Md. Mainul Islam;Hoh Peter IN","doi":"10.1109/OJCS.2024.3486193","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3486193","url":null,"abstract":"Auditability, privacy, transparency, and resiliency are four essential properties of a central bank digital currency (CBDC) system. However, it is difficult to satisfy these properties at once. This issue has become a crucial challenge to ongoing CBDC projects worldwide. In this article, we propose a novel unspent transaction output (UTXO) model, which offers auditable, privacy-preserving, transparent CBDC payments in a consortium blockchain network. The proposed model adopts a high-speed, non-interactive zero-knowledge proof scheme named zero-knowledge Lightweight Transparent ARgument of Knowledge (zk-LTARK) scheme to verify the ownership of UTXOs. The scheme provides low-latency proof generation and verification while maintaining 128-bit security with a smaller proof size. It also provides memory-efficient, privacy-preserving multi-party computation and multi-signature protocols. By using zk-LTARKs, users do not require numerous private–public key pairs to preserve privacy, which reduces risks in key management. Decentralized identifiers are used to authenticate users without interacting with any centralized server and avoid a single point of failure. 
The model was implemented in a customized consortium blockchain network with the proof-of-authority consensus algorithm.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"671-683"},"PeriodicalIF":0.0,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10734236","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142663504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}