IEEE Open Journal of the Computer Society: Latest Articles

Multimodal EEG-fNIRS Seizure Pattern Decoding Using Vision Transformer
IEEE Open Journal of the Computer Society Pub Date : 2024-11-18 DOI: 10.1109/OJCS.2024.3500032
Rafat Damseh;Abdelhadi Hireche;Parikshat Sirpal;Abdelkader Nasreddine Belkacem
{"title":"Multimodal EEG-fNIRS Seizure Pattern Decoding Using Vision Transformer","authors":"Rafat Damseh;Abdelhadi Hireche;Parikshat Sirpal;Abdelkader Nasreddine Belkacem","doi":"10.1109/OJCS.2024.3500032","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3500032","url":null,"abstract":"Epilepsy has been analyzed through uni-modality non-invasive brain measurements such as electroencephalogram (EEG) signal, but identifying seizure patterns is more challenging due to the non-stationary nature of the brain activity and various non-brain artifacts. In this article, we leverage a vision transformer model (ViT) to classify three types of seizure patterns based on multimodal EEG and functional near-infrared spectroscopy (fNIRS) recordings. We used spectral encoding techniques to capture temporal and spatial relationships for brain signals as feature map inputs to the transformer architecture. We evaluated model performance using the receiver operating characteristic (ROC) curves and the area under the curve (AUC), demonstrating that multimodal EEG-fNIRS signals improved the classification accuracy of seizure patterns. Our work showed that power spectral density (PSD) features often led to better results than features derived from dynamic mode decomposition (DMD), particularly for seizures with high-frequency oscillations (HFO) and generalized spike-and-wave discharge (GSWD) patterns, with an accuracy of 93.14% and 91.69%, respectively. Low-voltage fast activity (LVFA) seizures achieved consistently high performance in EEG, fNIRS, and multimodal EEG-fNIRS setups. Overall, our findings suggest the effectiveness of using the ViT architecture with multimodal brain data accompanied by appropriate spectral features to classify the neural activity of epileptic seizure patterns.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"724-735"},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10755173","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142713874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
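The PSD-based spectral encoding mentioned in the abstract above lends itself to a short illustration. The sketch below (Python with SciPy's Welch estimator; channel count, sampling rate, and frequency band are illustrative choices, not the authors' settings) shows one way to turn multichannel EEG into a log-PSD feature map that a ViT-style classifier could consume.

```python
# Minimal sketch (not the authors' pipeline): build a PSD feature map from
# multichannel EEG so it can be fed to a ViT-style classifier as a 2-D input.
import numpy as np
from scipy.signal import welch

def psd_feature_map(eeg: np.ndarray, fs: float = 256.0, nperseg: int = 512) -> np.ndarray:
    """eeg: (n_channels, n_samples) -> (n_channels, n_freq_bins) log-PSD map."""
    freqs, psd = welch(eeg, fs=fs, nperseg=nperseg, axis=-1)  # PSD per channel
    band = (freqs >= 0.5) & (freqs <= 50.0)   # assumed band of interest
    return np.log10(psd[:, band] + 1e-12)     # log-compress for dynamic range

# Example: 21-channel EEG, 10 s at 256 Hz -> a channel-by-frequency "image".
rng = np.random.default_rng(0)
fmap = psd_feature_map(rng.standard_normal((21, 2560)))
print(fmap.shape)
```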
GHOSTForge: A Scalable Consensus Mechanism for DAG-Based Blockchains
IEEE Open Journal of the Computer Society Pub Date : 2024-11-14 DOI: 10.1109/OJCS.2024.3497892
Misbah Khan;Shabnam Kasra Kermanshahi;Jiankun Hu
{"title":"GHOSTForge: A Scalable Consensus Mechanism for DAG-Based Blockchains","authors":"Misbah Khan;Shabnam Kasra Kermanshahi;Jiankun Hu","doi":"10.1109/OJCS.2024.3497892","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3497892","url":null,"abstract":"Blockchain scalability has long been a critical issue, and Directed Acyclic Graphs (DAGs) offer a promising solution by enabling higher throughput. However, despite their scalability, achieving global convergence or consensus in heterogeneous DAG networks remains a significant challenge. This work, introduces GHOSTForge, building on the Greedy Heaviest-Observed Sub-tree (GHOST) protocol to address these challenges. GHOSTForge incorporates unique coloring and scoring mechanisms alongside stability thresholds and order-locking processes. This protocol addresses the inefficiencies found in existing systems, such as PHANTOM, by offering a more proficient two-level coloring and scoring method that eliminates circular dependencies and enhances scalability. The use of stability thresholds enables the early locking of block orders, reducing computational overhead while maintaining robust security. GHOSTForge's design adapts dynamically to varying network conditions, ensuring quick block order convergence and strong resistance to attacks, such as double-spending. Our experimental results demonstrate that GHOSTForge excels in achieving both computational efficiency and rapid consensus, positioning it as a powerful and scalable solution for decentralized, heterogeneous DAG networks.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"736-747"},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10753055","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142736362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
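For readers unfamiliar with the base protocol, the sketch below illustrates the classic GHOST fork-choice rule that GHOSTForge extends: greedily descend from genesis into the child whose observed subtree is heaviest. It is a minimal illustration on a toy block tree, not GHOSTForge itself; the paper's two-level coloring, scoring, and order-locking are not reproduced.

```python
# Minimal sketch of the GHOST fork-choice rule on a block tree.
def ghost_tip(children: dict[str, list[str]], genesis: str = "genesis") -> str:
    """Return the tip reached by greedily following the heaviest subtree."""
    def subtree_weight(block: str) -> int:
        # Weight = number of blocks in the subtree rooted at `block`.
        return 1 + sum(subtree_weight(c) for c in children.get(block, []))

    block = genesis
    while children.get(block):
        block = max(children[block], key=subtree_weight)  # heaviest child wins
    return block

# Toy block tree: the branch under "A" holds more blocks than the one under "B".
tree = {"genesis": ["A", "B"], "A": ["A1", "A2"], "A1": ["A3"], "B": ["B1"]}
print(ghost_tip(tree))  # -> "A3"
```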
The Metaverse for Industry 5.0 in NextG Communications: Potential Applications and Future Challenges
IEEE Open Journal of the Computer Society Pub Date : 2024-11-13 DOI: 10.1109/OJCS.2024.3497335
Prabadevi Boopathy;Natarajan Deepa;Praveen Kumar Reddy Maddikunta;Nancy Victor;Thippa Reddy Gadekallu;Gokul Yenduri;Wei Wang;Quoc-Viet Pham;Thien Huynh-The;Madhusanka Liyanage
{"title":"The Metaverse for Industry 5.0 in NextG Communications: Potential Applications and Future Challenges","authors":"Prabadevi Boopathy;Natarajan Deepa;Praveen Kumar Reddy Maddikunta;Nancy Victor;Thippa Reddy Gadekallu;Gokul Yenduri;Wei Wang;Quoc-Viet Pham;Thien Huynh-The;Madhusanka Liyanage","doi":"10.1109/OJCS.2024.3497335","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3497335","url":null,"abstract":"With the advent of new technologies and endeavours for automation in almost all day-to-day activities, the recent discussions on the metaverse life have a greater expectation. The metaverse enables people to communicate with each other by combining the physical world with the virtual world. However, realizing the Metaverse requires symmetric content delivery, low latency, dynamic network control, etc. Industry 5.0 is expected to reform the manufacturing processes through human-robot collaboration and effective utilization of technologies like Artificial intelligence for increased productivity and less maintenance. The metaverse with Industry 5.0 may have tremendous technological integration for a more immersive experience and enhanced productivity. In this review, we present an overview of the metaverse and Industry 5.0, focusing on key technologies that enable the industrial metaverse, including virtual and augmented reality, 3D modeling, artificial intelligence, edge computing, digital twins, blockchain, and 6G communication networks. The article then discusses the metaverse's diverse applications across various Industry 5.0 sectors, such as agriculture, supply chain management, healthcare, education, and transportation, illustrated through several research initiatives. Additionally, the article addresses the challenges of implementing the industrial metaverse, proposes potential solutions, and outlines directions for future research.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"4-24"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10752374","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Object Re-Identification Based on Federated Incremental Subgradient Proximal Optimization
IEEE Open Journal of the Computer Society Pub Date : 2024-11-04 DOI: 10.1109/OJCS.2024.3489875
Li Kang;Chuanghong Zhao;Jianjun Huang
{"title":"Object Re-Identification Based on Federated Incremental Subgradient Proximal Optimization","authors":"Li Kang;Chuanghong Zhao;Jianjun Huang","doi":"10.1109/OJCS.2024.3489875","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3489875","url":null,"abstract":"Object Re-identification (Object ReID) is one of the key tasks in the field of computer vision. However, traditional centralized ReID methods face challenges related to privacy protection and data storage. Federated learning, as a distributed machine learning framework, can utilize dispersed data for model training without sharing raw data, thereby reducing communication costs and ensuring data privacy. However, the real statistical heterogeneity in federated object re-identification leads to domain shift issues, resulting in decreased performance and generalization ability of the ReID model. Therefore, to address the privacy constraints and real statistical heterogeneity in object re-identification, this article focuses on studying the object re-identification method based on the Federated Incremental Subgradient Proximal(FedISP) framework. FedISP effectively alleviates weight divergence and low communication efficiency issues through incremental sub-gradient proximal methods and ring topology, ensuring stable model convergence and efficient communication. Considering the complexity of ReID scenarios, this article adopts a ViT-based task model to cope with feature skew across clients. Additionally, it defines camera federated scenarios and dataset federated scenarios for problem modeling and analysis. Furthermore, due to the heterogeneous classifiers that clients may have, the approach intergrates personalized layers. In the experiments, instance datasets of two federated scenarios were constructed for model training. The final test results show that FedISP can effectively address the privacy protection and statistical heterogeneity issues faced by ReID.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"60-71"},"PeriodicalIF":0.0,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10742512","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
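The incremental subgradient-proximal update at the heart of FedISP can be pictured with a small numerical sketch. The code below is an assumption-laden illustration, not the paper's implementation: the model visits clients in ring order, each client takes a local gradient step on a stand-in least-squares loss, and a proximal step on an L2 regularizer follows.

```python
# Minimal sketch (assumed) of one incremental subgradient-proximal pass
# around a ring of clients; losses, data, and hyperparameters are stand-ins.
import numpy as np

def local_grad(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Least-squares surrogate for a client's local loss gradient (illustrative).
    return X.T @ (X @ w - y) / len(y)

def prox_l2(w: np.ndarray, lam: float, lr: float) -> np.ndarray:
    # Proximal operator of (lam/2)*||w||^2 with step size lr.
    return w / (1.0 + lr * lam)

def fedisp_round(w, clients, lr=0.05, lam=0.01):
    for X, y in clients:                    # ring order: client i hands w to client i+1
        w = w - lr * local_grad(w, X, y)    # incremental (sub)gradient step
        w = prox_l2(w, lam, lr)             # proximal step
    return w

rng = np.random.default_rng(1)
clients = [(rng.standard_normal((20, 5)), rng.standard_normal(20)) for _ in range(4)]
w = np.zeros(5)
for _ in range(50):
    w = fedisp_round(w, clients)
print(w.round(3))
```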
A Framework to Design Efficient Blockchain-Based Decentralized Federated Learning Architectures
IEEE Open Journal of the Computer Society Pub Date : 2024-10-30 DOI: 10.1109/OJCS.2024.3488512
Yannis Formery;Leo Mendiboure;Jonathan Villain;Virginie Deniau;Christophe Gransart
{"title":"A Framework to Design Efficent Blockchain-Based Decentralized Federated Learning Architectures","authors":"Yannis Formery;Leo Mendiboure;Jonathan Villain;Virginie Deniau;Christophe Gransart","doi":"10.1109/OJCS.2024.3488512","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3488512","url":null,"abstract":"Distributed machine learning, and Decentralized Federated Learning in particular, is emerging as an effective solution to cope with the ever-increasing amount of data and the need to process it faster and more reliably. It enables machine learning models to be trained without centralizing user data, which improves data confidentiality and optimizes performance compared with centralized approaches. However, scaling up such systems can have limitations in terms of data and model traceability and security. To address this limitation, the integration of Blockchain has been proposed, forming a global system leveraging Blockchain, called Blockchain Based Decentralized Federated Learning (BDFL), and taking advantage of the benefits of this technology, namely transparency, immutability and decentralization. For the time being, few studies have sought to characterize these BDFL systems, although it seems that they can be broken down into a set of layers (blockchain, interconnection of DFL nodes, client selection, data transmission, consensus management) that could have a major impact on the operation of the BDFL as a whole. The aim of this article is therefore to respond to this limitation by highlighting the different layers existing in the architecture of a BDFL system and the solutions proposed in the literature that can be integrated to optimise both the performance and the security of the system. This could ultimately lead to the design of more secure and efficient architectures with greater resilience to attacks and architectural changes.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"705-723"},"PeriodicalIF":0.0,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10738377","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142694673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
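As a rough illustration of the layered decomposition the review describes, the sketch below records one candidate BDFL architecture as a configuration object. The layer names follow the abstract; the example options filled in for each layer are hypothetical, not the paper's taxonomy.

```python
# Illustrative only: a BDFL architecture described layer by layer.
from dataclasses import dataclass

@dataclass
class BDFLArchitecture:
    blockchain: str            # ledger choice (e.g., permissioned vs. public)
    node_interconnection: str  # how DFL nodes are linked (ring, mesh, ...)
    client_selection: str      # policy for picking training participants
    data_transmission: str     # what is exchanged (gradients, model deltas, ...)
    consensus: str             # consensus mechanism securing model updates

# One possible configuration a designer might evaluate with such a framework.
candidate = BDFLArchitecture(
    blockchain="permissioned",
    node_interconnection="ring",
    client_selection="reputation-weighted",
    data_transmission="compressed model deltas",
    consensus="proof-of-authority",
)
print(candidate)
```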
Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
IEEE Open Journal of the Computer Society Pub Date : 2024-10-28 DOI: 10.1109/OJCS.2024.3486904
Syed Aun Muhammad Zaidi;Siddique Latif;Junaid Qadir
{"title":"Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers","authors":"Syed Aun Muhammad Zaidi;Siddique Latif;Junaid Qadir","doi":"10.1109/OJCS.2024.3486904","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3486904","url":null,"abstract":"Despite the recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms including graph attention and co-attention to capture complex dependencies across different modalities and languages to achieve improved cross-language multimodal emotion recognition. In addition, our model also exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target language data. We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"684-693"},"PeriodicalIF":0.0,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10736634","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142663505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
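One common way to realise the co-attention the abstract mentions is to let each modality attend over the other. The sketch below (PyTorch, with illustrative dimensions and modality names) shows such a cross-modal co-attention block; it is not the authors' MDAT code, and the graph-attention branch is omitted.

```python
# Minimal sketch (assumed) of a cross-modal co-attention block.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.audio_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, audio: torch.Tensor, text: torch.Tensor):
        # Audio queries attend over text keys/values, and vice versa.
        a, _ = self.audio_to_text(query=audio, key=text, value=text)
        t, _ = self.text_to_audio(query=text, key=audio, value=audio)
        return a, t

audio = torch.randn(2, 50, 256)   # (batch, audio frames, dim) - illustrative
text = torch.randn(2, 20, 256)    # (batch, tokens, dim) - illustrative
a, t = CoAttention()(audio, text)
print(a.shape, t.shape)           # [2, 50, 256] and [2, 20, 256]
```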
An Auditable, Privacy-Preserving, Transparent Unspent Transaction Output Model for Blockchain-Based Central Bank Digital Currency
IEEE Open Journal of the Computer Society Pub Date : 2024-10-24 DOI: 10.1109/OJCS.2024.3486193
Md. Mainul Islam;Hoh Peter IN
{"title":"An Auditable, Privacy-Preserving, Transparent Unspent Transaction Output Model for Blockchain-Based Central Bank Digital Currency","authors":"Md. Mainul Islam;Hoh Peter IN","doi":"10.1109/OJCS.2024.3486193","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3486193","url":null,"abstract":"Auditability, privacy, transparency, and resiliency are four essential properties of a central bank digital currency (CBDC) system. However, it is difficult to satisfy these properties at once. This issue has become a crucial challenge to ongoing CBDC projects worldwide. In this article, we propose a novel unspent transaction output (UTXO) model, which offers auditable, privacy-preserving, transparent CBDC payments in a consortium blockchain network. The proposed model adopts a high-speed, non-interactive zero-knowledge proof scheme named zero-knowledge Lightweight Transparent ARgument of Knowledge (zk-LTARK) scheme to verify the ownership of UTXOs. The scheme provides low-latency proof generation and verification while maintaining 128-bit security with a smaller proof size. It also provides memory-efficient, privacy-preserving multi-party computation and multi-signature protocols. By using zk-LTARKs, users do not require numerous private–public key pairs to preserve privacy, which reduces risks in key management. Decentralized identifiers are used to authenticate users without interacting with any centralized server and avoid a single point of failure. The model was implemented in a customized consortium blockchain network with the proof-of-authority consensus algorithm.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"671-683"},"PeriodicalIF":0.0,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10734236","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142663504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
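The UTXO bookkeeping that underlies the proposed model can be shown in miniature. The sketch below models only the plain data structure: outputs are created once and consumed exactly once. The zk-LTARK ownership proofs, auditing hooks, and DID authentication of the actual design are not modelled; the simple owner-string check is a placeholder assumption.

```python
# Minimal sketch of plain UTXO bookkeeping (no zero-knowledge proofs).
from dataclasses import dataclass

@dataclass(frozen=True)
class UTXO:
    tx_id: str
    index: int
    amount: int
    owner: str   # placeholder for a zk-verified ownership credential

class Ledger:
    def __init__(self):
        self.utxos: dict[tuple[str, int], UTXO] = {}

    def add(self, utxo: UTXO) -> None:
        self.utxos[(utxo.tx_id, utxo.index)] = utxo

    def spend(self, tx_id: str, index: int, claimed_owner: str) -> UTXO:
        utxo = self.utxos.get((tx_id, index))
        if utxo is None:
            raise ValueError("unknown or already spent output")
        if utxo.owner != claimed_owner:   # stands in for a zero-knowledge proof check
            raise ValueError("ownership proof failed")
        return self.utxos.pop((tx_id, index))   # consumed exactly once

ledger = Ledger()
ledger.add(UTXO("tx0", 0, 100, "alice"))
print(ledger.spend("tx0", 0, "alice").amount)   # 100
```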
Video-Based Deception Detection via Capsule Network With Channel-Wise Attention and Supervised Contrastive Learning
IEEE Open Journal of the Computer Society Pub Date : 2024-10-24 DOI: 10.1109/OJCS.2024.3485688
Shuai Gao;Lin Chen;Yuancheng Fang;Shengbing Xiao;Hui Li;Xuezhi Yang;Rencheng Song
{"title":"Video-Based Deception Detection via Capsule Network With Channel-Wise Attention and Supervised Contrastive Learning","authors":"Shuai Gao;Lin Chen;Yuancheng Fang;Shengbing Xiao;Hui Li;Xuezhi Yang;Rencheng Song","doi":"10.1109/OJCS.2024.3485688","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3485688","url":null,"abstract":"Deception detection is essential for protecting the public interest and maintaining social order. Its application in various fields helps to establish a safer and trustworthy social environment. This study focuses on the problem of deception detection in videos and proposes a visual deception detection method based on a capsule network (DDCapsNet). The DDCapsNet model predicts deception classification using the fusion of facial expression features and video-based heart rate feature via a channel attention mechanism. Supervised contrastive learning is further introduced to enhance the generalization ability of the DDCapsNet. The proposed model is evaluated on a self-collected dataset (physiological-assisted visual deception detection dataset, PV3D) and the public Bag-of-Lies (BOL) dataset, respectively. The results show that DDCapsNet outperforms the unimodal system and other state-of-the-art (SOTA) methods, where the ACC reaches 77.97% and the AUC reaches 78.45% on PV3D, and the ACC reaches 73.19% and the AUC reaches 72.78% on BOL dataset.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"660-670"},"PeriodicalIF":0.0,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10734158","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
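A squeeze-and-excitation style block is one standard realisation of the channel-wise attention the abstract describes for weighting fused facial-expression and heart-rate feature channels. The PyTorch sketch below is an assumed illustration of that mechanism, not the DDCapsNet source, and the capsule layers are omitted.

```python
# Minimal sketch (assumed) of squeeze-and-excitation style channel attention.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                        # excitation: per-channel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # reweight fused feature channels

fused = torch.randn(4, 64, 14, 14)   # fused facial + heart-rate features (illustrative)
print(ChannelAttention(64)(fused).shape)   # torch.Size([4, 64, 14, 14])
```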
An Innovative Dense ResU-Net Architecture With T-Max-Avg Pooling for Advanced Crack Detection in Concrete Structures
IEEE Open Journal of the Computer Society Pub Date : 2024-10-16 DOI: 10.1109/OJCS.2024.3481000
Ali Sarhadi;Mehdi Ravanshadnia;Armin Monirabbasi;Milad Ghanbari
{"title":"An Innovative Dense ResU-Net Architecture With T-Max-Avg Pooling for Advanced Crack Detection in Concrete Structures","authors":"Ali Sarhadi;Mehdi Ravanshadnia;Armin Monirabbasi;Milad Ghanbari","doi":"10.1109/OJCS.2024.3481000","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3481000","url":null,"abstract":"Computer vision which uses Convolutional Neural Network (CNN) models is a robust and accurate tool for precise monitoring and pixel-level detection of potential damage in concrete structures. Using a state-of-the-art Dense ResU-Net model integrated with T-Max-Avg pooling layers, the present study introduces a novel and effective method for crack detection in concrete structures. The major innovation of this research is the introduction of the T-Max-Avg pooling layer within the Dense ResU-Net architecture which synergistically combines the strengths of both max and average pooling to improve feature retention and minimize information loss during crack detection. In addition, the incorporation of Residual and Dense blocks within the U-Net framework significantly enhances feature extraction and network depth, resulting in a more robust anomaly detection. The implementation of extensive data augmentation techniques improves the robustness of the model while the application of spatial dropout and L2 regularization techniques prevents overfitting. The proposed model showed a superior performance, outperforming traditional and state-of-the-art models. It had a Dice Coefficient score of 97.41%, an Intersection-over-Union (IoU) score of 98.63%, and an accuracy of 99.2% using a batch size of 32. These results confirmed the reliability and efficacy of the Dense ResU-Net with T-Max-Avg pooling layer for accurate crack detection, demonstrating its potential for real-world applications in structural health monitoring. By taking advantage of advanced deep learning techniques, the proposed method addressed the limitations of traditional crack detection techniques and offered significant improvements in robustness and accuracy.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"636-647"},"PeriodicalIF":0.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10720206","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142555104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
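The abstract does not give the exact definition of the T-Max-Avg pooling layer, so the sketch below is only one plausible reading: blend max and average pooling with a learnable mixing coefficient T kept in (0, 1). The paper's actual formulation may differ.

```python
# Assumed interpretation of a max/avg pooling blend, not the paper's definition.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TMaxAvgPool2d(nn.Module):
    def __init__(self, kernel_size: int = 2, t: float = 0.7):
        super().__init__()
        self.kernel_size = kernel_size
        # Store the blend weight as a logit so the effective T stays in (0, 1).
        self.t_logit = nn.Parameter(torch.logit(torch.tensor(float(t))))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = torch.sigmoid(self.t_logit)
        mx = F.max_pool2d(x, self.kernel_size)
        av = F.avg_pool2d(x, self.kernel_size)
        return t * mx + (1.0 - t) * av   # T-weighted blend of max and average pooling

x = torch.randn(1, 16, 32, 32)
print(TMaxAvgPool2d()(x).shape)   # torch.Size([1, 16, 16, 16])
```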
MusicTalk: A Microservice Approach for Musical Instrument Recognition
IEEE Open Journal of the Computer Society Pub Date : 2024-10-08 DOI: 10.1109/OJCS.2024.3476416
Yi-Bing Lin;Chang-Chieh Cheng;Shih-Chuan Chiu
{"title":"MusicTalk: A Microservice Approach for Musical Instrument Recognition","authors":"Yi-Bing Lin;Chang-Chieh Cheng;Shih-Chuan Chiu","doi":"10.1109/OJCS.2024.3476416","DOIUrl":"https://doi.org/10.1109/OJCS.2024.3476416","url":null,"abstract":"Musical instrument recognition is the process of using machine learning or audio signal processing to identify and classify different musical instruments from an audio recording. This capability enables more precise analysis of musical pieces, aiding in tasks like transcription, music recommendation, and automated composition. The challenges include (1) recognition models not being accurate enough, (2) the need to retrain the entire model when a new instrument is added, and (3) differences in audio formats that prevent direct usage. To address these challenges, this article introduces MusicTalk, a microservice based musical instrument (MI) detection system, with several key contributions. Firstly, MusicTalk introduces a novel patchout mechanism named Brightness Characteristic Based Patchout for the ViT algorithm, which enhances MI detection accuracy compared to existing solutions. Secondly, MusicTalk integrates individual MI detectors as microservices, facilitating efficient interaction with other microservices. Thirdly, MusicTalk incorporates an audio shaper that unifies diverse music open datasets such as Audioset, Openmic-2018, MedleyDB, URMP, and INSTDB. By employing Grad-CAM analysis on Mel-Spectrograms, we elucidate the characteristics of the MI detection model. This analysis allows us to optimize ensemble combinations of ViT with patchout and CNNs within MusicTalk, resulting in high accuracy rates. For instance, the system achieves precision and recall rates of 96.17% and 95.77% respectively for violin detection, which are the highest among previous approaches. An additional advantage of MusicTalk lies in its microservice-driven visualization capabilities. By integrating MI detectors as microservices, MusicTalk enables seamless visualization of songs using animated avatars. In a case study featuring “Peter and the Wolf,” we demonstrate that improved MI detection accuracy enhances the visual storytelling impact of music. The overall F1-score improvement of MusicTalk over previous approaches for this song is up to 12%.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"612-623"},"PeriodicalIF":0.0,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10709650","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142518112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
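Since the detectors operate on Mel-spectrograms, a short preprocessing sketch may help. The code below (librosa; the file path and parameters are illustrative, not MusicTalk's audio-shaper settings) converts an audio clip into the log-Mel-spectrogram "image" that ViT/CNN instrument detectors typically consume.

```python
# Minimal sketch (assumed preprocessing, not the MusicTalk source).
import librosa
import numpy as np

def log_mel(path: str, sr: int = 22050, n_mels: int = 128) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=512, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)   # (n_mels, frames) in dB

# spec = log_mel("violin_clip.wav")   # hypothetical file
# print(spec.shape)
```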