Multimedia Tools and Applications: latest articles, page 2

Potato leaf disease classification using fusion of multiple color spaces with weighted majority voting on deep learning architectures

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20173-3

Samaneh Sarfarazi, Hossein Ghaderi Zefrehi, Önsen Toygar

Abstract: Early identification of potato leaf disease is challenging due to variations in crop species, disease symptoms, and environmental conditions. Existing methods for detecting crop species and diseases are limited, as they rely on models trained and evaluated solely on plant leaf images from specific regions. This study proposes a novel approach utilizing a Weighted Majority Voting strategy combined with multiple color space models to diagnose potato leaf diseases. The initial detection stage employs deep learning models such as AlexNet, ResNet50, and MobileNet. Our approach aims to identify Early Blight, Late Blight, and healthy potato leaf images. The proposed detection model is trained and tested on two datasets: the PlantVillage dataset and the PLD dataset. The novel fusion and ensemble method achieves an accuracy of 98.38% on the PlantVillage dataset and 98.27% on the PLD dataset with the MobileNet model. An ensemble of all models and color spaces using Weighted Majority Voting significantly increases classification accuracies to 98.61% on the PlantVillage dataset and 97.78% on the PLD dataset. Our contributions include a novel fusion method of color spaces and deep learning models, improving disease detection accuracy beyond the state-of-the-art.
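As a rough illustration of the voting step named in the abstract above, the following sketch combines per-model class predictions by summing a weight per label. It is a generic weighted majority vote, not the authors' implementation: the function name, the example weights, and the label strings are assumptions made for illustration, and the abstract does not say how the weights are derived.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most weight.

    predictions: one predicted label per classifier (hypothetical inputs).
    weights: one non-negative weight per classifier, e.g. a validation
             accuracy (an assumption; the source does not specify this).
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need equally sized, non-empty predictions and weights")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On a tie, max() keeps the label that was seen first.
    return max(totals, key=totals.get)


# One strong model can outvote two weaker ones that agree with each other.
votes = ["early_blight", "late_blight", "late_blight"]
print(weighted_majority_vote(votes, [0.9, 0.3, 0.4]))
```

With equal weights the function reduces to a plain majority vote.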

Citations: 0

Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20227-6

José Salas-Cáceres, Javier Lorenzo-Navarro, David Freire-Obregón, Modesto Castrillón-Santana

Abstract: In the Human-Machine Interactions (HMI) landscape, understanding user emotions is pivotal for elevating user experiences. This paper explores Facial Expression Recognition (FER) within HMI, employing a distinctive multimodal approach that integrates visual and auditory information. Recognizing the dynamic nature of HMI, where situations evolve, this study emphasizes continuous emotion analysis. This work assesses various fusion strategies that involve the addition to the main network of different architectures, such as autoencoders (AE) or an Embracement module, to combine the information of multiple biometric cues. In addition to the multimodal approach, this paper introduces a new architecture that prioritizes temporal dynamics by incorporating Long Short-Term Memory (LSTM) networks. The final proposal, which integrates different multimodal approaches with the temporal focus capabilities of the LSTM architecture, was tested across three public datasets: RAVDESS, SAVEE, and CREMA-D. It showcased state-of-the-art accuracy of 88.11%, 86.75%, and 80.27%, respectively, and outperformed other existing approaches.

Citations: 0

Improvised method for analysis and synthesis of NUFB for Speech and ECG signal applications

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20211-0

B. Keerthana, N. Raju

Abstract: This article presents a rapidly converging optimization technique using a single parameter for designing non-uniform cosine modulated filter banks (CMFBs). The non-uniform cosine modulated filter banks are derived from closed-form uniform cosine modulated filter banks by merging the relevant bandpass filters based on given decimation factors. In this proposed method, the cut-off frequency of the prototype filter is varied through analytically calculated step size using control parameters so that the filter coefficients at quadrature frequency are approximately equal to 0.707 and the formulated objective function is satisfied with the prescribed tolerance. Simulation results demonstrate that the proposed algorithm achieves superior performance, with amplitude distortion levels significantly outperforming existing methods in the literature, reaching as low as 2.4483 × 10⁻⁴. For the prototype filter design, a constrained equiripple finite impulse response (FIR) digital filter is employed, with the roll-off factor and error ratio chosen based on a stopband attenuation, a passband attenuation and a filter order. The results highlight the proposed algorithm’s effectiveness for high-quality reconstruction of speech signals, particularly in speech coding and enhancement, as well as ECG signals. This makes the method highly versatile and suitable for various practical applications, including sub-band coding of real-time and near real-time signals.

Citations: 0

Template-based text field segmentation for ID documents using dynamic squeezeboxes packing

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20162-6

Michael Zingerenko, Elena Limonova, Vladimir V. Arlazarov

Abstract: In this paper, we focus on the problem of text field segmentation in identity documents. These documents, characterized by their fixed layouts, present an opportunity to apply computationally efficient template-based algorithms. We consider the Dynamic Squeezeboxes Packing method and demonstrate its integration into document recognition systems, utilizing a single sample per document type. We benchmark text field segmentation on the MIDV-2019 public dataset using standard intersection-over-union and our custom intersection-over-template metrics, while also measuring processing time. We demonstrate that Dynamic Squeezeboxes Packing maintains competitive quality compared to text in the wild methods (EAST, CRAFT) and named-entity recognition method (LayoutLMv2). A significant advantage of this method is its processing speed, averaging 9 ms per image on the x86_64 platform, which is substantially faster than EAST (980 ms), CRAFT (2030 ms), and LayoutLMv2 (2210 ms). The obtained results suggest that the considered method has strong potential as a method in document image analysis, particularly for processing identity documents.
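The standard intersection-over-union metric mentioned in the abstract above can be sketched for the simplest case of axis-aligned rectangles. This is only an illustration under stated assumptions: the `(x1, y1, x2, y2)` box layout and the function name are hypothetical, the paper's text fields may well be general quadrilaterals, and the custom intersection-over-template metric is not defined in the abstract, so it is not attempted here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2
    (an assumed layout, not taken from the paper).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A predicted field is typically counted as correct when its IoU with the ground-truth field exceeds a chosen threshold; the threshold used in the paper is not given in the abstract.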

Citations: 0

Enhancement of single foggy image using feature based fusion technique

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20181-3

Pooja Pandey, Rashmi Gupta, Nidhi Goel

Citations: 0

Integration of Blockchain and IPFS: healthcare data management & sharing for IoT Environment

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-20092-3

Rajiv Kumar Mishra, Rajesh Kumar Yadav, Prem Nath

Abstract: The immense volume of data generated and collected by smart devices has significantly enhanced various aspects of our daily lives. However, safeguarding the sensitive information shared among these devices is crucial. Ensuring the security of the Internet of Things (IoT) ecosystem from unauthorized access is imperative. Blockchain technology emerges as a promising solution to address these security concerns. Nevertheless, the effectiveness of Blockchain in handling the extensive data generated by smart devices is challenged by the rapid pace of IoT data generation and the slower transaction validation speed within Blockchain networks. This research aims to resolve these issues by integrating Blockchain with the Inter-Planetary File System (IPFS), creating a robust framework for secure data recording on a distributed storage network while enabling authorized access to the stored data. The proposed mechanism involves defining and recording access policies and cryptographic hash content on the Blockchain network, while storing the actual IoT-generated data on IPFS to enhance the confidentiality, integrity, and availability (CIA) triad. Performance assessments of the proposed scheme demonstrate its security and practicality, validating its potential for real-world application.
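The split described in the abstract above (hash and access policy on the ledger, bulk data in distributed storage) can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `off_chain_store` is a plain dict standing in for IPFS, `ledger` is a list standing in for the blockchain, the function names are hypothetical, and a bare SHA-256 hex digest replaces the content identifiers that real IPFS uses. It is not the authors' scheme.

```python
import hashlib

off_chain_store = {}  # stand-in for IPFS: content hash -> raw bytes
ledger = []           # stand-in for the blockchain: append-only records


def store_reading(data: bytes, allowed_users: set) -> str:
    """Keep the data off-chain; record only its hash and access policy."""
    digest = hashlib.sha256(data).hexdigest()
    off_chain_store[digest] = data
    ledger.append({"hash": digest, "allowed": frozenset(allowed_users)})
    return digest


def fetch_reading(digest: str, user: str) -> bytes:
    """Return the data only for authorized users, after an integrity check."""
    record = next((r for r in ledger if r["hash"] == digest), None)
    if record is None:
        raise KeyError("no ledger record for this hash")
    if user not in record["allowed"]:
        raise PermissionError("user is not in the recorded access policy")
    data = off_chain_store[digest]
    # Re-hash the stored bytes to detect tampering in the off-chain store.
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("stored data does not match the recorded hash")
    return data


handle = store_reading(b"heart_rate=72", {"dr_a"})
print(fetch_reading(handle, "dr_a"))
```

Because the ledger holds only a fixed-size digest and a policy, its record size does not grow with the volume of sensor data, which is the motivation the abstract gives for pairing the two systems.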

Citations: 0

Improving agility in projects using machine learning algorithm

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-19909-y

Janani Varun, R A Karthika

Citations: 0

Machine learning-driven IoT device for women’s safety: a real-time sexual harassment prevention system

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-20228-5

Md Reazul Islam, Khondokar Oliullah, Mohsin Kabir, Ashifur Rahman, M. F. Mridha, Muhammed Fayyaz Khan, Nilanjan Dey

Abstract: Sexual harassment is an all-encompassing problem that affects individuals in diverse environments including educational institutions, workplaces, and public areas. Despite increased awareness and advocacy efforts, many women continue to face harassment daily, especially on the Indian sub-continent, with underreporting and impunity exacerbating the problem. As technology advances, there is a growing opportunity to use innovative solutions to address this problem. In recent years, the Internet of Things (IoT) and machine learning have emerged as promising technologies for developing systems that can detect and prevent sexual harassment in real-time. This study presents a novel approach for real-time sexual harassment monitoring using a machine learning-based IoT system. The system incorporates nine force-sensitive resistors strategically embedded in women’s dresses to capture relevant data. It is portable and can be affixed to any type of dressing. If the user wishes to change their attire, the system can be easily removed from the current dress and attached to another dress of choice. This flexibility allows users to adapt the system to suit various clothing preferences and styles. The sensor data are transmitted to the cloud via the NodeMCU, enabling continuous monitoring. In the cloud, a pre-trained machine learning model, specifically the AdaBoost classifier, was employed to classify incoming data in real time. We applied four ML methods: RF with GridSearchCV, Bagging Classifier, XGBoost, and AdaBoost Classifier. The AdaBoost classifier performed best with an accuracy of 99.3% using a dataset prepared by our lab, which consists of 1048 instances and was collected from 50 students. If a sexual harassment event is detected, an alert is generated through a mobile application and promptly sent to appropriate authorities for immediate action to save the victim. By integrating wearable sensors, IoT technology, and machine learning, this system offers a proactive and efficient approach, especially in uncertain situations, to detect and address sexual harassment incidents and enhance safety and security in various settings.

Citations: 0

Enhancing multi-target tracking stability using knowledge graph integration within the Gaussian Mixture Probability Hypothesis Density Filter

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-20180-4

Ali Mehrizi, Hadi Sadoghi Yazdi

Citations: 0

Effective video deblurring based on feature-enhanced deep learning network for daytime and nighttime images

IF 3.6, Zone 4 (Computer Science)

Multimedia Tools and Applications Pub Date : 2024-09-16 DOI: 10.1007/s11042-024-20222-x

Deng-Yuan Huang, Chao-Ho Chen, Tsong-Yi Chen, Jia-En Li, Hsueh-Liang Hsiao, Da-Jinn Wang, Cheng-Kang Wen

Abstract: Motion-blurred images are usually generated when captured with a handheld or wearable video camera, owing to rapid movement of the camera or foreground (i.e., moving object captured). Most traditional algorithm-based approaches cannot effectively restore the nonlinear motion-blurred images. Deep learning network-based approaches with intensive computations have recently been developed for deblurring blind motion-blurred images. However, they still achieve limited effect in restoring the details of the images, especially for blurred nighttime images. To effectively deblur the blurred daytime and nighttime images, the proposed video deblurring method consists of three major parts: an image storage module (storing the previous deblurred frame), adjacent frames alignment module (performing optimal feature point selection and perspective transformation matrix), and video-deblurring neural network module (containing two sub-networks of single image deblurring and adjacent frames fusion deblurring). The proposed approach’s main strategy is to design a blurred attention block to extract more effective features (especially for nighttime images) to restore the edges or details of objects. Additionally, the skip connection is introduced into such two sub-networks to improve the model’s ability to fuse contextual features across different layers to enhance the deblurring effect further. Quantitative evaluations demonstrate that our method achieves an average PSNR of 32.401 dB and SSIM of 0.9107, surpassing the next-best method by 1.635 dB in PSNR and 0.0381 in SSIM. Such improvements reveal the effectiveness of the proposed approach in addressing deblurring challenges across both daytime and nighttime scenarios, especially for making the alphanumeric characters in the really blurred nighttime images legible.
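For readers unfamiliar with the PSNR figures quoted in the abstract above, the metric is 10·log10(MAX² / MSE), where MSE is the mean squared error between a reference image and a restored image. The sketch below computes it over flat lists of pixel values. The function name and the 8-bit peak value of 255 are assumptions for illustration, and the abstract does not state how the authors averaged PSNR across frames.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel intensities; max_value is the
    largest possible intensity (255 assumes 8-bit images).
    """
    if not reference or len(reference) != len(restored):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 10 grey levels gives an MSE of 100.
print(round(psnr([100, 100], [110, 90]), 2))
```

Higher values mean the restored frame is closer to the reference. PSNR is logarithmic, so a gain of a given number of dB corresponds to a fixed ratio of mean squared error rather than a fixed absolute reduction.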

Citations: 0