{"title":"Potato leaf disease classification using fusion of multiple color spaces with weighted majority voting on deep learning architectures","authors":"Samaneh Sarfarazi, Hossein Ghaderi Zefrehi, Önsen Toygar","doi":"10.1007/s11042-024-20173-3","DOIUrl":"https://doi.org/10.1007/s11042-024-20173-3","url":null,"abstract":"<p>Early identification of potato leaf disease is challenging due to variations in crop species, disease symptoms, and environmental conditions. Existing methods for detecting crop species and diseases are limited, as they rely on models trained and evaluated solely on plant leaf images from specific regions. This study proposes a novel approach utilizing a Weighted Majority Voting strategy combined with multiple color space models to diagnose potato leaf diseases. The initial detection stage employs deep learning models such as AlexNet, ResNet50, and MobileNet. Our approach aims to identify Early Blight, Late Blight, and healthy potato leaf images. The proposed detection model is trained and tested on two datasets: the PlantVillage dataset and the PLD dataset. The novel fusion and ensemble method achieves an accuracy of 98.38% on the PlantVillage dataset and 98.27% on the PLD dataset with the MobileNet model. An ensemble of all models and color spaces using Weighted Majority Voting significantly increases classification accuracies to 98.61% on the PlantVillage dataset and 97.78% on the PLD dataset. 
Our contributions include a novel fusion method of color spaces and deep learning models, improving disease detection accuracy beyond the state-of-the-art.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Template-based text field segmentation for ID documents using dynamic squeezeboxes packing","authors":"Michael Zingerenko, Elena Limonova, Vladimir V. Arlazarov","doi":"10.1007/s11042-024-20162-6","DOIUrl":"https://doi.org/10.1007/s11042-024-20162-6","url":null,"abstract":"<p>In this paper, we focus on the problem of text field segmentation in identity documents. These documents, characterized by their fixed layouts, present an opportunity to apply computationally efficient template-based algorithms. We consider the Dynamic Squeezeboxes Packing method and demonstrate its integration into document recognition systems, utilizing a single sample per document type. We benchmark text field segmentation on the MIDV-2019 public dataset using standard intersection-over-union and our custom intersection-over-template metrics, while also measuring processing time. We demonstrate that Dynamic Squeezeboxes Packing maintains competitive quality compared to text in the wild methods (EAST, CRAFT) and named-entity recognition method (LayoutLMv2). A significant advantage of this method is its processing speed, averaging 9 ms per image on the x86_64 platform, which is substantially faster than EAST (980 ms), CRAFT (2030 ms), and LayoutLMv2 (2210 ms). 
The obtained results suggest that the considered method has strong potential as a method in document image analysis, particularly for processing identity documents.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvised method for analysis and synthesis of NUFB for Speech and ECG signal applications","authors":"B. Keerthana, N. Raju","doi":"10.1007/s11042-024-20211-0","DOIUrl":"https://doi.org/10.1007/s11042-024-20211-0","url":null,"abstract":"<p>This article presents a rapidly converging optimization technique using a single parameter for designing non-uniform cosine modulated filter banks (CMFBs). The non-uniform cosine modulated filter banks are derived from closed-form uniform cosine modulated filter banks by merging the relevant bandpass filters based on given decimation factors. In this proposed method, the cut-off frequency of the prototype filter is varied through analytically calculated step size using control parameters so that the filter coefficients at quadrature frequency are approximately equal to 0.707 and the formulated objective function is satisfied with the prescribed tolerance. Simulation results demonstrate that the proposed algorithm achieves superior performance, with amplitude distortion levels significantly outperforming existing methods in the literature, reaching as low as 2.4483 × 10<sup>−4</sup>. For the prototype filter design, a constrained equiripple finite impulse response (FIR) digital filter is employed, with the roll-off factor and error ratio chosen based on a stopband attenuation, a passband attenuation and a filter order. The results highlight the proposed algorithm’s effectiveness for high-quality reconstruction of speech signals, particularly in speech coding and enhancement, as well as ECG signals. 
This makes the method highly versatile and suitable for various practical applications, including sub-band coding of real-time and near real-time signals.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancement of single foggy image using feature based fusion technique","authors":"Pooja Pandey, Rashmi Gupta, Nidhi Goel","doi":"10.1007/s11042-024-20181-3","DOIUrl":"https://doi.org/10.1007/s11042-024-20181-3","url":null,"abstract":"<p>Foggy and hazy weather conditions are very common natural phenomena that reduce the visibility of acquired outdoor pictures. Poor visibility creates numerous problems in tracking, surveillance, and many other fields. In this paper, an efficient feature-based fusion technique is used to enhance a single foggy image at the transmission level. Fusion at this level retains the most significant features of the foggy image, and the defogged output image is computed from this fused single input. The proposed methodology overcomes the shortcomings of the existing Dark Channel Prior and Bright Channel Prior methods. The proposed method shows promising results on datasets that vary in fog density and in size. Its main advantage is that it requires no pre-processing or post-processing and is therefore very simple to implement.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integration of Blockchain and IPFS: healthcare data management & sharing for IoT Environment","authors":"Rajiv Kumar Mishra, Rajesh Kumar Yadav, Prem Nath","doi":"10.1007/s11042-024-20092-3","DOIUrl":"https://doi.org/10.1007/s11042-024-20092-3","url":null,"abstract":"<p>The immense volume of data generated and collected by smart devices has significantly enhanced various aspects of our daily lives. However, safeguarding the sensitive information shared among these devices is crucial. Ensuring the security of the Internet of Things (IoT) ecosystem from unauthorized access is imperative. Blockchain technology emerges as a promising solution to address these security concerns. Nevertheless, the effectiveness of Blockchain in handling the extensive data generated by smart devices is challenged by the rapid pace of IoT data generation and the slower transaction validation speed within Blockchain networks. This research aims to resolve these issues by integrating Blockchain with the Inter-Planetary File System (IPFS), creating a robust framework for secure data recording on a distributed storage network while enabling authorized access to the stored data. The proposed mechanism involves defining and recording access policies and cryptographic hash content on the Blockchain network, while storing the actual IoT-generated data on IPFS to enhance the confidentiality, integrity, and availability (CIA) triad. 
Performance assessments of the proposed scheme demonstrate its security and practicality, validating its potential for real-world application.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving agility in projects using machine learning algorithm","authors":"Janani Varun, R A Karthika","doi":"10.1007/s11042-024-19909-y","DOIUrl":"https://doi.org/10.1007/s11042-024-19909-y","url":null,"abstract":"<p>Every software product needs testing to ensure its quality and accuracy. Testing becomes much easier when testers can optimize the effort spent and predict defects in upcoming modules in the Agile era. This paper discusses defect prediction using the Random Forest algorithm. Predictive analytics draws on information from the past to create forecasts about the outcomes of future events. Product teams often have difficulty delivering the product on schedule. In the Agile era, requirements keep changing and the team is unsure about upcoming releases. Prediction helps the team focus on the complex and error-prone modules in upcoming releases. The predictive analytics model designed here can predict defects with an accuracy of 88% using historical data. With these predictions, testers can focus on the modules for which the model predicts more defects and shift delivery left.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning-driven IoT device for women’s safety: a real-time sexual harassment prevention system","authors":"Md Reazul Islam, Khondokar Oliullah, Mohsin Kabir, Ashifur Rahman, M. F. Mridha, Muhammed Fayyaz Khan, Nilanjan Dey","doi":"10.1007/s11042-024-20228-5","DOIUrl":"https://doi.org/10.1007/s11042-024-20228-5","url":null,"abstract":"<p>Sexual harassment is an all-encompassing problem that affects individuals in diverse environments including educational institutions, workplaces, and public areas. Despite increased awareness and advocacy efforts, many women continue to face harassment daily, especially on the Indian sub-continent, with underreporting and impunity exacerbating the problem. As technology advances, there is a growing opportunity to use innovative solutions to address this problem. In recent years, the Internet of Things (IoT) and machine learning have emerged as promising technologies for developing systems that can detect and prevent sexual harassment in real time. This study presents a novel approach for real-time sexual harassment monitoring using a machine learning-based IoT system. The system incorporates nine force-sensitive resistors strategically embedded in women’s dresses to capture relevant data. It is portable and can be affixed to any type of dress. If the user wishes to change their attire, the system can be easily removed from the current dress and attached to another dress of choice. This flexibility allows users to adapt the system to suit various clothing preferences and styles. The sensor data are transmitted to the cloud via the NodeMCU, enabling continuous monitoring. In the cloud, a pre-trained machine learning model, specifically the AdaBoost classifier, is employed to classify incoming data in real time. We applied four ML methods: RF with GridSearchCV, Bagging Classifier, XGBoost, and AdaBoost Classifier. 
The AdaBoost classifier performed best with an accuracy of 99.3% using a dataset prepared by our lab, which consists of 1048 instances and was collected from 50 students. If a sexual harassment event is detected, an alert is generated through a mobile application and promptly sent to appropriate authorities for immediate action to save the victim. By integrating wearable sensors, IoT technology, and machine learning, this system offers a proactive and efficient approach, especially in uncertain situations, to detect and address sexual harassment incidents and enhance safety and security in various settings.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing multi-target tracking stability using knowledge graph integration within the Gaussian Mixture Probability Hypothesis Density Filter","authors":"Ali Mehrizi, Hadi Sadoghi Yazdi","doi":"10.1007/s11042-024-20180-4","DOIUrl":"https://doi.org/10.1007/s11042-024-20180-4","url":null,"abstract":"<p> This paper proposes a novel approach to enhancing multi-target tracking of vehicles in videos with frequent camera occlusions. Our method integrates prior knowledge about vehicle behavior into a Gaussian Mixture Probability Hypothesis Density (GMPHD) filter framework. This knowledge, extracted as a knowledge graph from historical vehicle trajectories, allows the tracker to maintain persistence even during significant interruptions. The knowledge graph models expected movement patterns and generates pseudo-observations during occlusions, similar to how time series analysis leverages historical data for forecasting. We evaluate the proposed method on both simulated and real-world video datasets using the Optimal Sub Pattern Assignment (OSPA) metric, which assesses tracking accuracy. The results show a 19.5% improvement for simulated data and a 16.5% improvement for real-world video data under fully occluded conditions, demonstrating a significant enhancement in performance.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective video deblurring based on feature-enhanced deep learning network for daytime and nighttime images","authors":"Deng-Yuan Huang, Chao-Ho Chen, Tsong-Yi Chen, Jia-En Li, Hsueh-Liang Hsiao, Da-Jinn Wang, Cheng-Kang Wen","doi":"10.1007/s11042-024-20222-x","DOIUrl":"https://doi.org/10.1007/s11042-024-20222-x","url":null,"abstract":"<p>Motion-blurred images are usually generated when captured with a handheld or wearable video camera, owing to rapid movement of the camera or foreground (i.e., moving object captured). Most traditional algorithm-based approaches cannot effectively restore the nonlinear motion-blurred images. Deep learning network-based approaches with intensive computations have recently been developed for deblurring blind motion-blurred images. However, they still achieve limited effect in restoring the details of the images, especially for blurred nighttime images. To effectively deblur the blurred daytime and nighttime images, the proposed video deblurring method consists of three major parts: an image storage module (storing the previous deblurred frame), adjacent frames alignment module (performing optimal feature point selection and perspective transformation matrix), and video-deblurring neural network module (containing two sub-networks of single image deblurring and adjacent frames fusion deblurring). The proposed approach’s main strategy is to design a blurred attention block to extract more effective features (especially for nighttime images) to restore the edges or details of objects. Additionally, the skip connection is introduced into such two sub-networks to improve the model’s ability to fuse contextual features across different layers to enhance the deblurring effect further. Quantitative evaluations demonstrate that our method achieves an average PSNR of 32.401 dB and SSIM of 0.9107, surpassing the next-best method by 1.635 dB in PSNR and 0.0381 in SSIM. 
Such improvements reveal the effectiveness of the proposed approach in addressing deblurring challenges across both daytime and nighttime scenarios, especially for making the alphanumeric characters in the really blurred nighttime images legible.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DMR²G: diffusion model for radiology report generation","authors":"Huan Ouyang, Zheng Chang, Binghao Tang, Si Li","doi":"10.1007/s11042-024-20206-x","DOIUrl":"https://doi.org/10.1007/s11042-024-20206-x","url":null,"abstract":"<p>Radiology report generation aims to accurately generate pathological assessments from given radiographic images. Prior methods largely rely on autoregressive models, where the sequential token-by-token generation process results in longer inference time and suffers from sequential error accumulation. In order to enhance the efficiency of report generation without compromising diagnostic accuracy, we present a novel radiology report generation approach based on diffusion models. By integrating a graph-guided image feature extractor informed by a radiology knowledge graph, our model adeptly identifies critical abnormalities within images. We also introduce an auxiliary lesion classification loss mechanism using pseudo labels as supervision to align image features and textual disease keyword representations accurately. By adopting the accelerated sampling strategy inherent to diffusion models, our approach significantly reduces the inference time. 
Through comprehensive evaluation on the IU-Xray and MIMIC-CXR benchmarks, our approach outperforms autoregressive models in inference speed while maintaining high quality, offering a significant advancement in automating radiology report generation task.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}