HSALC: hard sample aware label correction for medical image classification
Authors: Yangtao Wang, Yicheng Ye, Yanzhao Xie, Maobin Tang, Lisheng Fan
Multimedia Tools and Applications, published 2024-09-02. DOI: 10.1007/s11042-024-20114-0

Abstract: Automatic medical image classification has long been a research hotspot, but existing methods suffer from the label noise problem: they either discard samples with noisy labels or produce wrong label corrections, seriously limiting classification performance. To address these problems, this paper proposes a hard sample aware label correction (HSALC) method for medical image classification. HSALC consists of a sample division module, a clean·hard·noisy (CHN) detection module, and a label noise correction module. First, the sample division module applies a criterion based on training difficulty and training losses to divide all samples into three preliminary subsets: clean samples, hard samples, and noisy samples. Second, the CHN detection module adds noise to the clean samples and repeatedly applies the division criterion to all data, yielding highly reliable clean, hard, and noisy subsets. Finally, to make full use of every available sample, the label noise correction module trains a correction model that purifies and corrects the wrong labels of noisy samples as far as possible, producing a highly purified dataset. Extensive experiments on five image datasets (three medical and two natural) demonstrate that HSALC greatly improves classification performance on noisily labeled datasets, especially at high noise ratios. The source code is publicly available at https://github.com/YYC117/HSALC.
Effective and efficient automatic detection, prediction and prescription of potential disease in berry family
Authors: Roopa R. Kulkarni, Abhishek D. Sharma, Bhuvan K. Koundinya, Chokkanahalli Anirudh, Yashas N
Multimedia Tools and Applications, published 2024-09-02. DOI: 10.1007/s11042-024-19896-0

Abstract: The grape cultivation industry in India faces significant challenges from fungal pests and diseases, leading to substantial economic losses. Detecting leaf diseases at an early stage is crucial to keep infections from spreading, minimize crop damage, and apply timely, precise treatments, thereby maintaining the productivity and quality of grape cultivation. Integrated technology, such as smart robots and computer-vision-enabled systems, can detect and predict diseases efficiently, reducing human labor and the use of harmful pesticides while optimizing production. On a real-time dataset, the CNN achieved an accuracy of 98%, making it a highly effective method for image training and classification; VGG16 and an improved VGG16 reached 95% and 96%, respectively, while MobileNet and an improved MobileNet reached 86% and 97%. Using convolutional neural networks for grape leaf detection enables precise, automated differentiation between healthy and diseased leaves based on their visual features. The method not only supports early disease detection but also computes the total leaf area affected by disease, offering a promising way to enhance productivity in grape cultivation.
{"title":"Parkinsonian gait modelling from an anomaly deep representation","authors":"Edgar Rangel, Fabio Martínez","doi":"10.1007/s11042-024-19961-8","DOIUrl":"https://doi.org/10.1007/s11042-024-19961-8","url":null,"abstract":"<p>Parkinson’s Disease (PD) is associated with gait movement disorders, such as bradykinesia, stiffness, tremors and postural instability. Hence, a kinematic gait analysis for PD characterization is key to support diagnosis and to carry out an effective treatment planning. Nowadays, automatic classification and characterization strategies are based on deep learning representations, following supervised rules, and assuming large and stratified data. Nonetheless, such requirements are far from real clinical scenarios. Additionally, supervised rules may introduce bias into architectures from expert’s annotations. This work introduces a self-supervised generative representation to learn gait-motion-related patterns, under the pretext task of video reconstruction. Following an anomaly detection framework, the proposed architecture can avoid inter-class variance, learning hidden and complex kinematics locomotion relationships. In this study, the proposed model was trained and validated with an owner dataset (14 Parkinson and 23 control). Also, an external public dataset (16 Parkinson, 30 control, and 50 Knee-arthritis) was used only for testing, measuring the generalization capability of the method. During training, the method learns from control subjects, while Parkinson subjects are detected as anomaly samples. From owner dataset, the proposed approach achieves a ROC-AUC of 95% in classification task. Regarding the external dataset, the architecture evidence generalization capabilities, achieving a 75% of ROC-AUC (shapeness and homoscedasticity of 66.7%), without any additional training. The proposed model has remarkable performance in detecting gait parkinsonian patterns, recorded in markerless videos, even competitive results with classes non-observed during training.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigation of attention mechanism for speech command recognition
Authors: Jie Xie, Mingying Zhu, Kai Hu, Jinglan Zhang, Ya Guo
Multimedia Tools and Applications, published 2024-09-02. DOI: 10.1007/s11042-024-20129-7

Abstract: As an application area of speech command recognition, the smart home gives people a convenient way to communicate with digital devices. Deep learning has proven effective for speech command recognition, yet few studies have examined attention mechanisms in depth as a way to improve it. This study investigates deep learning architectures for improved speaker-independent speech command recognition. We first compare the log-Mel spectrogram and the log-Gammatone spectrogram using VGG-style and VGG-skip-style networks. The best-performing model is then extended with different attention mechanisms, including channel-time, channel-frequency, and channel-time-frequency attention. Finally, a dual CNN with cross-attention is used for speech command classification. Experiments use a self-collected dataset of 40 participants and 12 classes, recorded entirely in Mandarin Chinese on a variety of smartphone devices in diverse settings. Results indicate that the log-Gammatone spectrogram with a VGG-skip network and cross-attention achieves the best performance, with accuracy, precision, recall, and F1-score of 94.59%, 95.84%, 94.64%, and 94.57%, respectively.
{"title":"Deep learning-based algorithm for automated detection of glaucoma on eye fundus images","authors":"Hervé Tampa, Martial Mekongo, Alain Tiedeu","doi":"10.1007/s11042-024-19989-w","DOIUrl":"https://doi.org/10.1007/s11042-024-19989-w","url":null,"abstract":"<p>Projections predict that about one hundred and twelve million people will be affected by glaucoma by 2040. It can be ranked as a serious public health problem, being a significant cause of blindness. However, if detected early, total blindness can be delayed. A computerized analysis of images of the eye fundus can be a tool for early diagnosis of glaucoma. In this paper, we have developed a deep-learning-based algorithm for the automated detection of this condition using images from Origa-light and Origa databases. A total of 1300 images were used in the study. The algorithm consists of two steps, namely processing and classification. The images were processed respectively by blue component extraction, conversion into greyscale images, ellipse fitting, median filtering, sobel filter application and finally binarizing by a simple global thresholding method. The classification was carried out using a modified VGGNet19 (Visual Geometric Group Net 19) powered by transfer learning. The algorithm was tested on 260 images. A sensitivity of 100%, a specificity of 97.69%, an accuracy of 98.84%, an F1 score of 98.85%, and finally an area under the ROC-curve (AUC) of 0.989 were obtained. These values are encouraging and better than those yielded by many state-of-the-art methods.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"12 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Histopathology image analysis for gastric cancer detection: a hybrid deep learning and catboost approach","authors":"Danial Khayatian, Alireza Maleki, Hamid Nasiri, Morteza Dorrigiv","doi":"10.1007/s11042-024-19816-2","DOIUrl":"https://doi.org/10.1007/s11042-024-19816-2","url":null,"abstract":"<p>Since gastric cancer is growing fast, accurate and prompt diagnosis is essential, utilizing computer-aided diagnosis (CAD) systems is an efficient way to achieve this goal. Using methods related to computer vision enables more accurate predictions and faster diagnosis, leading to timely treatment. CAD systems can categorize photos effectively using deep learning techniques based on image analysis and classification. Accurate and timely classification of histopathology images is critical for enabling immediate treatment strategies, but remains challenging. We propose a hybrid deep learning and gradient-boosting approach that achieves high accuracy in classifying gastric histopathology images. This approach examines two classifiers for six networks known as pre-trained models to extract features. Extracted features will be fed to the classifiers separately. The inputs are gastric histopathological images. The GasHisSDB dataset provides these inputs containing histopathology gastric images in three 80px, 120px, and 160px cropping sizes. According to these achievements and experiments, we proposed the final method, which combines the EfficientNetV2B0 model to extract features from the images and then classify them using the CatBoost classifier. The results based on the accuracy score are 89.7%, 93.1%, and 93.9% in 80px, 120px, and 160px cropping sizes, respectively. Additional metrics including precision, recall, and F1-scores were above 0.9, demonstrating strong performance across various evaluation criteria. In another way, to approve and see the model efficiency, the GradCAM algorithm was implemented. Visualization via Grad-CAM illustrated discriminative regions identified by the model, confirming focused learning on histologically relevant features. The consistent accuracy and reliable detections across diverse evaluation metrics substantiate the robustness of the proposed deep learning and gradient-boosting approach for gastric cancer screening from histopathology images. For this purpose, two types of outputs (The heat map and the GradCAM output) are provided. Additionally, t-SNE visualization showed a clear clustering of normal and abnormal cases after EfficientNetV2B0 feature extraction. The cross-validation and visualizations provide further evidence of generalizability and focused learning of meaningful pathology features for gastric cancer screening from histopathology images.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of electroencephalograms before or after applying transcutaneous electrical nerve stimulation therapy using fractional empirical mode decomposition","authors":"Jiaqi Liu, Bingo Wing-Kuen Ling, Zhaoheng Zhou, Weirong Wu, Ruilin Li, Qing Liu","doi":"10.1007/s11042-024-19992-1","DOIUrl":"https://doi.org/10.1007/s11042-024-19992-1","url":null,"abstract":"<p>It is worth noting that applying the transcutaneous electrical nerve stimulation (TENS) therapy at the superficial nerve locations can modulate the brain activities. This paper aims to further investigate whether applying the TENS therapy at the superficial nerve locations can improve the attention of the subjects or not when the subjects are playing the mathematical game or reading a technical paper. First, the electroencephalograms (EEGs) are acquired before and after the TENS therapy is applied at the superficial nerve locations. Then, both the EEGs acquired before and after applying the TENS therapy are mixed together. Next, the preprocessing is applied to these acquired EEGs. Second, the fractional empirical mode decomposition (FEMD) is employed for extracting the features. Subsequently, the genetic algorithm (GA) is employed for performing the feature selection to obtain the optimal features. Finally, the support vector machine (SVM) and the random forest (RF) are used to classify whether the EEGs are acquired before or after the TENS therapy is applied. Since the higher classification accuracy refers to the larger difference of the EEGs acquired before and after the TENS therapy is applied, the classification accuracy reflects the effectiveness of applying the TENS therapy for improving the attention of the subjects. It is found that the percentages of the classification accuracies based on the EEGs acquired via the one channel device during playing the online mathematical game via the SVM and the RF by our proposed method are between 78.90% and 98.31% as well as between 78.44% and 100%, respectively. The percentages of the classification accuracies based on the EEGs acquired via the eight channel device during playing the online mathematical game via the SVM and the RF by our proposed method are between 80.84% and 93.63% as well as between 86.83% and 99.09%, respectively. the percentages of the classification accuracies based on the EEGs acquired via the one channel device during reading a technical paper via the SVM and the RF by our proposed method are between 77.67% and 83.67% as well as between 79.61% and 84.69%, respectively. the percentages of the classification accuracies based on the EEGs acquired via the sixteen channel device during reading a technical paper via the SVM and the RF by our proposed method are between 82.30% and 90.02% as well as between 91.72% and 95.91%, respectively. 
As our proposed method yields a higher classification accuracy than the states of the arts methods, this demonstrates the potential of using our proposed method as a tool for the medical officers to perform the precise clinical diagnoses and make the therapeutic decisions based on TENS.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"40 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
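Only the final classification stage lends itself to a generic sketch; the FEMD feature extraction and GA-based selection are specific to the paper and are assumed to have been done elsewhere. The scikit-learn snippet below compares an SVM and a random forest on an already-prepared feature matrix using cross-validation; the kernel, tree count, and fold count are illustrative choices.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def compare_classifiers(features, labels, folds=5):
    """features: (n_samples, n_features) matrix of selected FEMD features (assumed);
    labels: 0 = before TENS therapy, 1 = after TENS therapy."""
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    return {
        "svm": cross_val_score(svm, features, labels, cv=folds).mean(),
        "rf": cross_val_score(rf, features, labels, cv=folds).mean(),
    }
```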
{"title":"A secure and privacy-preserving technique based on coupled chaotic system and plaintext encryption for multimodal medical images","authors":"Hongwei Xie, Yuzhou Zhang, Jing Bian, Hao Zhang","doi":"10.1007/s11042-024-19956-5","DOIUrl":"https://doi.org/10.1007/s11042-024-19956-5","url":null,"abstract":"<p>In medical diagnosis, colored and gray medical images contain different pathological features, and the fusion of the two images can help doctors make a more intuitive diagnosis. Fusion medical images contain a large amount of private information, and ensuring their security during transmission is critical. This paper proposes a multi-modal medical image security protection scheme based on coupled chaotic mapping. Firstly, a sequentially coupled chaotic map is proposed using Logistic mapping and Cubic mapping as seed chaotic maps, and its chaotic performance is verified by Lyapunov index analysis, phase diagram attractor distribution analysis, and NIST randomness test. Secondly, combining the process of image encryption with the process of image fusion, a plaintext-associated multimodal medical image hierarchical encryption algorithm is proposed. Finally, a blind watermarking algorithm based on forward Meyer wavelet transform and singular value decomposition is proposed to embed the EMR report into the encrypted channel to realize the mutual authentication of the EMR report and medical image. The experimental results show that compared with the related algorithms, the proposed algorithm has better encryption authentication performance, histogram, and scatter plot are nearly uniform distribution, and the NPCR and UACI of plaintext sensitivity and key sensitivity are close to 99.6094% and 33.4635%, respectively, and has strong robustness to noise attacks and clipping attacks.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"85 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mobile application and system architecture for online speech training in Portuguese: design, development, and evaluation of SofiaFala
Authors: Alessandra Alaniz Macedo, Vinícius de S. Gonçalves, Patrícia P. Mandrá, Vivian Motti, Renato F. Bulcão-Neto, Kamila Rios da Hora Rodrigues
Multimedia Tools and Applications, published 2024-08-07. DOI: 10.1007/s11042-024-19980-5

Abstract: Online Speech Therapy (OST) systems are assistive tools that provide online support for training, planning, and executing specific speech sounds. Most traditional OST systems are mobile applications that mainly support English and Spanish. This paper describes the development of SofiaFala, a freely available mobile assistive application for speech training in Portuguese. The app builds on computational modules that support speech therapy for people with neurodevelopmental disorders, including Down syndrome. Development was iterative and actively involved target users: speech-language pathologists as well as parents, caregivers, and children with speech disorders, mostly with Down syndrome, contributed to the app. The paper describes the design of SofiaFala and its functionalities, discusses usage workload, and reports findings from an experimental study. In addition to analyzing related work, we explain how we (i) elicited SofiaFala features; (ii) developed the app's software architecture to connect the activities of speech-language pathologists and patients (including therapy session planning and home training exercises); (iii) evaluated SofiaFala through a field study with target users; and (iv) addressed key challenges during implementation. SofiaFala provides integrated features intended to maximize communication effectiveness, enhance users' language skills, and ultimately improve the quality of life of people with speech impairments.
Real-time masked face recognition and authentication with convolutional neural networks on the web application
Authors: Sansiri Tarnpradab, Pavat Poonpinij, Nattawut Na Lumpoon, Naruemon Wattanapongsakorn
Multimedia Tools and Applications, published 2024-08-07. DOI: 10.1007/s11042-024-19953-8

Abstract: The COVID-19 outbreak highlighted the importance of wearing a face mask to prevent virus transmission. During the peak of the pandemic everyone was required to wear one both indoors and outdoors, and even now masks remain necessary in some situations and areas. A face mask, however, is a major barrier wherever full-face authentication is required: most facial recognition systems cannot recognize masked faces accurately, resulting in incorrect predictions. To address this challenge, this study proposes a web-based application that accomplishes three main tasks: (1) recognizing in real time whether an individual entering a location is wearing a face mask; (2) correctly identifying an individual for biometric authentication even when facial features are obscured by masks of varying types, shapes, and colors; and (3) letting users easily update the recognition model with the most recent user list through a user-friendly real-time web interface. Detection and recognition are performed with convolutional neural networks; we experimented with VGG16, VGGFace, and InceptionResNetV2, using either masked-face images only or both full-face and masked-face images together. Models are evaluated by accuracy, recall, precision, F1-score, and training time, and the results surpass those of related work: the best model reaches an accuracy of 93.3%, a recall of 93.8%, and approximately 93-94% precision and F1-score when recognizing 50 individuals.