Multimedia Tools and Applications: Latest Articles

Negotiation strategies in ubiquitous human-computer interaction: a novel storyboards scale & field study
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-19 DOI: 10.1007/s11042-024-20240-9
Sofia Yfantidou, Georgia Yfantidou, Panagiota Balaska, Athena Vakali
Abstract: In today's connected society, self-tracking technologies (STTs), such as wearables and mobile fitness apps, empower humans to improve their health and well-being through ubiquitous physical activity monitoring, with several personal and societal benefits. Despite advances in such technologies' hardware, the limitations of low user engagement and decreased effectiveness demand more informed and theoretically founded Human-Computer Interaction designs. To address these challenges, we build upon the previously unexplored Leisure Constraints Negotiation Model and the Transtheoretical Model to systematically define and assess the effectiveness of STT features that acknowledge users' contextual constraints and establish human-negotiated STT narratives. Specifically, we introduce and validate a human-centric scale, StoryWear, which explores eleven dimensions of the negotiation strategies that humans use to overcome constraints on exercise participation, captured through an inclusive storyboards format. Based on our preliminary studies, StoryWear shows high reliability, rendering it suitable for future work in ubiquitous computing. Our results indicate that negotiation strategies vary in perceived effectiveness and have higher appeal for existing STT users, with self-motivation, commitment, and understanding of the negative impact of non-exercise placed at the top. Finally, we give actionable guidelines for real-world implementation and a commentary on the future of personalized training.
Citations: 0
Unified pre-training with pseudo infrared images for visible-infrared person re-identification
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-19 DOI: 10.1007/s11042-024-20217-8
ZhiGang Liu, Yan Hu
Abstract: In the pre-training task of visible-infrared person re-identification (VI-ReID), two main challenges arise: i) domain disparities: a significant domain gap exists between the ImageNet data used by public pre-trained models and the person-specific data of the VI-ReID task; ii) insufficient samples: because cross-modal paired samples are difficult to gather, there is currently a scarcity of large-scale datasets suitable for pre-training. To address these issues, we propose a new unified pre-training framework (UPPI). First, we establish a large-scale visible-pseudo-infrared paired sample repository (UnitCP), built from an existing visible person dataset and encompassing nearly 170,000 sample pairs. This repository not only significantly expands the training samples, but pre-training on it also effectively bridges the domain disparities. Simultaneously, to fully harness the potential of the repository, we devise an innovative feature fusion mechanism (CF²) during pre-training that leverages redundant features present in the paired images to steer the model toward cross-modal feature fusion. In addition, during fine-tuning, to adapt the model to datasets lacking paired images, we introduce a center contrast loss (C²) that guides the model to prioritize cross-modal features with consistent identities. Extensive experimental results on two standard benchmarks (SYSU-MM01 and RegDB) demonstrate that the proposed UPPI performs favorably against state-of-the-art methods.
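The abstract does not spell out the form of the center contrast loss; a common way such a loss is formulated is to pull each sample's embedding toward the center of its identity and away from other identity centers. The following PyTorch sketch is a hypothetical illustration under that assumption, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def center_contrast_loss(features, labels, centers, temperature=0.1):
    """Pull each feature toward its identity center, push it away from other centers."""
    features = F.normalize(features, dim=1)          # (N, D) sample embeddings
    centers = F.normalize(centers, dim=1)            # (num_ids, D) per-identity centers
    logits = features @ centers.t() / temperature    # cosine similarity to every center
    return F.cross_entropy(logits, labels)           # labels: (N,) identity indices

# toy usage with random tensors
feats = torch.randn(8, 256)
centers = torch.randn(100, 256)
ids = torch.randint(0, 100, (8,))
loss = center_contrast_loss(feats, ids, centers)
```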
Citations: 0
Hybrid golden jackal fusion based recommendation system for spatio-temporal transportation's optimal traffic congestion and road condition classification
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-19 DOI: 10.1007/s11042-024-20133-x
Tukaram K. Gawali, Shailesh S. Deore
Abstract: Traffic congestion, influenced by varying traffic density levels, remains a critical challenge in transportation management, significantly impacting efficiency and safety. This research addresses these challenges by proposing an Enhanced Hybrid Golden Jackal (EGJ) fusion-based recommendation system for optimal traffic congestion and road condition categorization. In the first phase, road vehicle images are processed using Enhanced Geodesic Filtering (EGF) to classify traffic density as heterogeneous or homogeneous across heavy, medium and light flows using an Enhanced Consolidated Convolutional Neural Network (ECNN). Simultaneously, text data from road safety datasets undergo preprocessing through crisp data conversion, splitting and normalization techniques; these data are then categorized into weather conditions, speed, highway conditions, rural/urban settings and light conditions using Adaptive Drop Block Enhanced Generative Adversarial Networks (ADGAN). In the third phase, the EGJ fusion method integrates outputs from the ECNN and ADGAN classifiers to enhance classification accuracy and robustness. The proposed approach addresses challenges such as accurately assessing traffic density variations and optimizing traffic flow under historical pattern scenarios. Simulation outcomes establish the efficiency of the EGJ fusion-based system, which achieves 98% accuracy, 99.1% precision and 98.2% F1-score in traffic density and road condition classification tasks. Additionally, error measures such as a mean absolute error of 0.043, a root mean square error of 0.05 and a mean absolute percentage error of 0.148 further validate the robustness and accuracy of the introduced approach.
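The reported MAE, RMSE and MAPE figures follow their standard definitions; a minimal NumPy sketch for reproducing such metrics from ground-truth and predicted values (variable names are illustrative and not taken from the paper):

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Mean absolute error, root mean square error, and mean absolute percentage error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true))   # a fraction; multiply by 100 for percent
    return mae, rmse, mape

print(regression_errors([0.8, 0.5, 0.9], [0.82, 0.47, 0.88]))
```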
Citations: 0
Identification and location monitoring through Live video Streaming by using blockchain
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-19 DOI: 10.1007/s11042-024-20197-9
Sana Zeba, Mohammad Amjad
Abstract: Video surveillance is the basis of the increasing demand for security. Capable users can manipulate video images, timestamps, and camera settings digitally; they can also physically manipulate camera locations, orientation, and mechanical settings. Advanced video manipulation techniques can easily alter cameras and videos, which are essential for criminal investigations, so the level of security for camera and video data must be increased. Blockchain technology has gained much attention in the last decade due to its ability to create trust between users without third-party intermediaries, enabling many applications. Our goal is to create a CCTV camera system that utilizes blockchain technology to guarantee the reliability of video and image data. The truthfulness of stored data can be confirmed by authorities because blockchain enables data creation and storage in a distributed manner. The workflow of tracking and blockchain storage to secure data is discussed, and we develop an algorithm that synchronizes all updated criminal records of all users with IoT devices. Finally, we calculate the accuracy of tracking the recognized face in diverse datasets of different resolutions and assess the efficiency of location tracking. Recognition accuracy changes with resolution: low-resolution datasets yield higher accuracy than high-resolution datasets. According to the analysis, the system's average accuracy is 98.5% and its tracking efficiency is 99%. In addition, smart devices in various locations can take actions on specific individuals according to the distributed blockchain server storage.
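The paper's blockchain design is not detailed in this abstract; as a generic, hypothetical illustration of tamper-evident storage of tracking records (each record commits to the hash of the previous one, so editing any field breaks the chain), one might write:

```python
import hashlib
import json
import time

def make_block(prev_hash, camera_id, location, face_id):
    """Create a record whose hash covers its content and the previous block's hash."""
    record = {
        "prev_hash": prev_hash,
        "timestamp": time.time(),
        "camera_id": camera_id,
        "location": location,
        "face_id": face_id,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

# toy chain of two tracking events (all identifiers are placeholders)
genesis = make_block("0" * 64, "cam-01", "gate-A", "person-123")
next_block = make_block(genesis["hash"], "cam-02", "lobby", "person-123")
```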
Citations: 0
Deep-Dixon: Deep-Learning frameworks for fusion of MR T1 images for fat and water extraction
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-19 DOI: 10.1007/s11042-024-20255-2
Snehal V. Laddha, Rohini S. Ochawar, Krushna Gandhi, Yu-Dong Zhang
Abstract: Medical image fusion plays a crucial role in understanding the necessity of medical procedures and assists radiologists in decision-making for surgical operations. Dixon mathematically described a fat suppression technique that differentiates between fat and water signals by utilizing in-phase and out-of-phase MR imaging; the fusion of MR T1 images can be performed by adding or subtracting the in-phase and out-of-phase images, respectively. The dataset used in this study was collected from the CHAOS grand challenge, comprising DICOM data sets from two different MRI sequences (T1 in-phase and out-of-phase). Our methodology involved training deep learning models, VGG19 and ResNet18, to extract features from this dataset and implement the Dixon technique, effectively separating the water and fat components. For water-only images we achieved EN as high as 5.70 and 4.72, MI of 2.26 and 2.21, SSIM of 0.97 and 0.81, Qabf of 0.73 and 0.72, and Nabf as low as 0.18 and 0.19 using the VGG19 and ResNet18 models, respectively. For fat-only images we achieved EN of 4.17 and 4.06, MI of 0.80 and 0.77, SSIM of 0.45 and 0.39, Qabf of 0.53 and 0.48, and Nabf as low as 0.22 and 0.27. The experimental findings demonstrate the superior performance of our proposed method in terms of the enhanced accuracy and visual quality of water-only and fat-only images across several quantitative assessment parameters, compared with models reported by other researchers. Our models are stand-alone models for implementing the Dixon methodology using deep learning techniques. The model shows an improvement of 0.62 in EN and 0.29 in Qabf compared to existing fusion models for different image modalities, and it can better assist radiologists in identifying tissues and blood vessels of abdominal organs that are rich in protein and in understanding the fat content of lesions.
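In its idealized two-point form, the Dixon recombination mentioned in the abstract reduces to adding and subtracting the in-phase (water + fat) and out-of-phase (water − fat) images. A minimal NumPy sketch, assuming co-registered, equally scaled magnitude images (the deep-learning models relax these assumptions in practice):

```python
import numpy as np

def dixon_recombine(in_phase, out_phase):
    """Two-point Dixon: water-only and fat-only images from IP and OP acquisitions."""
    in_phase = in_phase.astype(np.float64)
    out_phase = out_phase.astype(np.float64)
    water = 0.5 * (in_phase + out_phase)   # fat signal cancels
    fat = 0.5 * (in_phase - out_phase)     # water signal cancels
    return water, fat

# toy usage with random slices standing in for registered DICOM data
water_img, fat_img = dixon_recombine(np.random.rand(256, 256), np.random.rand(256, 256))
```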
Citations: 0
MeVs-deep CNN: optimized deep learning model for efficient lung cancer classification
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-19 DOI: 10.1007/s11042-024-20230-x
Ranjana M. Sewatkar, Asnath Victy Phamila Y
Abstract: Lung cancer is a dangerous condition that impacts many people. The type and location of the cancer are critical factors in determining the appropriate medical treatment. Early identification of cancer cells can save numerous lives, making the development of automated detection techniques essential. Although many methods have been proposed over the years, achieving high prediction accuracy remains a persistent challenge. Addressing this issue, this research employs Memory-Enabled Vulture Search Optimization based on Deep Convolutional Neural Networks (MeVs-deep CNN) to develop an autonomous, accurate lung cancer categorization system. The data is initially gathered from the PET/CT dataset and preprocessed using the Non-Local Means (NL-Means) approach. The proposed MeVs optimization approach is then used to segment the data. The feature extraction process incorporates statistical, texture, intensity-based and ResNet-101-based features, resulting in the final feature vector for cancer classification and the multi-level standardized convolutional fusion model. Subsequently, the MeVs-deep CNN leverages the MeVs optimization technique to automatically classify lung cancer. The key contribution of the research is the MeVs optimization, which effectively adjusts the classifier's parameters using the fitness function. The output is evaluated using metrics such as accuracy, sensitivity, specificity, AUC, and the loss function. The efficiency of the MeVs-deep CNN is demonstrated through these metrics, achieving 97.08%, 97.93%, 96.42%, 95.88%, and 2.92% in the training phase; 95.78%, 95.34%, 96.42%, 93.48%, and 4.22% in the testing phase; 96.33%, 95.20%, 97.65%, 94.83%, and 3.67% on k-fold training data; and 94.16%, 95.20%, 93.30%, 91.66%, and 5.84% on k-fold test data. These results demonstrate the effectiveness of the research.
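The accuracy, sensitivity and specificity figures quoted above follow the usual confusion-matrix definitions; a small NumPy helper illustrating those definitions (not the paper's evaluation code):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))
```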
Citations: 0
Text-driven clothed human image synthesis with 3D human model estimation for assistance in shopping
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-19 DOI: 10.1007/s11042-024-20187-x
S. Karkuzhali, A. Syed Aasim, A. StalinRaj
Abstract: Online shopping has become an integral part of modern consumer culture, yet it is plagued by challenges in visualizing clothing items from textual descriptions and estimating their fit on individual body types. In this work, we present a solution to these challenges through text-driven clothed human image synthesis with 3D human model estimation, leveraging the Vector Quantized Variational AutoEncoder (VQ-VAE). Creating diverse and high-quality human images is a crucial yet difficult undertaking in vision and graphics, and with the wide variety of clothing designs and textures, existing generative models are often insufficient for the end user. We introduce a pipeline in which various datasets are passed through several models so that an optimized result can be provided, together with high-quality images in a range of postures. We use two distinct procedures to create full-body 2D human photographs starting from a predetermined human pose: 1) the given human pose is first converted to a human parsing map together with sentences describing the shapes of clothing; 2) the model is then given further information about the clothing textures as input to produce the final human image. The model is split into two sections: a coarse-level codebook that handles the overall result and a fine-level codebook that handles minute detail; the fine level concentrates on the minutiae of textures, whereas the coarse-level codebook covers the depiction of texture structure. A decoder trained together with the hierarchical codebooks converts the predicted indices at the various levels into human images. The generated image can be conditioned on fine-grained text input thanks to the use of a mixture of experts, and the prediction of finer-level indices refines the quality of clothing textures. Numerous quantitative and qualitative evaluations show that these strategies yield more diverse and higher-quality human images than state-of-the-art procedures. The generated photographs are then converted into a 3D model, producing several postures and outcomes, or a 3D model can be built directly from a dataset covering a variety of poses. The PIFu method uses the Marching Cubes algorithm and the Stacked Hourglass method to produce 3D models and realistic images, respectively. This results in the generation of high-resolution images from textual descriptions and the reconstruction of the generated images as 3D models. The achieved inception score, Fréchet Inception Distance, SSIM, and PSNR were 1.64 ± 0.20, 24.64527782349843, 0.642919520, and 32.87157744102002, respectively; the implemented method scores well in comparison with other techniques. This technology holds immense promise for reshaping the e-commerce landscape, offering a more immersive and informative…
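The SSIM and PSNR figures reported above can be reproduced with standard implementations; a minimal scikit-image sketch, assuming grayscale float images scaled to [0, 1] (the paper's exact preprocessing is not given here):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def fidelity_scores(reference, generated):
    """PSNR and SSIM between a reference image and a generated image in [0, 1]."""
    psnr = peak_signal_noise_ratio(reference, generated, data_range=1.0)
    ssim = structural_similarity(reference, generated, data_range=1.0)
    return psnr, ssim

# toy usage with a slightly perturbed copy standing in for a generated image
ref = np.random.rand(256, 256)
gen = np.clip(ref + 0.01 * np.random.randn(256, 256), 0.0, 1.0)
print(fidelity_scores(ref, gen))
```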
Citations: 0
Laplacian nonlinear logistic stepwise and gravitational deep neural classification for facial expression recognition
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20079-0
Binthu Kumari M, Sivagami B
Abstract: Facial expression recognition is a paramount element of non-verbal communication and one of the most frequent modes of human communication. However, handling different facial expressions and attaining high accuracy remain major open issues. Laplacian Non-linear Logistic Regression and Gravitational Deep Learning (LNLR-GDL) for facial expression recognition is proposed to select the right features from face image data via feature selection, achieving high performance in minimal time. The proposed method is split into three sections: preprocessing, feature selection, and classification. In the first section, preprocessing is conducted on the face recognition dataset, where noise-reduced preprocessed face images are obtained by employing the Unsharp Masking Laplacian Non-linear Filter model. Second, computationally efficient, relevant features are selected from the preprocessed face images using a Logistic Stepwise Regression-based feature selection model. Finally, the Gravitational Deep Neural Classification model is applied to the selected features for robust recognition of facial expressions. The proposed method is compared with existing methods using three evaluation metrics: facial expression recognition accuracy, facial expression recognition time, and PSNR. The obtained results demonstrate that the proposed LNLR-GDL method outperforms the state-of-the-art methods.
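Classical unsharp masking with a Laplacian operator, the textbook version of the preprocessing step named above, subtracts a scaled second-derivative response from the image to boost edges. A simple SciPy sketch of that classical step (the paper's non-linear variant and its parameters are not specified in the abstract, so the strength value here is an assumption):

```python
import numpy as np
from scipy import ndimage

def laplacian_unsharp(img, strength=0.7):
    """Sharpen a grayscale image by subtracting a scaled Laplacian response."""
    img = img.astype(np.float64)
    lap = ndimage.laplace(img)           # responds strongly at edges and fine detail
    sharpened = img - strength * lap     # subtracting the Laplacian emphasizes edges
    return np.clip(sharpened, 0.0, 255.0)

# toy usage on a random 8-bit-range image
out = laplacian_unsharp(np.random.rand(64, 64) * 255.0)
```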
Citations: 0
Advancements in automated sperm morphology analysis: a deep learning approach with comprehensive classification and model evaluation
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20188-w
Rania Maalej, Olfa Abdelkefi, Salima Daoud
Abstract: Automated sperm morphology analysis is crucial in reproductive medicine for assessing male fertility, but existing methods often lack robustness in handling diverse morphological abnormalities across different regions of the sperm. This study proposes a deep learning-based approach utilizing the ResNet50 architecture trained on a newly benchmarked SMD/MSS dataset, which includes comprehensive annotations of 12 morphological defects across the head, midpiece, and tail regions. Our approach achieved promising results with an accuracy of 95%, demonstrating effective classification across various sperm morphology classes. However, certain classes exhibited lower precision and recall rates, highlighting challenges in model performance for specific abnormalities. The findings underscore the potential of the proposed system for enhancing sperm morphology assessment. It is the first to comprehensively diagnose a spermatozoon by examining each part (head, intermediate piece, and tail) and identifying the type of anomaly in each according to David's classification, which includes 12 different anomalies, thereby performing multi-label classification for a more precise diagnosis, unlike state-of-the-art works that either study only the head or simply indicate whether each part of the sperm is normal or abnormal.
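Multi-label classification over 12 defect types with a ResNet50 backbone typically means replacing the final layer with 12 independent sigmoid outputs trained with binary cross-entropy. A hypothetical PyTorch/torchvision sketch under that assumption (input size, batch size and training details are placeholders, not the authors' configuration):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_DEFECTS = 12                                   # head, midpiece and tail anomaly types

model = models.resnet50(weights=None)              # swap in pretrained weights if desired
model.fc = nn.Linear(model.fc.in_features, NUM_DEFECTS)

criterion = nn.BCEWithLogitsLoss()                 # one independent sigmoid per defect

images = torch.randn(4, 3, 224, 224)               # dummy batch of sperm-cell crops
targets = torch.zeros(4, NUM_DEFECTS)              # multi-hot vectors: 1 where a defect is present
logits = model(images)
loss = criterion(logits, targets)
probs = torch.sigmoid(logits)                      # per-defect probabilities for reporting
```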
Citations: 0
Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics
IF 3.6, Q4, Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20227-6
José Salas-Cáceres, Javier Lorenzo-Navarro, David Freire-Obregón, Modesto Castrillón-Santana
Abstract: In the Human-Machine Interactions (HMI) landscape, understanding user emotions is pivotal for elevating user experiences. This paper explores Facial Expression Recognition (FER) within HMI, employing a distinctive multimodal approach that integrates visual and auditory information. Recognizing the dynamic nature of HMI, where situations evolve, this study emphasizes continuous emotion analysis. This work assesses various fusion strategies that involve the addition to the main network of different architectures, such as autoencoders (AE) or an Embracement module, to combine the information of multiple biometric cues. In addition to the multimodal approach, this paper introduces a new architecture that prioritizes temporal dynamics by incorporating Long Short-Term Memory (LSTM) networks. The final proposal, which integrates different multimodal approaches with the temporal focus capabilities of the LSTM architecture, was tested across three public datasets: RAVDESS, SAVEE, and CREMA-D. It showcased state-of-the-art accuracy of 88.11%, 86.75%, and 80.27%, respectively, and outperformed other existing approaches.
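One common way to realize audiovisual fusion with temporal modeling is to concatenate per-frame audio and visual embeddings and feed the sequence to an LSTM. The minimal PyTorch sketch below follows that generic pattern; the embedding dimensions, hidden size and class count are placeholders, not the paper's settings:

```python
import torch
import torch.nn as nn

class AVFusionLSTM(nn.Module):
    """Concatenate frame-aligned audio and visual embeddings, model time with an LSTM."""
    def __init__(self, audio_dim=128, visual_dim=512, hidden=256, n_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + visual_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, audio_seq, visual_seq):
        # audio_seq: (B, T, audio_dim), visual_seq: (B, T, visual_dim)
        fused = torch.cat([audio_seq, visual_seq], dim=-1)
        _, (h_n, _) = self.lstm(fused)
        return self.head(h_n[-1])          # emotion logits from the last hidden state

model = AVFusionLSTM()
logits = model(torch.randn(2, 30, 128), torch.randn(2, 30, 512))
```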
Citations: 0