Proceedings of the 19th International Conference on Content-based Multimedia Indexing: Latest Publications

Hyperspectral Image Reconstruction of Heritage Artwork Using RGB Images and Deep Neural Networks
Ailin Chen, R. Jesus, M. Vilarigues
{"title":"Hyperspectral Image Reconstruction of Heritage Artwork Using RGB Images and Deep Neural Networks","authors":"Ailin Chen, R. Jesus, M. Vilarigues","doi":"10.1145/3549555.3549583","DOIUrl":"https://doi.org/10.1145/3549555.3549583","url":null,"abstract":"The application of our research is in the art world where the scarcity of available analytical data from a particular artist or physical access for its acquisition is restricted. This poses a fundamental problem for the purpose of conservation, restoration or authentication of historical artworks. We address part of this problem by providing a practical method to generate hyperspectral data from readily available RGB imagery of artwork by means of a two-step process using deep neural networks. The particularities of our approach include the generation of learnable colour mixtures and reflectances from a reduced collection of prior data for the mapping and reconstruction of hyperspectral features on new images. Further analysis and correction of the prediction are achieved by a second network that reduces the error by producing results akin to those obtained by a hyperspectral camera. Our method has been used to study a collection of paintings by Amadeo de Souza-Cardoso where successful results were obtained.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115522669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
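A minimal sketch of the two-step idea the abstract describes: a first network maps RGB pixels to an initial spectrum, and a second network corrects the prediction. The 31-band output, layer sizes, and residual correction are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch of a two-step RGB-to-hyperspectral mapper; the 31-band output,
# layer sizes, and residual refinement are illustrative assumptions.
import torch
import torch.nn as nn

class RGB2Hyperspectral(nn.Module):
    def __init__(self, n_bands: int = 31):
        super().__init__()
        # Step 1: predict an initial spectrum for every pixel from RGB.
        self.step1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_bands, 1),
        )
        # Step 2: correct the prediction toward camera-like measurements.
        self.step2 = nn.Sequential(
            nn.Conv2d(n_bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_bands, 1),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        coarse = self.step1(rgb)
        return coarse + self.step2(coarse)   # residual error correction

rgb_patch = torch.rand(1, 3, 128, 128)       # RGB values in [0, 1]
cube = RGB2Hyperspectral()(rgb_patch)        # (1, 31, 128, 128) spectral cube
```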
Skin Cancer Detection using Ensemble Learning and Grouping of Deep Models
Takfarines Guergueb, M. Akhloufi
{"title":"Skin Cancer Detection using Ensemble Learning and Grouping of Deep Models","authors":"Takfarines Guergueb, M. Akhloufi","doi":"10.1145/3549555.3549584","DOIUrl":"https://doi.org/10.1145/3549555.3549584","url":null,"abstract":"Melanoma remains the most dangerous form of skin cancer which has a high mortality rate. When detect early, melanoma can be easily cured and millions of lives might be saved. The use of automatic detection models in clinical decision support can increase the ability to address this issue and improve survival rates. In this work, we proposed an automated pipeline for melanoma detection, which combines the predictions of deep convolutional neural network models through ensemble learning techniques. Furthermore, our automated pipeline includes various strategies such as image augmentation, upsampling, image cropping, digital hair removal and class weighting. Our pipeline was trained and tested using the image data acquired from the Society for Imaging Informatics in Medicine and the International Skin Imaging Collaboration SIIM-ISIC 2020. Our proposed pipeline has demonstrated a high performance compared to the other state-of-the-art pipelines for melanoma disease prediction with an accuracy of 97.77% and an AUC of 98.47%.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116775724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
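One plausible reading of the ensemble step, sketched below as soft voting over several CNN classifiers. The torchvision backbones and plain probability averaging are assumptions; the paper's grouping of deep models may differ.

```python
# Hedged sketch: soft-voting ensemble of CNN classifiers; the backbones and
# plain probability averaging are illustrative, not the paper's exact grouping.
import torch
import torch.nn as nn
import torchvision.models as tv

def melanoma_classifier(backbone_fn):
    model = backbone_fn(weights=None)               # no pretrained weights here
    model.fc = nn.Linear(model.fc.in_features, 2)   # benign vs. melanoma
    return model.eval()

ensemble = [melanoma_classifier(f) for f in (tv.resnet18, tv.resnet34, tv.resnet50)]

@torch.no_grad()
def predict(x: torch.Tensor) -> torch.Tensor:
    # Average softmax probabilities across ensemble members (soft voting).
    probs = torch.stack([m(x).softmax(dim=1) for m in ensemble])
    return probs.mean(dim=0)

batch = torch.rand(4, 3, 224, 224)   # a batch of dermoscopic images
print(predict(batch).shape)          # (4, 2) ensembled class probabilities
```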
Wildfire Segmentation using Deep-RegSeg Semantic Segmentation Architecture
Rafik Ghali, M. Akhloufi, Wided Souidène Mseddi, Marwa Jmal
{"title":"Wildfire Segmentation using Deep-RegSeg Semantic Segmentation Architecture","authors":"Rafik Ghali, M. Akhloufi, Wided Souidène Mseddi, Marwa Jmal","doi":"10.1145/3549555.3549586","DOIUrl":"https://doi.org/10.1145/3549555.3549586","url":null,"abstract":"Wildfires are a worldwide natural risk, which causes harmful effects to human safety and leads to ecological and economical damage. Various fire detection systems have been proposed in order to detect fire and reduce its effects. However, they are still limited in detecting small fire areas and determining the precise fire’s shape. In order to overcome these limitations, we present, in this paper, a novel method based on deep learning, called ‘Deep-RegSeg’, to segment fire pixels and detect fire areas in complex non-structured environments. Deep-RegSeg is evaluated with varying backbone and loss function. The obtained results showed a high performance and outperformed some recent state-of-the-art techniques. The results also proved that Deep-RegSeg is efficient in segmenting wildfire pixels and detecting the precise fire’s shape, especially small fire areas under various conditions of weather, presence of smoke, and environment brightness.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114871263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
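A hedged sketch of the evaluation setup the abstract mentions, varying backbone and loss: here a tiny encoder-decoder with a Dice loss stands in for Deep-RegSeg itself.

```python
# Hedged sketch of binary fire segmentation with a swappable loss; this tiny
# encoder-decoder and Dice loss stand in for Deep-RegSeg, whose design differs.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))   # per-pixel fire logits

def dice_loss(logits, target, eps=1e-6):
    # Dice handles the strong class imbalance of small fire regions.
    p = torch.sigmoid(logits)
    inter = (p * target).sum()
    return 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)

images = torch.rand(2, 3, 256, 256)
masks = (torch.rand(2, 1, 256, 256) > 0.5).float()   # dummy fire masks
print(dice_loss(TinySegNet()(images), masks))
```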
Analysis of the Complementarity of Latent and Concept Spaces for Cross-Modal Video Search
Varsha Devi, P. Mulhem, G. Quénot
{"title":"Analysis of the Complementarity of Latent and Concept Spaces for Cross-Modal Video Search","authors":"Varsha Devi, P. Mulhem, G. Quénot","doi":"10.1145/3549555.3549600","DOIUrl":"https://doi.org/10.1145/3549555.3549600","url":null,"abstract":"This paper focuses on studying the complementarity between the spaces from hybrid cross-modal state-of-the-art systems for video retrieval like [5]. We aim at investigating if these spaces really convey different features, or if they are representing the same things. We use PCA (Principal Component Analysis) to study the optimal dimensions, CCA (Canonical Correlation Analysis) to assess the similarity of the spaces, and check if such approach is in fact similar to ensemble learning. We achieve experiments on the MST-VTT corpus, and show that in fact these two spaces are indeed very similar, paving the way for new models that could enforce more dissimilar spaces.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"250 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114252392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
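The PCA/CCA analysis can be reproduced in miniature as below; the random embeddings are placeholders for real latent- and concept-space vectors, and scikit-learn's CCA is an assumed tool choice.

```python
# Hedged sketch of the PCA/CCA space comparison; random matrices stand in for
# real latent- and concept-space embeddings of the same videos.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 512))             # latent-space embeddings
concept = latent @ rng.normal(size=(512, 300))    # a correlated concept space

# PCA: how many components explain 95% of the variance in the latent space?
ratios = PCA().fit(latent).explained_variance_ratio_
print(int(np.searchsorted(np.cumsum(ratios), 0.95)) + 1)

# CCA: correlation between the two spaces along their shared directions.
u, v = CCA(n_components=10).fit_transform(latent, concept)
print([round(float(np.corrcoef(u[:, i], v[:, i])[0, 1]), 2) for i in range(10)])
# Correlations near 1.0 suggest the spaces convey largely the same information.
```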
Learning to Detect Fallen People in Virtual Worlds
F. Carrara, Lorenzo Pasco, C. Gennaro, F. Falchi
{"title":"Learning to Detect Fallen People in Virtual Worlds","authors":"F. Carrara, Lorenzo Pasco, C. Gennaro, F. Falchi","doi":"10.1145/3549555.3549573","DOIUrl":"https://doi.org/10.1145/3549555.3549573","url":null,"abstract":"Falling is one of the most common causes of injury in all ages, especially in the elderly, where it is more frequent and severe. For this reason, a tool that can detect a fall in real time can be helpful in ensuring appropriate intervention and avoiding more serious damage. Some approaches available in the literature use sensors, wearable devices, or cameras with special features such as thermal or depth sensors. In this paper, we propose a Computer Vision deep-learning based approach for human fall detection based on largely available standard RGB cameras. A typical limitation of this kind of approaches is the lack of generalization to unseen environments. This is due to the error generated during human detection and, more generally, due to the unavailability of large-scale datasets that specialize in fall detection problems with different environments and fall types. In this work, we mitigate these limitations with a general-purpose object detector trained using a virtual world dataset in addition to real-world images. Through extensive experimental evaluation, we verified that by training our models on synthetic images as well, we were able to improve their ability to generalize. Code to reproduce results is available at https://github.com/lorepas/fallen-people-detection.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131080096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
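The core idea, mixing synthetic and real training images for a general-purpose detector, might look like the sketch below. The dummy datasets and the torchvision Faster R-CNN baseline are illustrative assumptions, not the paper's exact detector.

```python
# Hedged sketch: training a detector on mixed real + virtual-world data.
# The dummy datasets and torchvision's Faster R-CNN are illustrative stand-ins.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset
from torchvision.models.detection import fasterrcnn_resnet50_fpn

class DummyFallDataset(Dataset):
    """Stands in for a set of images annotated with fallen-person boxes."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        image = torch.rand(3, 300, 300)
        target = {"boxes": torch.tensor([[30.0, 40.0, 120.0, 90.0]]),
                  "labels": torch.tensor([1])}        # class 1 = fallen person
        return image, target

mixed = ConcatDataset([DummyFallDataset(100),         # "real" images
                       DummyFallDataset(400)])        # "synthetic" images
loader = DataLoader(mixed, batch_size=2, shuffle=True,
                    collate_fn=lambda batch: tuple(zip(*batch)))

model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2).train()
images, targets = next(iter(loader))
losses = model(list(images), list(targets))           # dict of detection losses
print(sum(losses.values()))
```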
Learning Co-occurrence Features Across Spatial and Temporal Domains for Hand Gesture Recognition
Mohammad Rehan, H. Wannous, Jafar Alkheir, Kinda Aboukassem
{"title":"Learning Co-occurrence Features Across Spatial and Temporal Domains for Hand Gesture Recognition","authors":"Mohammad Rehan, H. Wannous, Jafar Alkheir, Kinda Aboukassem","doi":"10.1145/3549555.3549591","DOIUrl":"https://doi.org/10.1145/3549555.3549591","url":null,"abstract":"Hand gesture is the most natural modality for human-machine interaction and its recognition can be considered one of the most complicated and interesting challenges for computer vision community. In recent years, there has been a noticeable advancement in the field of machine learning and computer vision. However, providing a hand gesture recognition system robust enough to work in real-time applications remains challenging. Dynamic hand gestures can be seen as variations in shape or movement during hand motion and often both together. To tackle these challenges, we propose a dynamic hand gesture recognition approach based on hand skeletal sequences. In particular, we introduce a simple but effective deep network architecture to deal with Spatio-temporal co-occurrence features computed on 3D coordinates of hand joints along the gesture sequence. Experimental results show that our approach outperforms state-of-the-art methods on two public datasets, First Person Hand Action and SHREC’2017, with an efficient time computational model compared to most existing approaches.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
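A minimal sketch of co-occurrence feature learning on skeleton tensors: convolutions first aggregate across joints (spatial), then across frames (temporal). Treating a gesture as a (coords, frames, joints) tensor with 22 SHREC-style joints and 14 classes is an assumption; the paper's network is likely deeper.

```python
# Hedged sketch of spatio-temporal co-occurrence learning on hand skeletons;
# the (coords, frames, joints) layout, 22 joints, and 14 classes mirror
# SHREC'2017, but the network itself is an illustrative toy.
import torch
import torch.nn as nn

class CoOccurrenceNet(nn.Module):
    def __init__(self, n_classes: int = 14):
        super().__init__()
        self.features = nn.Sequential(
            # 1x3 kernels aggregate across joints: spatial co-occurrence.
            nn.Conv2d(3, 32, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(),
            # 3x1 kernels aggregate across frames: temporal co-occurrence.
            nn.Conv2d(32, 64, kernel_size=(3, 1), padding=(1, 0)), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3 coords, frames, joints)
        return self.classifier(self.features(x).flatten(1))

gestures = torch.rand(8, 3, 32, 22)        # 8 sequences, 32 frames, 22 joints
print(CoOccurrenceNet()(gestures).shape)   # (8, 14) gesture logits
```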
A Toolchain for Extracting and Visualising Road Traffic Data
H. Neuschmied, Florian Krebs, Stefan Ladstätter, Elisabeth Eder, Mohamed Redouane Berrazouane, G. Thallinger
{"title":"A Toolchain for Extracting and Visualising Road Traffic Data","authors":"H. Neuschmied, Florian Krebs, Stefan Ladstätter, Elisabeth Eder, Mohamed Redouane Berrazouane, G. Thallinger","doi":"10.1145/3549555.3549580","DOIUrl":"https://doi.org/10.1145/3549555.3549580","url":null,"abstract":"We demonstrate a toolchain for visualising detailed road traffic data from multimodal sensors consisting of (i) the automatic, real-time extraction of movement paths of road users and noteworthy events from traffic monitoring cameras or LIDAR sensors, (ii) extraction of audio events from individual microphones and microphone arrays, (iii) a spatial data managment system storing the extracted information together with a geographic information system (GIS), and (iv) a web based viewer allowing to interactively visualise all these data in the context of a high-definition digital twin of the traffic environment. This system enables the collection of a considerable amount of objective data on road use and can be used for planning changes to traffic facilities as well as for assessing changes in the traffic environment.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130032597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
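One plausible glue format between the extraction components (i)-(ii) and the GIS store (iii) and viewer (iv) is a GeoJSON feature per trajectory; the field names and values below are hypothetical, not the toolchain's actual schema.

```python
# Hedged sketch: one extracted trajectory serialised as GeoJSON for a
# GIS-backed store and web viewer. All field names here are hypothetical.
import json

trajectory = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        # (longitude, latitude) samples along the road user's path
        "coordinates": [[15.43, 47.07], [15.44, 47.075], [15.45, 47.08]],
    },
    "properties": {
        "object_class": "cyclist",
        "source": "camera_03",
        "t_start": "2022-09-14T08:00:00Z",
        "t_end": "2022-09-14T08:00:12Z",
    },
}
print(json.dumps(trajectory, indent=2))
```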
A Virtual Reality Talking Avatar for Investigative Interviews of Maltreated Children
Syed Zohaib Hassan, Pegah Salehi, M. Riegler, M. Johnson, G. Baugerud, Pål Halvorsen, S. Sabet
{"title":"A Virtual Reality Talking Avatar for Investigative Interviews of Maltreat Children","authors":"Syed Zohaib Hassan, Pegah Salehi, M. Riegler, M. Johnson, G. Baugerud, Pål Halvorsen, S. Sabet","doi":"10.1145/3549555.3549572","DOIUrl":"https://doi.org/10.1145/3549555.3549572","url":null,"abstract":"Interviews conducted with the maltreated children are often the primary source of evidence in prosecution. Many alleged incidents of abuse are not prosecuted because the children’s testimony is collected in an unreliable way. Research shows the consistent poor quality of these interviews and highlights the need for better training of Child Protection Services (CPS) and police personnel who interview abused child witnesses. The currently available systems for training of CPS and police personnel are developed in a rigid way that lag behind in generating dynamic responses. Moreover, these systems require human input such as employing an actor mimicking a child or an operator controlling prerecorded child responses during the interactions. This paper demonstrates the prototype of an interview training program with an artificial intelligent Child Avatar in Virtual Reality (VR), enabling CPS and police personnel to practice interviewing with abused children. The program is developed using Unity game engine and artificial intelligence-based technologies such as dialogue models, talking visual avatars, text-to-speech, and speech-to-text components.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115301188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
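The interaction loop the abstract outlines, speech-to-text into a dialogue model into text-to-speech, is sketched below with local stubs; the actual program wires real ASR, dialogue, and TTS services inside Unity.

```python
# Hedged sketch of the avatar's turn loop (speech-to-text -> dialogue model ->
# text-to-speech). All three components are local stubs, not the real services.
from dataclasses import dataclass

@dataclass
class ChildAvatar:
    persona: str = "child witness persona"

    def speech_to_text(self, audio: bytes) -> str:
        return "Can you tell me what happened?"       # stub ASR transcript

    def dialogue_reply(self, text: str) -> str:
        # A real system conditions a dialogue model on persona and history.
        return "I was at home with my brother."        # stub model output

    def text_to_speech(self, text: str) -> bytes:
        return text.encode()                           # stub audio waveform

    def step(self, interviewer_audio: bytes) -> bytes:
        return self.text_to_speech(self.dialogue_reply(
            self.speech_to_text(interviewer_audio)))

print(ChildAvatar().step(b"").decode())
```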
Hybrid Transformer Network for Deepfake Detection
Sohail Ahmed Khan, Duc-Tien Dang-Nguyen
{"title":"Hybrid Transformer Network for Deepfake Detection","authors":"Sohail Ahmed Khan, Duc-Tien Dang-Nguyen","doi":"10.1145/3549555.3549588","DOIUrl":"https://doi.org/10.1145/3549555.3549588","url":null,"abstract":"Deepfake media is becoming widespread nowadays because of the easily available tools and mobile apps which can generate realistic looking deepfake videos/images without requiring any technical knowledge. With further advances in this field of technology in the near future, the quantity and quality of deepfake media is also expected to flourish, while making deepfake media a likely new practical tool to spread mis/disinformation. Because of these concerns, the deepfake media detection tools are becoming a necessity. In this study, we propose a novel hybrid transformer network utilizing early feature fusion strategy for deepfake video detection. Our model employs two different CNN networks, i.e., (1) XceptionNet and (2) EfficientNet-B4 as feature extractors. We train both feature extractors along with the transformer in an end-to-end manner on FaceForensics++, DFDC benchmarks. Our model, while having relatively straightforward architecture, achieves comparable results to other more advanced state-of-the-art approaches when evaluated on FaceForensics++ and DFDC benchmarks. Besides this, we also propose novel face cut-out augmentations, as well as random cut-out augmentations. We show that the proposed augmentations improve the detection performance of our model and reduce overfitting. In addition to that, we show that our model is capable of learning from considerably small amount of data.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116174730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
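A hedged sketch of early feature fusion feeding a transformer: tokens from two CNN extractors are concatenated into one sequence before the encoder. ResNet-18 stands in for XceptionNet (which torchvision does not ship), and all dimensions are illustrative.

```python
# Hedged sketch of early feature fusion into a transformer. ResNet-18 stands in
# for XceptionNet (absent from torchvision); all dimensions are illustrative.
import torch
import torch.nn as nn
import torchvision.models as tv

class HybridDeepfakeDetector(nn.Module):
    def __init__(self, d_model: int = 256):
        super().__init__()
        cnn_a = tv.resnet18(weights=None)              # XceptionNet stand-in
        self.extract_a = nn.Sequential(*list(cnn_a.children())[:-2])  # (B,512,7,7)
        self.extract_b = tv.efficientnet_b4(weights=None).features    # (B,1792,7,7)
        self.proj_a = nn.Conv2d(512, d_model, 1)
        self.proj_b = nn.Conv2d(1792, d_model, 1)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)              # real-vs-fake logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.proj_a(self.extract_a(x)).flatten(2).transpose(1, 2)
        b = self.proj_b(self.extract_b(x)).flatten(2).transpose(1, 2)
        tokens = torch.cat([a, b], dim=1)   # early fusion: one joint sequence
        return self.head(self.encoder(tokens).mean(dim=1))

print(HybridDeepfakeDetector()(torch.rand(2, 3, 224, 224)).shape)   # (2, 1)
```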
Analysing the Memorability of a Procedural Crime-Drama TV Series, CSI
Sean Cummins, Lorin Sweeney, A. Smeaton
{"title":"Analysing the Memorability of a Procedural Crime-Drama TV Series, CSI","authors":"Sean Cummins, Lorin Sweeney, A. Smeaton","doi":"10.1145/3549555.3549592","DOIUrl":"https://doi.org/10.1145/3549555.3549592","url":null,"abstract":"We investigate the memorability of a 5-season span of a popular crime-drama TV series, CSI, through the application of a vision transformer fine-tuned on the task of predicting video memorability. By investigating the popular genre of crime-drama TV through the use of a detailed annotated corpus combined with video memorability scores, we show how to extrapolate meaning from the memorability scores generated on video shots. We perform a quantitative analysis to relate video shot memorability to a variety of aspects of the show. The insights we present in this paper illustrate the importance of video memorability in applications which use multimedia in areas like education, marketing, indexing, as well as in the case here namely TV and film production.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122110815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
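A minimal sketch of the underlying setup: a vision transformer fine-tuned to regress a per-shot memorability score. Using torchvision's ViT-B/16 on single keyframes with a sigmoid head is an assumption about the details, not the authors' exact model.

```python
# Hedged sketch: a ViT fine-tuned to regress per-shot memorability. Scoring
# single keyframes with torchvision's ViT-B/16 is an assumed simplification.
import torch
import torch.nn as nn
import torchvision.models as tv

vit = tv.vit_b_16(weights=None)
vit.heads = nn.Sequential(nn.Linear(768, 1), nn.Sigmoid())  # score in [0, 1]

keyframes = torch.rand(4, 3, 224, 224)   # one keyframe per video shot
scores = vit(keyframes).squeeze(1)       # predicted memorability per shot
print(scores)
```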