Advances in Multimedia: Latest Articles


Video Abnormal Action Recognition Based on Multimodal Heterogeneous Transfer Learning

Advances in Multimedia Pub Date : 2024-01-19 DOI: 10.1155/2024/4187991

Hong-Bo Huang, Yao-Lin Zheng, Zhi-Ying Hu

Abstract: Human abnormal action recognition is crucial for video understanding and intelligent surveillance. However, the scarcity of labeled data for abnormal human actions often hinders the development of high-performance models. Drawing on multimodal methods, this paper proposes an approach that leverages the text descriptions associated with abnormal human action videos. The method exploits the correlation between the text domain and the video domain in the semantic feature space and introduces a multimodal heterogeneous transfer learning framework from the text domain to the video domain. The text of the videos is used for feature encoding and knowledge extraction, and knowledge transfer and sharing are realized in the feature space, which assists the training of the abnormal action recognition model. The proposed method reduces the reliance on labeled video data, improves the performance of the abnormal human action recognition algorithm, and outperforms popular video-based models, particularly in scenarios with sparse data. Moreover, the framework contributes to the advancement of automatic video analysis and abnormal action recognition, providing insights for the application of multimodal methods in a broader context.

Citations: 0

Design of 3D Environment Combining Digital Image Processing Technology and Convolutional Neural Network

Advances in Multimedia Pub Date : 2024-01-12 DOI: 10.1155/2024/5528497

Xiaofei Lu, Shouwang Li

Abstract: As virtual reality technology advances, 3D environment design and modeling have garnered increasing attention. Applications in networked virtual environments span urban planning, industrial design, and manufacturing, among other fields. However, existing 3D modeling methods exhibit high reconstruction error, limiting their practicality in many domains, particularly environmental design. To enhance 3D reconstruction accuracy, this study proposes a digital image processing approach that combines binocular camera calibration, stereo correction, and a convolutional neural network (CNN) algorithm for optimization and improvement. Using the refined stereo-matching algorithm, a 3D reconstruction model was developed to improve the accuracy of 3D environment design and reconstruction while optimizing the 3D reconstruction effect. An experiment using the ShapeNet dataset demonstrated that the evaluation indices of the model constructed in this study (Chamfer distance (CD), Earth mover's distance (EMD), and intersection over union) outperformed those of alternative methods. After incorporating the CNN module in the ablation experiment, CD and EMD increased by an average of 0.1 and 0.06, respectively. This validates that the proposed CNN module effectively enhances point cloud reconstruction accuracy. Upon adding the CNN module, the CD index and EMD index in the dataset increased by an average of 0.34 and 0.54, respectively. These results indicate that the proposed CNN module exhibits strong predictive capabilities for point cloud coordinates. Furthermore, the model demonstrates good generalization performance.
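The abstract above names Chamfer distance (CD) as one of its evaluation indices for point cloud reconstruction. As a rough illustration of what that index measures, here is a minimal pure-Python sketch. It assumes one common convention (mean squared nearest-neighbour distance, summed over both directions); the paper's exact definition is not given in this listing and may differ, and the names `chamfer_distance` and `squared_distance` are hypothetical helpers, not taken from the paper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two point clouds.

    Assumed convention: for each point, take the squared distance to its
    nearest neighbour in the other cloud, average per cloud, and add the
    two directional averages. Brute force, O(len(xs) * len(ys)).
    """
    if not xs or not ys:
        raise ValueError("both point clouds must be non-empty")
    forward = sum(min(squared_distance(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(min(squared_distance(y, x) for x in xs) for y in ys) / len(ys)
    return forward + backward


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(predicted, reference))
```

Under this convention a lower value means the two point clouds lie closer together, and identical clouds score 0.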

Citations: 0