{"title":"Detail-Preserving Video-based Virtual Try-On (DPV-VTON)","authors":"Raghav S K, Jahnavi A B, Vivek S D, Kirtan T S, P. Agarwal","doi":"10.1145/3599589.3599599","DOIUrl":"https://doi.org/10.1145/3599589.3599599","url":null,"abstract":"Virtual try-on systems enable trying on a desired clothing item on a target person image. These systems have attracted extensive research and commercial interest. However, existing techniques are image-based systems limited to in-shop target clothing from a pre-defined dataset. To address this, we propose a video-based virtual try-on network, DPV-VTON, that simulates the try-on of target clothing extracted from fashion videos on a target person image while preserving its details and characteristics. The core of the DPV-VTON pipeline consists of (i) a Best Frame Selection (BFS) module that extracts the best frame from the video; (ii) a Clothing Extraction Module (CEM) that extracts the target clothing from the selected frame and generates a binary mask; and (iii) a virtual try-on module that synthesizes the final try-on result. Experiments on existing benchmark datasets and a curated video dataset demonstrate that DPV-VTON generates photo-realistic and visually promising results. The proposed model obtains the lowest FID and LPIPS scores and the highest SSIM score compared to existing systems.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122090255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study of intracranial haematoma localisation based on improved RetinaNet","authors":"Junyuan Cheng, Kai Gao, Lixiang Zhou","doi":"10.1145/3599589.3599601","DOIUrl":"https://doi.org/10.1145/3599589.3599601","url":null,"abstract":"Intracranial haemorrhage is bleeding within the skull. It is a serious cranio-cerebral disorder recognized for its high mortality, and it usually requires urgent follow-up diagnosis and determination of the location and subtype of haemorrhagic lesions. In this study, we experimented with multiple available deep learning architectures to localise haemorrhagic lesions after intracranial haemorrhage (ICH), in order to improve the probability of successful patient resuscitation. We propose an improved model based on RetinaNet. The accuracy of lesion localisation is not effectively addressed by existing models, owing to the complex structure of lesion locations in intracranial haemorrhage and the large variation in lesion morphology across subtypes. To address these problems, we optimise the original RetinaNet model in terms of its feature extraction network structure, training techniques and anchor settings. Comparison experiments show that the improved model outperforms three object detection models: Faster R-CNN, RetinaNet and YOLOv4.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127121145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of Medical Ultrasound Image Processing System based on MATLAB GUI","authors":"Shiyan Zheng, Haohan Zhang, H. Zhao, Bowei Zhang, Yu Zhou, Chuang Han","doi":"10.1145/3599589.3599600","DOIUrl":"https://doi.org/10.1145/3599589.3599600","url":null,"abstract":"Due to the physical characteristics of ultrasonic imaging, many factors in the imaging process lead to low image quality, including artifacts, noise interference, and unclear edge contours of diseased tissue. This paper designs and implements a medical ultrasound image processing system based on the MATLAB GUI. The system provides image enhancement, image segmentation, image filtering, edge detection, and morphological processing of medical ultrasound images. In tests on breast duct ultrasound images, noise interference is greatly reduced in the processed images compared with the originals. In addition, some typical lesions are clearly highlighted, making the detailed information of the images more visible and the lesion boundaries clearer. The processed images were compared with the originals by subjective evaluation. The evaluations of professional doctors consistently show that the processing method in this paper can greatly improve the readability of medical ultrasound images.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114911100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Arbitrary Style Transfer with Multiple Self-Attention","authors":"Yuzhu Song, Li Liu, Huaxiang Zhang, Dongmei Liu, Hongzhen Li","doi":"10.1145/3599589.3599605","DOIUrl":"https://doi.org/10.1145/3599589.3599605","url":null,"abstract":"Style transfer aims to transfer the style of a given style image to other images, but most existing methods cannot transfer the texture details of style images well while maintaining the content structure. This paper proposes a novel arbitrary style transfer network that achieves arbitrary style transfer with richer local style details through the cross-attention mechanism of vision transformers. The network uses a pre-trained VGG network to extract content and style features. A self-attention-based content and style enhancement module enhances the content and style feature representations. A transformer-based style cross-attention module learns the relationship between content features and style features to transfer appropriate styles at each position of the content feature map and achieve style transfer with local details. Extensive experiments show that the proposed network generates high-quality stylized images with better visual quality.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127985325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Emotion Detection based on Visual and Thermal Source Fusion","authors":"Peixin Tian, Dehu Li, Dong Zhang","doi":"10.1145/3599589.3599590","DOIUrl":"https://doi.org/10.1145/3599589.3599590","url":null,"abstract":"Contactless emotion detection is an interesting research topic today. In this paper, we first study the physiological basis of human emotions to better understand what happens in the body when emotions arise and change. We then introduce the interconnection between the brain trunk vessels and the facial vessels. The investigation reveals that variations in human emotions can be reflected by horizontal facial blood flow, and that this flow can be detected mainly by measuring remote photoplethysmography (rPPG) and grayscale variation on human cheeks. To validate these findings, we set up an emotion-evoking experiment to capture RGB and thermal videos of human subjects, extract horizontal facial blood flows, and finally classify these features into three emotions (i.e., fear, happiness and sadness) by learning. The reported classification accuracy reaches 0.841 on a total of 45 subjects.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126948251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-station multi-view global calibration based on the concentric circle 3D target","authors":"Pengfei Sun, Fuqiang Zhou, Haishu Tan","doi":"10.1145/3599589.3599604","DOIUrl":"https://doi.org/10.1145/3599589.3599604","url":null,"abstract":"To address the limitation that traditional tracking 3D scanners can only use a global binocular camera to complete single-station multi-view global calibration, a concentric circle 3D target (CC3DT) is designed in this paper, together with a single-station multi-view global calibration algorithm for the designed CC3DT. The method requires only one camera to perform the global calibration function of a global binocular camera. The designed CC3DT has a simple structure and low cost. The validity and feasibility of the proposed global calibration algorithm are verified by real experiments. The method has wide application prospects and practical research value.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114857733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognition and Detection of UAV Based on Transfer Learning","authors":"J. Liu, Feng Zhang, Hao Zhao, Qi De Lu, Bing Feng, Lichang Feng","doi":"10.1145/3599589.3599591","DOIUrl":"https://doi.org/10.1145/3599589.3599591","url":null,"abstract":"As UAVs find increasing application in industry, agriculture, the military and other fields, their potential threats to national security and public security cannot be ignored, and effective UAV detection and/or tracking is becoming an increasingly important security service. This paper integrates deep learning and image processing technology to conduct research in this context, proposing a transfer-learning-based UAV detection model (YOLOv5-UAV). To reduce the influence of the amount of supervised data and the imbalance of target distribution on model performance, the dataset is constructed from self-shot videos and Internet-downloaded videos in different natural scenes, combined with Mosaic data augmentation and adaptive scaling techniques; this also effectively addresses the problem of data security. Furthermore, to verify the validity of the model, real-time tests were carried out during both day and night, across multiple scales, perspectives and natural scenes. The applicability of different detection models is compared and analyzed for small targets, moving backgrounds and weak contrast between the UAV and the background. The results show that the YOLOv5-UAV model performs well in both detection accuracy and detection speed.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117338121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulating Quantum Turing Machine in Augmented Reality","authors":"Wanwan Li","doi":"10.1145/3599589.3599606","DOIUrl":"https://doi.org/10.1145/3599589.3599606","url":null,"abstract":"As quantum computing theory attracts increasing attention from researchers, visualizing the quantum computing process is necessary for fundamental quantum computing education and research. In particular, connecting traditional computational theory with advanced quantum computing concepts is an extremely important step in learning and understanding quantum computing. In this paper, we propose a practical interactive interface for simulating a Quantum Turing Machine (QTM) in Augmented Reality (AR) that combines the traditional Turing machine computational model with quantum computing simulation. Through this interface, users can write a C-like script to represent a QTM and simulate it on an immersive augmented reality platform through the Vuforia AR engine. After validating our proposed QTM AR simulator through a series of experiments, we demonstrate its great potential for quantum computing education through an interactive visualization interface in augmented reality.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126936732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstruction of hyperspectral images with compressed sensing based on linear mixing model and affinity propagation clustering algorithm","authors":"Youli zou, Zhi-yun Xiao, Kuntao Ye","doi":"10.1145/3599589.3599602","DOIUrl":"https://doi.org/10.1145/3599589.3599602","url":null,"abstract":"The increasing spatial and spectral resolution of hyperspectral images results in a significant rise in data volume, which poses a challenge for data storage and transmission. Improving storage and transmission efficiency by enhancing the reconstruction performance of hyperspectral images at low or equal sampling rates is therefore a crucial topic in compressed sensing. Previous research has shown that methods combining a linear mixing model with distributed compressed sensing outperform traditional compressed sensing reconstruction algorithms in recovering the original data. However, the low estimation accuracy of the endmember and abundance matrices, caused by the random selection of reference bands, limits reconstruction performance. To address this problem, we propose a compressed sensing reconstruction algorithm based on a linear mixing model and the affinity propagation clustering algorithm. Our method improves reconstruction performance by enhancing the estimation accuracy of the endmember and abundance matrices. During the sampling stage, the affinity propagation clustering algorithm groups the spectral bands according to the spectral correlation of hyperspectral images, with each clustering center serving as a reference band and the other bands as non-reference bands. During the reconstruction stage, the number of endmembers is first estimated from the reference band, and the endmember and abundance matrices are then estimated and used for reconstruction. Experimental results show that the proposed algorithm achieves higher performance in reconstructing hyperspectral images than the linear mixing model-based distributed compressed sensing method.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125431466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Recognition of Distributed Fiber Optic Vibration Sensing Signal based on Machine Vision in High-speed Railway Security","authors":"Nachuan Yang, Yongjun Zhao, Fuqiang Wang","doi":"10.1145/3599589.3599603","DOIUrl":"https://doi.org/10.1145/3599589.3599603","url":null,"abstract":"Accurate and effective identification of multi-vibration events detected by a phase-sensitive optical time-domain reflectometer (Φ-OTDR) is an effective way to achieve precise alarms. This study proposes a real-time classification method for Φ-OTDR multi-vibration events based on the combination of a convolutional neural network (CNN), a bi-directional long short-term memory network (Bi-LSTM) and connectionist temporal classification (CTC), which can quickly and effectively identify the type and number of vibrations contained in a data image when multiple vibration signals are present, without requiring manual alignment for model training. Noncoherent integration and pulse cancellers are used to process the raw signal and generate spatio-temporal images. The CNN extracts spatial features from the spatio-temporal images, the Bi-LSTM extracts temporal correlation features, and CTC automatically aligns the hybrid features with the labels. A dataset of 8,000 vibration images containing 17,589 abnormal vibration events is collected for model training, validation and testing. Experiments show that the recognition model C3B3 trained with this method achieves 210 FPS and a 99.62% F1 score on the test set. The system achieves real-time classification of multiple vibration targets at the perimeter of high-speed railways and effectively reduces the false alarm rate of the system.","PeriodicalId":123753,"journal":{"name":"Proceedings of the 2023 8th International Conference on Multimedia and Image Processing","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126732946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}