2022 7th International Conference on Image, Vision and Computing (ICIVC), 26 July 2022

DenseATT-Net: Densely-Connected Neural Network with Intensive Attention Modules for 3D ABUS Mass Segmentation
Hengyu Zhang, Jingxuan Xu, Mengyu Wang, Yanfeng Li
DOI: https://doi.org/10.1109/ICIVC55077.2022.9886080

Abstract: Accurate segmentation of breast masses in 3D automated breast ultrasound (ABUS) images is important for breast cancer analysis. However, it is hard to obtain enough labeled ABUS images for training segmentation networks, which can lead to overfitting in deep-learning-based methods. To address this, the lightweight segmentation network D2U-Net is selected as the baseline. ABUS images also have a low signal-to-noise ratio and severe artifacts, which make mass boundaries unclear. To address this second problem, several kinds of attention modules are inserted into the segmentation network: spatial attention, channel attention, the convolutional block attention module (CBAM), and the squeeze-and-excitation (SE) block. The whole segmentation network is termed DenseATT-Net. An ABUS dataset with 170 volumes is used to verify segmentation performance. Experimental results show that the proposed method outperforms other segmentation models on 3D ABUS images.
Non-cooperative Space Target High-Speed Tracking Measuring Method Based on FPGA
Kailiang Han, Haodong Pei, Zhentao Huang, Tao Huang, Shangshi Qin
DOI: https://doi.org/10.1109/ICIVC55077.2022.9887187

Abstract: Based on an analysis of the visible-image features of space targets, a high-speed tracking and measurement method is proposed for non-cooperative space targets against a starry background. Interference from background stars in the target detection results is removed using algorithms such as cluster analysis of target-oriented graphical features, and tracking is completed with a shape-center extraction algorithm. The algorithm is deployed on an FPGA-based embedded space system. Processing efficiency is improved by optimizing the clustering algorithm and setting a region of interest, and the parallel processing capability of the FPGA is used to process image data while it is being read. In tests, the image processing speed reaches 50 Hz for targets with imaging scales between 3x3 and 500x500 pixels. The design has been applied in a practical system with good results.
Human Action Recognition Based on Three-Stream Network with Frame Sequence Features
Ruifeng Huang, Chong Chen, Rui Cheng, Y. Zhang, Jiabing Zhu
DOI: https://doi.org/10.1109/ICIVC55077.2022.9887162

Abstract: Two-stream models are widely used in human action recognition (HAR). However, traditional two-stream networks disregard the inter-frame sequence characteristics of video, which reduces robustness when local sequence information and long-term motion information interact. In light of this, a novel three-stream neural network is proposed that combines the long-term and short-term characteristics of a frame sequence with spatio-temporal information. First, optical-flow frames and RGB frames are extracted from the video to obtain motion and spatial information; these are fed into the corresponding temporal and spatial networks, and the spatial information is also fed into a sequence-feature network; the three networks are then pretrained. After training, features are extracted, weighted and combined by a parallel fusion algorithm, and action categories are classified with a multi-layer perceptron, as sketched below. Experimental results on the UCF11, UCF50, and HMDB51 datasets demonstrate that the model effectively integrates the spatial-temporal and frame-sequence information of human actions, yielding a significant improvement in recognition accuracy. Its classification accuracy on the three datasets is 99.17%, 97.40%, and 96.88%, respectively, notably improving on the generalization capability of conventional two-stream and three-stream models.
{"title":"Similarity Measurement Human Actions with GNN","authors":"Xiuxiu Li, Pu Zhang, Chaoxian Wang, Shengjun Wu","doi":"10.1109/ICIVC55077.2022.9887189","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9887189","url":null,"abstract":"Measuring the similarity of human actions represented with human skeletons from motion capture plays an important role in classification, retrieval and analysis of actions. In this paper, a similarity measurement method of human action based on graph neural network is proposed. In this method, due to the introduction of graph convolution neural network, the dependence between adjacent joints in human skeleton can be obtained, which makes the expression of human action in a frame more accurate. In the further action similarity measurement, LSTM with self- attention is used to extract temporal feature of the human action sequence, and finally MMD-NCA is used to measure the similarity of action sequences. Experiments on public dataset verify the effectiveness of this method in action recognition and action similarity measurement.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"48 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120908042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Encoder-Decoder Network with Residual and Attention Blocks for Full-Face 3D Gaze Estimation","authors":"Xinyuan Song, Shaoxiang Guo, Zhenfu Yu, Junyu Dong","doi":"10.1109/ICIVC55077.2022.9886734","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886734","url":null,"abstract":"This paper proposes a novel end-to-end network to improve the accuracy of gaze estimation task with full-face image as input. We first explored the possibility of using the encoder-decoder network to reconstruct the input face image, then we used U-Net with residual blocks to retain eyes features hidden in high resolution feature map layers, which are often lost during down-sampling and convolution layers. Finally, we applied spatial and channel-wise attention blocks to our model to better consider the relations among different regions globally and enhance the contribution of valuable gaze-related regions. We conducted experiments on the ETH-XGaze dataset. The results turned out that our proposed model is very competitive compared with existing state-of-the-art methods for person-independent gaze estimation.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"15 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120928903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved YOLOv5 with Transformer for Large Scene Military Vehicle Detection on SAR Image
Yi Sun, Wenna Wang, Qianyu Zhang, Han Ni, Xiuwei Zhang
DOI: https://doi.org/10.1109/ICIVC55077.2022.9887095

Abstract: With the development of SAR technology, large-scene object detection in SAR images has attracted increasing attention. Existing large-scene object detection is mainly based on CNNs, which limits access to global context information. In addition, due to the high acquisition cost of SAR images, no public dataset exists for military vehicle detection. To address these problems, we build the neck block of YOLOv5 from Transformer modules. This design captures global context and also performs better on small-object detection. Furthermore, to enable detection of military ground vehicles in large scenes, we construct a dataset based on the MSTAR dataset, named LSGVOD. Extensive experiments on LSGVOD show that the proposed method greatly improves detection accuracy, achieving the best accuracy among compared methods with 93.3% mAP.
{"title":"Robust Facial Expression Recognition Based on Dual Branch Multi-feature Learning","authors":"Xuewen Liu, Zhe Guo, Boya Yuan, Haojie Guo","doi":"10.1109/ICIVC55077.2022.9886565","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886565","url":null,"abstract":"Facial expression recognition (FER) is a key factor in human behavior analysis. Most algorithms are difficult to distinguish the subtle differences of local facial features, such as facial wrinkles and mouth corners. To solve the above problems, We propose the Dual Branch Multi-feature Learning Network (DBML-Net) to explore the latent incentive. It contains two branchs. One branch works to extract apparent features from the original images, the other uses two texture features (CS-LOP and ALDP) to enhance the detailed information. A Densely Connected Dynamic Selective Kernel Network (Dense-SK) is constructed as the feature extraction section of branch one. The extensive experimental results show that the DBML-Net achieves state-of-the-art performance on three widely used FER datasets: CK+, Oulu-CASIA and JAFFE, which demonstrate the effectiveness of our method.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122746477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on Fast Extraction of Information System from Online Social Network Images Based on Big Data Algorithm
R. Qi, Yuanlong Chen, Xiaojiang Sun, Si Qingaowa, Shenghui Chen
DOI: https://doi.org/10.1109/ICIVC55077.2022.9886207

Abstract: This paper proposes a big-data algorithm for ranking social network image tags that combines SIFT features, convolutional neural network features, and a visual bag-of-words model to obtain a target image's visual-neighbor set from the image training set. The labels of all visual neighbors serve as the initial labels of the target image for weighted voting, with voting weights computed by a linear fusion of visual image similarity and label semantic similarity. At the same time, the target image's labels and those of its visual neighbors are used to build a label graph model, and the weighted voting results seed a random walk on the label graph to complete the label-ranking task. Experimental results verify the effectiveness of the proposed method.
{"title":"A Method of Sound Event Localization and Detection Based on Three-Dimension Convolution","authors":"Pengcheng Mei, Jibin Yang, Qiang Zhang, Xian Huang","doi":"10.1109/ICIVC55077.2022.9886722","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886722","url":null,"abstract":"Deep Learning methods represented by convolutional neural networks can jointly realize Sound Event Detection (SED) and Sound Source Location (SSL). However, due to the noise and reverberation in real scenes, the accuracy of direction estimation is still dissatisfactory. Since three-dimensional convolution can carry out convolution calculation in time, frequency and channel domains for multichannel input simultaneously, it can learn more inter-channel and intra-channel features and effectively solve the above problems compared to two-dimensional convolution. Inspired by it, a method based on three-dimension convolution feature extraction called SELD3Dnet is proposed. The amplitude and phase characteristics of input multi-channel audio are calculated, and the deep feature representation is extracted through multiple 3D convolutional structures. Finally, the category and spatial location of sound events are estimated by recurrent neural networks and fully connection layers. Comparative experiments are conducted on TUT2018 datasets, and the results show that the proposed method improves the F1 metric by 13.9% and the frame recall metric by 21.1% on average under various types of real scene data subset ov1, ov2, ov3, which can validate the performance of the proposed method.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125459489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stroke Based Shadow Generation For Line Drawings","authors":"Huanhuan Xue, Chunmeng Kang","doi":"10.1109/ICIVC55077.2022.9887289","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9887289","url":null,"abstract":"We present a method to generate stylized shadows for line drawings. To begin with, we disturb the RGB values of the image in a small range, and propose a new calculation method to estimate the stroke density of the disturbed image. Then the light effect map is generated based on the wave function which is combined with the original image to produce a shadow effect for the original image. We use image enhancement techniques to improve the quality of the shadows and enhance the subjective visual effect. Our algorithm adapts to the image structure, and simplifies the user’s workflow, reduces the user’s workload, and saves time when drawing image shadows. Abundant experiments prove that our method solves the difficulty of adding light and shadow to line drawings using stroke density.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131895546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}