{"title":"Preconditioned Diffusion Multitask Clustering Graph Filters","authors":"Ying-Shin Lai, F. Chen, Tiantian Wang","doi":"10.1145/3512388.3512438","DOIUrl":"https://doi.org/10.1145/3512388.3512438","url":null,"abstract":"In this work, we are interested in the design of node-variant FIR graph filters, in which the graph filter estimates the filter coefficients from the stream data. Considering the estimation of filter coefficients as a task, we introduce concept of the multitask into graph filters. The filter coefficients can be divided into different clusters, and the cooperation between clusters is beneficial. Then, a multitask graph diffusion LMS algorithm is proposed. In order to improve convergence speed and performance, a multitask graph diffusion preconditioned algorithm is proposed. The simulation results verify the feasibility of algorithms.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117130683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RPViT: Vision Transformer Based on Region Proposal","authors":"Jing Ge, Qianxiang Wang, Jiahui Tong, Guangyu Gao","doi":"10.1145/3512388.3512421","DOIUrl":"https://doi.org/10.1145/3512388.3512421","url":null,"abstract":"Vision Transformers constantly absorb the characteristics of convolutional neural networks to solve its shortcomings in translational invariance and scale invariance. However, dividing the image by a simple grid often destroys the position and scale features in the image at the beginning of the network. In this paper, we propose a vision transformer based on region proposal, which obtains the inductive bias in a simple way. Specifically, RPViT achieves locality and scale-invariance by extracting regions with locality using a traditional region proposal algorithm and deflating objects of different scales to the same scale by a bilinear interpolation algorithm. In addition, to enable the network to fully utilize and encode diverse candidate objects, a multi-class token approach based on orthogonalization is proposed and applied. Experiments on ImageNet demonstrate that RPViT outperforms baseline converters and related work.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"414 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122793735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Marking based on Deep Learning","authors":"Yunxiang Liu, Jianlin Zhu, Xinxin Yuan, Chunya Wang","doi":"10.1145/3512388.3512410","DOIUrl":"https://doi.org/10.1145/3512388.3512410","url":null,"abstract":"In order to solve the problem of a lot of time and energy being wasted when the marking teacher corrects the test papers, this paper proposes an automated test paper correction system based on deep learning. The system is roughly divided into three modules: text extraction from test papers, text encoding and text matching. Using the method of combining DB and CRNN to extract text from the test paper, it has a high accuracy of text recognition; and respectively uses the BERT pre-training model and cosine similarity as the method of text encoding and text matching. The experimental results prove that the automated test paper The average result of the correction system and the scoring teacher's score is only 0.5, which achieves an excellent evaluation effect.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130618912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Noise-robust Feature Fusion Model Combining Non-local Attention for Material Recognition","authors":"Chuanbo Zhou, Guoan Yang, Zhengzhi Lu, Deyang Liu, Yong Yang","doi":"10.1145/3512388.3512450","DOIUrl":"https://doi.org/10.1145/3512388.3512450","url":null,"abstract":"Material recognition, as an important task of computer vision, is hugely challenging, due to large intra-class variances and small inter-class variances between material images. To address those recognition problems, multi-scale feature fusion methods based on deep convolutional neural networks are presented, which has been widely studied in recent years. However, the past research works paid too much attention to the local features of the image, while ignoring the non-local features that are also crucial for fine image recognition tasks such as material recognition. In this paper, Non-local Attentional Feature Fusion Network (NLA-FFNet) is proposed that combines local and non-local feature of images to improve the feature representation capability. Firstly, we utilize the pre-trained deep convolutional neural network to extract the image feature. Secondly, a Multilayer Non-local Attention (MNLA) block is designed to generate a non-local attention map which represents the long-range dependencies between features of different positions. Therefore, it can achieve stronger noise-robustness of model and better ability to represent fine features. Finally, combined our Multilayer Non-local Attention block with bilinear pooling which has been proved to be effective for feature fusion, we propose a deep neural network framework, NLA-FFNet, with noise-robust multi-layer feature fusion. 
Experiment prove that our model can achieve a competitive classification accuracy in material image recognition, and has stronger noise-robustness at the same time.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128085000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualization of Plant Leaf Classification Process Based on Multi-Layer Network Model","authors":"Ziyi Wang, Hongjun Li","doi":"10.1145/3512388.3512430","DOIUrl":"https://doi.org/10.1145/3512388.3512430","url":null,"abstract":"Tree recognition is a ubiquitous topic in the field of artificial intelligence, and the identification of leaf types is one of the important ways in analyzing tree species. Based on the focalized attention mechanism and feature fusion strategy, we in this paper establish a multi-layer network model classifier: Leaf-AMNet to classify the Liriodendron leaves and Ginkgo leaves. In order to explore the \"black box\" problem of deep learning, we visualize the classification process of Leaf-AMNet. Experimental results show that Leaf-AMNet achieves high accuracy on Leaf Dataset and shows potential information via visual effects.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"25 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127247124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAU-Net: A Multiscale Attention Encoder-decoder Network for Liver and Liver-tumor Segmentation","authors":"Le Liu, Jian Su, HuLin Liu, Weiqiang Zhao, Xiaogang Du, Tao Lei","doi":"10.1145/3512388.3512418","DOIUrl":"https://doi.org/10.1145/3512388.3512418","url":null,"abstract":"U-Net and improved U-Nets suffer from two problems for liver and liver-tumor segmentation. The first is that skip connections in encoder-decoder networks bring interference information. The second is that the convolutional kernel with the fixed receptive field does not match the liver-tumor with changing shape and position. To address the above problems, we propose a multiscale attention encoder-decoder network (MAU-Net) for liver and liver-tumor segmentation. First, MAU-Net employs self-attentive gating guidance module in the skip connection to suppresses irrelevant regions. Secondly, MAU-Net employs a multi-branch feature fusion module to extract multiscale features for the segmentation of liver-tumor. We evaluate the proposed method on the public LiTS dataset. The experimental results show that the average dice of liver and liver-tumor segmentation by MAU-Net are 96.11% and 86.90%, respectively. Experiments demonstrate that MAU-Net is superior to state-of-the-art networks for liver and liver-tumor segmentation.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129948289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHTVS: Shot-level based Hierarchical Transformer for Video Summarization","authors":"Yubo An, Shenghui Zhao","doi":"10.1145/3512388.3512427","DOIUrl":"https://doi.org/10.1145/3512388.3512427","url":null,"abstract":"In this paper, a Shot-level based Hierarchical Transformer for Video Summarization (SHTVS) is proposed for supervised video summarization. Different from most existing methods that employ bidirectional long short-term memory or use self-attention to replace certain components while keeping their overall structure in place, our methods show that a pure Transformer with video feature sequences as its input can achieve competitive performance in video summarization. In addition, to make better use of the multi-shot characteristic in a video, each video feature sequence is firstly split into shot-level feature sequences with kernel temporal segmentation, and then fed into shot-level Transformer encoder to learn shot-level representations. Finally, shot-level representations and original video feature sequence are integrated for the frame-level Transformer encoder to predict frame-level importance scores. Extensive experimental results on two benchmark datasets (SumMe and TVSum) prove the effectiveness of our methods.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125309255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SpikeFormer: Image Reconstruction from the Sequence of Spike Camera Based on Transformer","authors":"Chen She, Laiyun Qing","doi":"10.1145/3512388.3512399","DOIUrl":"https://doi.org/10.1145/3512388.3512399","url":null,"abstract":"The recently invented retina-inspired spike camera produces asynchronous binary spike streams to record the dynamic light intensity variation process. This paper develops a novel image reconstruction method, called SpikeFormer, which reconstructs the dynamic scene from binary spike streams in a supervised learning strategy. We construct the training dataset which composes of spike streams and corresponding ground truth images by simulating the working mechanism of spike camera. Spike noises are also taken into consideration in the simulator. Firstly, the input spike stream is encoded as an enlarged binary image by interlacing temporal and spatial information. Then the binary image is inputted to the SpikeFormer to recover the dynamic scene. SpikeFormer adopts Transformer architecture which includes an encoder and a decoder. In particular, we propose a hierarchical architecture encoder to exploit multi-scale temporal and spatial features progressively. The decoder aggregates information from different stages to incorporate both local and global attention. Multi-task loss including reconstruction loss, perception loss, edge loss, and temporal consistency loss are combined to restrict the model. 
Extensive experimental results demonstrate that the proposed framework achieves encouraging results in details reconstruction and noise alleviation.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116739326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SU-UNet: A Novel Self-Updating Network for Hepatic Vessel Segmentation in CT Images","authors":"Yang Liu, Xukun Zhang, Haopeng Kuang, Zhongwei Yang, Shichao Yan, Peng Zhai, Lihua Zhang","doi":"10.1145/3512388.3512420","DOIUrl":"https://doi.org/10.1145/3512388.3512420","url":null,"abstract":"Hepatectomy is currently one of the most commonly used treatment methods for malignant liver tumors. It is of great significance to clinical surgery to perform accurate hepatic vessel segmentation in preoperative CT images. However, due to the complex structure of hepatic vessels and low contrast in the CT images, it is difficult for experienced doctors to perform accurate manual labeling. Based on this, the labels of the existing public datasets are noisy. In this paper, we propose a double UNet structure based on the soft-constraint method to more accurately segment the vessels from the noisy annotation dataset. First, two different Unet output different segmentation predictions. Then a Self-updating module (SUM) is designed to optimize the noisy vessel label based on segmentation predictions so that the optimized label can better guide the network training. This method can guide the network to get better segmentation predictions. Extensive experiments using a noisy public dataset demonstrate the superiority of our method.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116779492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian-based Security Distributed Estimation","authors":"Tiantian Wang, Feng Chen, Ying-Shin Lai","doi":"10.1145/3512388.3512445","DOIUrl":"https://doi.org/10.1145/3512388.3512445","url":null,"abstract":"In recent years, the distributed estimation of wireless sensor networks has been widely studied, but there are often security threats in practical applications. For example, attackers damage data information in different ways and reduce the performance of network estimation. In order to solve this problem, this paper proposes an algorithm framework of attack detection based on distributed LMS. The algorithm classifies the states of adjacent nodes, and then realizes attack detection through Bayesian criterion. An adaptive detection threshold is proposed to improve the detection performance. The reliable information of the last time is used to replace the detected lossy information and fuse to ensure the performance of the algorithm. Finally, the simulation results of several algorithms under different attack models are given to prove the effectiveness of the algorithm.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130172940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}