{"title":"Extracting RoIs for Robust Far-infrared Pedestrian Detection on Board","authors":"Jianwei Zhang, Xiaoguang Yuan, Zhewei Xu, Wenjun Ke","doi":"10.1145/3529466.3529494","DOIUrl":"https://doi.org/10.1145/3529466.3529494","url":null,"abstract":"Onboard far-infrared pedestrian detection is more challenging than pedestrian detection in visible light. Existing works have shown that the output of RoI extraction, known as proposals, strongly affects both the recall rate and the computational cost of pedestrian detection. However, RoI extraction is non-trivial in far-infrared scenes due to low resolution, blurred details, and pedestrian morphological features. This paper proposes a novel RoI extraction framework for onboard far-infrared pedestrian detection, named FIR-RoIEF, which uses edges to obtain pedestrian contour features and cascade filtering to obtain valuable RoIs. Given pedestrian morphological features, we further present a vertical edge strategy to enhance pedestrian vertical features. A T-shaped template and RoI reordering are used in the bounding-box evaluation process to output pure, high-quality RoIs. Under basic and standard metrics, we perform experiments on two public datasets, SCUT and KAIST, both of which contain large numbers of far-infrared pedestrian objects. Given 100, 400, and 1000 proposals, our method achieves its best recall at 1000 proposals: 94% and 87%, respectively. Moreover, our method maintains a good balance between computational cost and real-time performance.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131330059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Controllable Person Image Synthesis GAN and Its Reconfigurable Energy-efficient Hardware Implementation","authors":"Shaoyue Lin, Yanjun Zhang","doi":"10.1145/3529466.3529500","DOIUrl":"https://doi.org/10.1145/3529466.3529500","url":null,"abstract":"At this stage, controllably generating high-quality person images remains a challenge in person image synthesis. At the same time, image synthesis networks are updated far faster than their hardware implementations. Therefore, this paper proposes a GAN for person image synthesis that generates high-quality person images with controllable pose and attributes. The newly designed network is more convenient for hardware implementation while ensuring that the generated images remain controllable. This paper also designs a synthesizable library for GANs to enable faster hardware reconfiguration, and the proposed model is implemented on top of this library. Finally, the proposed network achieves better results both quantitatively and qualitatively compared with previous work. Compared with GPU and CPU implementations, the FPGA-based hardware implementation achieves a peak energy efficiency of 73.67 GOPS/W.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123250994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Action Recognition Based on Multi-Scale Feature Augmented Graph Convolutional Network","authors":"Wangyang Lv, Yinghua Zhou","doi":"10.1145/3529466.3529501","DOIUrl":"https://doi.org/10.1145/3529466.3529501","url":null,"abstract":"Nowadays, video has gradually become the mainstream medium of communication, and the massive volume of videos poses a challenge to manual review. Using computers to understand videos is therefore of great significance. Among automatic action recognition approaches, the skeleton-based approach has many advantages, such as strong robustness to lighting changes, strong action expression ability, and low computational cost. In this paper, a multi-scale feature augmented graph convolutional network is proposed. It uses a spatial multi-scale GCN module to extract spatial features at different scales and a multi-scale temporal augmentation module to capture temporal features at different scales. To evaluate the proposed method, experiments were performed on two public datasets, NTU-RGB+D and Kinetics-Skeleton. Compared with other advanced action recognition methods, the proposed method performs action recognition effectively and improves recognition accuracy.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131616568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer-based Encoder-Decoder Model for Surface Defect Detection","authors":"Xiaofeng Lu, Wentao Fan","doi":"10.1145/3529466.3529471","DOIUrl":"https://doi.org/10.1145/3529466.3529471","url":null,"abstract":"Recently, deep learning approaches have been gaining popularity in industrial quality control (e.g. surface defect detection), due to their ability to automatically extract more representative features. In this paper, we propose a two-stage end-to-end approach based on a Transformer encoder-decoder for surface defect detection. First, we develop a surface defect detection model trained on slices of the raw input images, with the output resolution matching the input resolution, which enlarges the receptive field. A 1×1 convolution layer is then applied to the final layer to reduce the number of channels and obtain a single-channel output mask. This single-channel output mask is combined with the output of the last layer of the first stage and fed into the second-stage decision layer. Considering different types of sample data, we design two decision network strategies: plain upsampling and dynamic upsampling. Our experimental studies on several publicly available datasets show that the proposed approach is general and effective in detecting defects and requires only a relatively small number of training samples, making it well suited to industrial practice, where sample sizes are typically limited.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130253963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mass Mahjong Decision System Based on Transfer Learning","authors":"Yajun Zheng, Shuqin Li","doi":"10.1145/3529466.3529485","DOIUrl":"https://doi.org/10.1145/3529466.3529485","url":null,"abstract":"In this paper, we apply transfer learning to address the scarcity of data and the difficulty of building effective models in imperfect-information games, of which Mass Mahjong is a typical example. We design and implement a Mass Mahjong discard model based on transfer learning: a Blood Mahjong discard model previously trained on a large dataset is transferred to the similar Mass Mahjong domain. In subsequent model optimization, a self-play-based approach is used to improve the Mass Mahjong discard model. The experimental results show that the transfer learning-based Mass Mahjong discard model performs well with limited data and fits the Mass Mahjong discard rules. The model won second prize in the Mass Mahjong event of the 2021 National University Computer Gaming Competition.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"133 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120867905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Sign Language Translation with SF-Transformer","authors":"Qifang Yin, Wenqi Tao, Xiaolong Liu, Yu Hong","doi":"10.1145/3529466.3529503","DOIUrl":"https://doi.org/10.1145/3529466.3529503","url":null,"abstract":"Popular methods for sign language translation are based on combinations of CNNs and RNNs. Recently, the Transformer has also attracted researchers' attention and achieved success in this area. However, researchers usually focus only on the accuracy of their models while ignoring practical application value. In this paper, we propose SF-Transformer, a lightweight model based on the encoder-decoder architecture for sign language translation, which achieves new state-of-the-art performance on the Chinese Sign Language (CSL) dataset. We build our network from the 2D/3D convolution blocks of SF-Net and Transformer decoders. Benefiting from fewer parameters and a high degree of parallelization, our model trains and infers faster. We hope our method can contribute to the practical application of sign language translation on low-compute devices such as mobile phones.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121460530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Detection Based on Feature Balance Pyramid in UAV Imagery","authors":"Jiao Xu, Jian Xu, Zeming Xu, Zhengguang Xie","doi":"10.1145/3529466.3529469","DOIUrl":"https://doi.org/10.1145/3529466.3529469","url":null,"abstract":"Compared with images taken from the ground perspective, aerial UAV images contain a large proportion of small objects and exhibit large viewpoint changes, which degrades object detection performance. In this paper, the YOLOv5 algorithm is improved to adapt it to UAV object detection. Given the large number of small objects in aerial images, a feature balance pyramid structure is added to reduce the loss of low-level features and improve small-object detection. In the feature balance pyramid, pixel unshuffle is used to adjust the feature scale, which preserves low-level feature information and reduces computational cost. A cross self-attention module is proposed to refine the balanced feature map and improve small-object localization accuracy. Because the viewing angle of aerial images varies greatly, a deformable convolutional network is added to the YOLOv5 backbone to enhance the model's feature extraction capability for multi-view objects. Experimental results show that on the VisDrone dataset, the improved algorithm improves mean average precision (mAP) by 1.4 percentage points compared with the original algorithm.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131456561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese Named Entity Recognition Fusing Lexical and Syntactic Information","authors":"Min Zhang, Bicheng Li, Qilong Liu, Jing Wu","doi":"10.1145/3529466.3529506","DOIUrl":"https://doi.org/10.1145/3529466.3529506","url":null,"abstract":"Chinese named entity recognition is an important research field in natural language processing and a significant component of many downstream tasks. However, its accuracy is impaired by the lack of entity boundary information and the neglect of potential syntactic information. In this context, the present study proposes a Chinese named entity recognition method fusing lexical and syntactic information, with the following steps. First, the input text is mapped to character vectors, and external lexical information is introduced with an improved word-set matching method and integrated into the input representation of each character. Second, BiLSTM is used to obtain a context vector from the input representation of each character. Third, a syntactic vector is constructed by means of a key-value memory network, and a feature vector is obtained by the gated, weighted fusion of the context vector and the syntactic vector. Finally, the feature vector is fed into a CRF to perform Chinese named entity recognition. The experimental results reveal that the proposed method performs better on the Resume, Weibo, and MSRA datasets and outperforms current mainstream Chinese named entity recognition methods.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132827773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Named Entity Recognition of Zhuang Language Based on the Feature of Initial Letter in Word","authors":"Weiquan Zhang, Suqin Tang, Danni He, Tinghui Li, Changchun Pan","doi":"10.1145/3529466.3529478","DOIUrl":"https://doi.org/10.1145/3529466.3529478","url":null,"abstract":"Named entity recognition is an important task and a basis for intelligent information processing and knowledge representation learning for the Zhuang language. A BiLSTM-CNN-CRF network model combining the uppercase and lowercase characters of words is proposed for the named entity recognition task of the Zhuang language, which lacks a labeled named-entity corpus. First, word2vec is trained on unlabeled Zhuang text to obtain Zhuang word vectors. Then a convolutional neural network is used to extract the character features of Zhuang words, yielding a character feature vector. These two vectors are concatenated with randomly initialized initial-letter case feature vectors, and the concatenated vectors are input into the BiLSTM-CNN-CRF model for training; thus, an end-to-end named entity recognition model for the Zhuang language is constructed. Experimental results show that, without relying on handcrafted features or external dictionaries, the proposed method outperforms the baseline models, achieving an F1 score of 80.37% on the named entity recognition task and enabling automated named entity recognition for the Zhuang language.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127709961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixture Density Hyperspherical Generative Adversarial Networks","authors":"Qinyang Li, Wentao Fan","doi":"10.1145/3529466.3529475","DOIUrl":"https://doi.org/10.1145/3529466.3529475","url":null,"abstract":"Generative Adversarial Networks (GANs) are deep generative models that can generate realistic samples, but they are difficult to train in practice due to mode collapse, where the generator repeatedly generates only one mode during learning, or generates only a small number of modes after reaching the Nash equilibrium during adversarial training. To solve this issue while preserving the generator's generative ability, we propose a mixture density hyperspherical generative model, named MDH-GAN, that combines a variational autoencoder (VAE) and a generative adversarial network. Unlike most GAN-based generative models, which assume a Gaussian prior, MDH-GAN adopts a von Mises-Fisher (vMF) prior defined on a unit hypersphere. Our model combines the VAE with the GAN by integrating the VAE's encoder into a jointly trained framework, so the generator can learn the data distribution with a hyperspherical latent structure, improving its generative ability. Moreover, a vMF mixture model is deployed in the discriminator to form a hyperspherical space and avoid mode collapse. In our experiments, by computing the Fréchet Inception Distance (FID) between generated and real images, we show that MDH-GAN generates high-quality images with high diversity.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123321311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}