{"title":"Research on Handwritten Digital Image Recognition Model Based On Deep Learning and Construction of Browser Service Platform","authors":"Han-Ting Huang, Zhu Chen, Tongyuan Bai, Zhihong Zhao","doi":"10.1145/3512388.3512404","DOIUrl":"https://doi.org/10.1145/3512388.3512404","url":null,"abstract":"In the digital era, the accuracy of OCR scanning continues to improve, increasing work efficiency. To improve handwritten digit recognition, the team used OpenCV-based image recognition technology to segment the grids of 100 collected handwritten digit forms, obtaining 10,000 handwritten digit images. The accuracy of this self-built handwritten digit dataset was then compared with the MNIST dataset under different models, and the results show that the former is recognized better than the latter. The small LeNet-5 network is selected as the final model, achieving 98.30% accuracy on the test set, so the dataset can be better applied in practical work and life. Based on this model, a handwritten digit recognition website is built using HTML and the Flask framework for user convenience.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121605066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Inpainting Based on Edge Features and Attention Mechanism","authors":"Yuting Fu, Dan Xu, Kangjian He, Haipeng Li, Tingting Zhang","doi":"10.1145/3512388.3512398","DOIUrl":"https://doi.org/10.1145/3512388.3512398","url":null,"abstract":"Image inpainting is an important application in daily life and entertainment, as well as a popular computer vision task. The latest deep learning-based approaches have shown promising results for the challenging task of inpainting damaged regions of an image. However, there are still structural differences between restored images and the ground truth images. Aiming at this problem, we propose an image inpainting model called ECF-Net. ECF-Net incorporates edge information into the inpainting process to guide feature generation in the damaged area, helping damaged images recover structures closer to the ground truth. In addition, we introduce a knowledge consistency attention mechanism in ECF-Net, which obtains more reasonable semantics and eliminates blur in the inpainted results. Extensive experiments on various datasets, including CelebA-HQ, Places2, and Paris StreetView, clearly demonstrate that our method achieves better visual performance.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122659459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tiny Object Detection based on YOLOv5","authors":"Tongyuan Huang, Minhao Cheng, Yuling Yang, Xiangling Lv, Jia Xu","doi":"10.1145/3512388.3512395","DOIUrl":"https://doi.org/10.1145/3512388.3512395","url":null,"abstract":"In view of the poor accuracy of mainstream object detection algorithms in detecting tiny objects, a tiny object detection algorithm based on an improved YOLOv5 is proposed. The main feature extraction network of YOLOv5 is modified to generate four feature maps, enhancing feature extraction from the original input images. The YOLOv5 neck is modified, combining FPN and PANet to fuse the four feature maps containing different semantic information, generating better features and improving tiny object detection performance. The GIoU loss function is introduced to replace the IoU loss function of the original algorithm to improve the localization accuracy of tiny objects. The Swish activation function is used in place of the original ReLU activation function to better retain target features. Mosaic data augmentation is used to enrich the object detection background, a cosine annealing schedule is used to dynamically update the learning rate, and these improvements are integrated into the improved YOLOv5 algorithm. In this paper, the improved algorithm is compared with the original YOLOv5 algorithm on the CityPersons dataset. Experimental results show that the improved YOLOv5 algorithm can effectively improve the detection accuracy of tiny objects.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132380598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Image Similarity Detection based on Typical ITAI Servers","authors":"Su-Jiau Chen, Zhimin Wu, M. Guo, Zhenyu Wang","doi":"10.1145/3512388.3512392","DOIUrl":"https://doi.org/10.1145/3512388.3512392","url":null,"abstract":"In this paper, image detection algorithms based on traditional methods and deep learning are migrated from the Intel X86 architecture ecosystem to the X86 and ARM architectures of typical localized servers and software in the Information Technology Application Innovation (ITAI) catalog. By solving the adaptation problems and accounting for the different processor architectures of the localized servers, targeted performance optimization methods are proposed, and large-scale performance tests on massive images have been conducted on physical machines, verifying the production capacity of current ITAI servers in image detection scenarios.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132539081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using DSCB: A Depthwise Separable Convolution Block Rebuild MTCNN for Face Detection","authors":"Qiang Wang, Jingru Cui, Zunying Qin, Ninggang An, Xiaofei Ma, Guodong Li","doi":"10.1145/3512388.3512389","DOIUrl":"https://doi.org/10.1145/3512388.3512389","url":null,"abstract":"Nowadays, there is huge demand for face detection in images and videos for surveillance, education, autonomous driving, and health care. These application scenarios require high accuracy and efficiency in face detection. However, in some scenes, unconstrained pose variation, occlusion, large numbers of faces, and illumination bring great challenges to existing face detection methods. In view of the above problems, we propose a depthwise separable convolution block (DSCB) that maintains training speed while improving accuracy. Using the proposed DSCB, we then design a face detection model based on MTCNN (Multi-task Convolutional Neural Network) to improve performance under occlusion, unconstrained pose variation, and large numbers of small targets. To better evaluate the proposed method, we built a new dataset derived from classroom teaching scenes for training and evaluation. Our dataset consists of 7,168 images and 294,924 face bounding boxes with occlusion, unconstrained pose variation, and large numbers of small targets. Comparative experiments on our dataset show that the proposed method is superior to other state-of-the-art methods in face detection accuracy and speed. Compared with the original MTCNN, the proposed face detection method brings overall improvements of about 3.9% in precision, 8.66% in recall, and 1.39 times in detection speed.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131677874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bed-Leaving Action Recognition Based on YOLOv3 and AlphaPose","authors":"Caixia Zhang, Xiaoyu Yang","doi":"10.1145/3512388.3512406","DOIUrl":"https://doi.org/10.1145/3512388.3512406","url":null,"abstract":"Given the scarcity of research on bed-leaving action recognition, especially recognition of human actions under occlusion, we propose a bed-leaving action recognition algorithm based on YOLOv3 and AlphaPose. Six kinds of specific human actions in the process of leaving the bed are classified, notably including the normal bed-exit action (BEA) and the abnormal bed-fall action (BFA). First, YOLOv3 is used to extract the bed region and human region. Then, combining five skeleton key points extracted by AlphaPose with angle and length measurements, a multi-layer neural network is constructed and trained for classification and recognition. Experiments on our datasets show that the accuracy for BEA and BFA reaches 98.82% and 95.27% respectively, so our method can assist medical staff in monitoring bed-leaving actions.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131685896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anatomy-guided Multi-View Fusion Framework for Abdominal CT Multi-Organ Segmentation","authors":"Zhongwei Yang, Haopeng Kuang, Xukun Zhang, Yang Liu, Peng Zhai, Lubin Chen, Lihua Zhang","doi":"10.1145/3512388.3512413","DOIUrl":"https://doi.org/10.1145/3512388.3512413","url":null,"abstract":"Multi-organ segmentation from abdominal CT images plays a vital role in clinical practice. However, due to the low contrast of soft tissues in CT images and the significant differences in the shape and appearance of organs, this is a challenging task. In this paper, we propose a two-stage framework based on multi-view fusion to address this challenge. Specifically, the first stage quickly segments the organs in the original abdominal CT image. Based on this, we introduce anatomical knowledge to robustly extract the image region of each individual organ. Then, inspired by clinicians' image-reading practice, organ image blocks from three views are used as the input to the second-stage network, and the features from the different views are adaptively fused to output accurate segmentation results. We conduct extensive experiments on a public CT dataset, and the experimental results show that our method is accurate and robust on this challenging segmentation task.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134478965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure-based Street Tree Extraction from Mobile Laser Scanning Point Clouds","authors":"W. Hao, Zhanbin Zuo, W. Liang","doi":"10.1145/3512388.3512443","DOIUrl":"https://doi.org/10.1145/3512388.3512443","url":null,"abstract":"We present an automatic method based on structure analysis for extracting street trees from mobile laser scanning (MLS) data. Tree trunks and canopies can be characterized by their shape information and height above ground level. Therefore, MLS point clouds are first divided into three layers (ground, low layer above the ground, high layer above the ground) with respect to vertical height. For the points above the ground, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering method is applied to cluster the points into segments, and geometrical features are used to extract trunk candidates and canopy candidates. Then, a \"structure-matching\" strategy based on minimum bounding rectangle (MBR) mapping is proposed to extract tree candidates. Finally, a slicing method based on axial symmetry is proposed to segment overlapped canopies. The experimental results show that the proposed method can quickly extract individual trees from massive street point clouds.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115959275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Vision-based Monitoring System for Quality Assessment of Fused Filament Fabrication (FFF) 3D Printing","authors":"Jingdong Li, Wei Quan, L. Shark, H. Brooks","doi":"10.1145/3512388.3512424","DOIUrl":"https://doi.org/10.1145/3512388.3512424","url":null,"abstract":"As one of the most popular 3D printing technologies, Fused Filament Fabrication (FFF) allows intricate structures to be produced without complex manufacturing processes. However, currently available FFF 3D printers have a limitation: they print blindly, without the ability to detect printing deviations and stop, incurring additional running costs through unnecessary waste of materials and time. This has led to the novel development reported in this paper of a vision-based monitoring system for quality assessment of 3D printing, applying advanced computer vision algorithms and image processing techniques. The proposed approach compares actual images of the printed layer with simulated images created by slicing the CAD model via G-code generation, based on the calibrated camera pose. Also presented are feature extraction methods to yield object dimension, profile, and infill for quality assessment, with system performance demonstrated on various object geometries. This system makes it possible to analyze and examine the quality of 3D printing during the print process, which could identify defective printed parts, terminate the whole process, and alert users for time and cost savings.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116794128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistency Mean-Teaching for Unsupervised Domain Adaptive Person Re-identification","authors":"Sheng-Hsiang Yu, Shengjin Wang","doi":"10.1145/3512388.3512451","DOIUrl":"https://doi.org/10.1145/3512388.3512451","url":null,"abstract":"Unsupervised domain adaptive (UDA) person re-identification (re-ID) transfers a model trained on a labeled source domain to an unlabeled target domain. In this paper, we propose a Consistency Mean Teaching (CMT) method to improve clustering-based UDA re-ID. Our CMT consists of two consistencies, i.e., an inter-view consistency and an intra-identity consistency. First, the inter-view consistency exploits a popular self-supervised training signal for UDA that has been neglected in existing clustering-based UDA methods. Second, the intra-identity consistency imposes a regularization between the teacher and student models, requiring them to output consistent representations for different samples of the same identity. Third, these two consistencies are integrated into a single student-teacher framework and provide complementary benefits. Experimental results show that CMT brings significant improvements over the baseline and achieves competitive accuracy on four popular UDA re-ID benchmarks.","PeriodicalId":434878,"journal":{"name":"Proceedings of the 2022 5th International Conference on Image and Graphics Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125399552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}