{"title":"Optic disc segmentation in retinal fundus images using improved CE-Net","authors":"Yingxue Wang, Lin Huang","doi":"10.1117/12.2643259","DOIUrl":"https://doi.org/10.1117/12.2643259","url":null,"abstract":"Diabetic retinopathy is one of the main complications of diabetes and the most important factor leading to blindness in the late stage of the disease. It often manifests as one or more lesions in clinical diagnosis. In order to reduce the difficulty of detection, it is of great significance to segment the optic disc in retinal images. This paper proposes an improved context encoding network architecture (CE-Net) for segmentation of the optic disc portion in diabetic retinal images. The network architecture is divided into three parts: feature encoder module, context extractor module and feature decoder module. The context extractor module consists of an improved dense atrous convolutional block (DAC) and residual multi-kernel pooling (RMP). Experimental results show that the optimal network model generated by the improved CE-Net architecture performs well on the Indian Diabetic Retinopathy Image Dataset (IDRID), and compared with other methods, our method has the lowest mean overlap error and the highest accuracy and sensitivity.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128467032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Track initiation algorithm for bearing-only target tracking in complex background","authors":"Hao Wang, Weihua Wang","doi":"10.1117/12.2643464","DOIUrl":"https://doi.org/10.1117/12.2643464","url":null,"abstract":"Aiming at the difficulty of the target track initiation with bearing-only and no distance information under the complex background, an improved heuristic track initiation algorithm is proposed. Based on the motion characteristics of the target in the azimuth-pitch coordinate system, the motion trajectory of the target and clutter distribution are modeled. Combining the heuristic track initiation algorithm with Kalman filtering can build target track rapidly under complex background and ensure a higher probability of truth track. This method meets the full performance of the heuristic track initiation algorithm, and has a high probability of correct track in the case of multiple clutter numbers or high clutter density. Finally, the simulation is performed to verify the effectiveness of the proposed algorithm.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131925344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Masked facial region recognition using human pose estimation and broad learning system","authors":"Hongli Xiao, Bingshu Wang, Jiangbin Zheng, Jin Fang, Zhulin Liu, C. L. P. Chen","doi":"10.1117/12.2643023","DOIUrl":"https://doi.org/10.1117/12.2643023","url":null,"abstract":"COVID-19 and its variants have been posing a large risk to people around the world since the outbreak of the disease. Many techniques like AI are explored to help combat epidemics. People are required or forced to wear a mask to fight against COVID-19 epidemics worldwide. It brings new challenges to the task of masked facial region recognition. When facial regions are occluded by masks, it will result in some failures of face detection algorithms. In this paper, we propose a method to recognize masked faces. It mainly includes three parts. Firstly, the human pose is estimated to produce a series of key points. It is implemented by OpenPose. Secondly, a key-points location strategy is designed to capture the masked facial regions. It can locate the positions of faces accurately. Thirdly, the broad learning system, which is also an incremental learning algorithm, is employed to recognize the classes of candidate regions. Experiments conducted on some datasets shed light on the effectiveness of the proposed method.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132083957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification and localisation of multiple weeds in grassland for removal operation","authors":"Jinjin Wang, Xiaopeng Yao, B. Nguyen","doi":"10.1117/12.2644281","DOIUrl":"https://doi.org/10.1117/12.2644281","url":null,"abstract":"Weeds are a common issue in agriculture. Image-based weed identification has regained popularity in recent years as computing power increases. Researchers have successfully applied weed detection in the crop field and have combined the sensor (e.g., camera) and mechanical such as robotic weeders to get the location of the weeds. Meanwhile, many studies also have been conducted on the two classifications between grass and weed. However, there is no excellent and comprehensive weed dataset in reality because weeds are always similar and difficult to obtain by non-specialists. Moreover, it is challenging to identify weeds from grasslands for their similar colors, sizes, and shapes. We investigate three weeds (Bitter Gentian, Hawk's Beard, Pedunculate) relatively common in grasslands. Then, we select the typical grassland dominated by the above weeds for data collection. A natural and effective dataset is built and has generality in the scene of actual grassland. Secondly, we extract image features, including Color, Histogram, and orientation gradient histogram (HOG), and make various combinations to accurately and comprehensively reflect the actual characteristics of weeds. Thirdly, we propose a \"core zone\" algorithm to locate the weeds. The algorithm mainly adopts technology in image processing, such as threshold segmentation and morphological transformations. Experiments show that our binary classifier is more accurate than the comparison method, and the accuracy of the multi-classifier is also high. In addition, the algorithm for weeds location is more efficient than the comparative method.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133912599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single image 3D scene reconstruction based on ShapeNet models","authors":"Xue Chen, Yifan Ren, Yaoxu Song","doi":"10.1117/12.2645274","DOIUrl":"https://doi.org/10.1117/12.2645274","url":null,"abstract":"The 3D scene reconstruction task is the basis for implementing mixed reality, but traditional single-image scene reconstruction algorithms are difficult to generate regularized models. It is believed that this situation is caused by a lack of prior knowledge, so we try to introduce the model collection ShapeNet to solve this problem. Besides, our approach incorporates traditional model generation algorithms. The predicted artificial indoor objects as indicators will match models in ShapeNet. The refined models selected from ShapeNet will then replace the rough ones to produce the final 3D scene. These selected models from the model library will greatly improve the aesthetics of the reconstructed 3D scene. We test our method on the NYU-v2 dataset and achieve pleasing results. Our project is publicly available at https://sjtu-cv-2021.github.io/Single-Image-3D-Reconstruction-Based-On-ShapeNet.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124764240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved video classification method based on non-parametric attention combined with self-supervision","authors":"Xuchao Gong, Zongmin Li","doi":"10.1117/12.2643038","DOIUrl":"https://doi.org/10.1117/12.2643038","url":null,"abstract":"In video sequence modeling, the best recognition architecture is the transformer. The currently popular transformer-based video classification methods focus on the importance of current features in the time sequence. The degree of characterization of simultaneous order is insufficient, and simple data augmentation has an unstable classification effect. In this paper we propose a method of non-parametric attention combined with self-supervised feature construction to further improve video classification. In this method, the non-parametric attention mechanism is constructed in the simultaneous order feature to fit the multi-local extreme value distribution. At the same time, in the process of model learning, the input video is randomly masked in the temporal and spatial domains, and self-supervised information is added to effectively learn the details and classification information of video content. Experiments on the Kinetics-400, Kinetics-600 and Something-Something V2 datasets show that the proposed algorithm achieves higher accuracy than the current best method.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124099501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly supervised deep learning for cervical histopathology images analysis","authors":"Lei Shi, Jing Xu, Yameng Zhang, Guohua Zhao, Yufei Gao","doi":"10.1117/12.2644291","DOIUrl":"https://doi.org/10.1117/12.2644291","url":null,"abstract":"Cervical cancer is the second most common malignancy in women, but it can be prevented by diagnosing and treating cervical precancerous lesions. Clinically, histopathological image analysis is recognized as the gold standard for diagnosis. However, the diagnosis of cervical precancerous lesions is challenging due to the massive size of whole slide images and subjective grading without precise quantification criteria. Most existing computer-aided diagnosis approaches are patch-based, first learning patch-wise features and then aggregating these local features to infer the final prediction. Cropping pathology images into patches restricts the contextual information available to those networks, so they fail to learn clinically relevant structural representations. To address the above problems, this paper proposes a novel weakly supervised learning method called general attention network (GANet) for grading cervical precancerous lesions. A bag-of-instances pattern is introduced to overcome the limitation of the high resolution of whole slide images. Moreover, based on two transformer blocks, the proposed model is able to encode the dependencies among bags and instances, which helps capture more informative contexts and thus produce more discriminative WSI descriptors. Finally, extensive experiments are conducted on a public cervical histology dataset and the results show that GANet achieves state-of-the-art performance.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130881392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling deep reinforcement learning autonomous driving by 3D-LiDAR point clouds","authors":"Yuhan Chen, Rita Tse, Michael Bosello, Davide Aguiari, Su-Kit Tang, Giovanni Pau","doi":"10.1117/12.2644369","DOIUrl":"https://doi.org/10.1117/12.2644369","url":null,"abstract":"Autonomous driving holds the promise of revolutionizing our lives and society. Robot drivers will run errands such as commuting, parking cars, or taking kids to school. It is expected that, by the mid-century, humans will drive only for their pleasure. Autonomous vehicles will increase the efficiency and safety of the transportation system by reducing accidents and increasing the overall system capacity. Current autonomous driving systems are based on supervised learning that relies on massive, labeled data. It takes a lot of time, resources, and manpower to produce such data sets. While this approach is achieving remarkable results, the required effort to produce data becomes a limiting factor for general driving scenarios. This research explores Reinforcement Learning to advance autonomous driving models without labeled data. Reinforcement Learning is a learning paradigm that uses the concept of rewards to autonomously discover, through trial & error, how to solve a task. This work uses the LiDAR sensor as a case study to explore the effectiveness of Reinforcement Learning in interpreting complex data. LiDARs provide a dynamic high time-space definition map of the environment and it could be one of the key sensors for autonomous driving.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128184329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A collaborative spectrum sensing algorithm for cognitive radio based on related vector machine","authors":"Baolong Yuan, Yi Ning, F. Kan","doi":"10.1117/12.2644619","DOIUrl":"https://doi.org/10.1117/12.2644619","url":null,"abstract":"Due to the presence of tall buildings, mountains and other high occlusions in mountainous cities, this will produce fading phenomena, which will result in weak or even unrecognizable signals from the main users. To address this problem, a Related Vector Machine (RVM) based spectrum sensing method is proposed in this paper. First, the cognitive radio users (CR users) selection mechanism based on location correlation is designed, and some CR users with the best sensing performance are selected to participate in the sensing of the primary user (PU). Second, some parameters that reflect the characteristics of the PU signal are selected as the sample parameters. Finally, the signal samples received for both the presence and absence of the PU are sensed by using RVM. The experimental results show that the proposed algorithm has high classification detection performance in each low signal-to-noise ratio case, and effectively realizes the perception of the PU signal.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128081280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Texture based adaptive computational resource allocation for fast AVS3 inter coding","authors":"Jianing Chen","doi":"10.1117/12.2644285","DOIUrl":"https://doi.org/10.1117/12.2644285","url":null,"abstract":"The newest Audio Video Coding Standard (AVS3) generation provides better coding efficiency than its predecessor, where two new partitioning structures, i.e., Extend Quad-Tree (EQT) and Binary-Tree (BT), are adopted. Although these split tools bring remarkable coding performance, they do so at the price of increased computational complexity. For the popular conference video applications, experiments show that the EQT or BT split times in different regions are quite different, which indicates that it is unnecessary to provide all partitioning candidate modes in different areas. In this work, an effective partitioning resource allocation method is proposed to reduce computational complexity while guaranteeing the coding performance. Specifically, a Decision Tree (DT) model is trained to determine available partitioning modes for the current Coding Unit (CU), where the input features are the histogram, Sobel texture and average residual difference between the current and reference CU, along with the size of the CU. The training data are selected from different test sequences of AVS and Joint Video Experts Team Common Test Conditions (JCT) sequences, which are identified by the Structural Similarity (SSIM). The experiments on 720p and Common Intermediate Format (CIF) sequences, implemented on the AVS3 reference software HPM-9.1 under the Low Delay B (LB) configuration, show the efficiency of the proposed method, which achieves more than 40.0% computational complexity reduction with a BDBR loss of less than 2.0%.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127704892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}