{"title":"A hardware architecture of skip/direct mode for AVS3","authors":"Yingbo Wen, Guoqing Xiang, Yunyao Yan, Xizhong Zhu, Xiaofeng Huang, Peng Zhang, Wei Yan","doi":"10.1117/12.2643010","DOIUrl":"https://doi.org/10.1117/12.2643010","url":null,"abstract":"Skip/direct mode is one of the inter prediction modes in video coding, which achieves a high coding performance. In Audio and Video coding Standard-3(AVS3), skip/direct has improved more performance with more candidate modes. The candidate mode list is generated by numerous prediction directions with corresponded predicted motion vectors. However, it will result in higher computation complexities and challenges to parallel computation, especially for the hardware implementation. For resolving the problem, we propose a hardware architecture of skip/direct mode with a fast motion vector prediction (MVP) algorithm in this paper. Our architecture is designed with efficient pipeline schedules. And the fast MVP algorithm can reduce the number of MVP candidates efficiently. The fast MVP method is introduced by setting a search window, some unnecessary MVP are skipped, thereby reducing the computational complexity firstly. Then the proposed hardware architecture is given with efficient pipeline schedules in detail. The experimental results show that our architecture is able to meet the requirement of 3840x2160@60FPS with only 0.48% and 0.42% BD-Rate increase under the low delay P (LDP) and random access (RA) configurations, respectively.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123411919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-reference stereoscopic video quality assessment based on Tchebichef moment","authors":"Yuxin Chen, Ming-Chang Wen","doi":"10.1117/12.2644706","DOIUrl":"https://doi.org/10.1117/12.2644706","url":null,"abstract":"We propose a no-reference (NR) stereoscopic video quality assessment (SVQA) model based on Tchebichef moment in this paper. Specifically, we extract keyframes according to mutual information between adjacent frames, and then the extracted keyframes are segmented to patches to calculate low-order Tchebichef moments. Since the strong description ability of Tchebichef moment, and different order of Tchebichef moment can represent independent features with minimal information redundancy, we extract statistical features of Tchebichef moment on computed patches as spatial features. Considering the influence of distortions in spatiotemporal domain to video quality, we use the three-dimensional derivative of Gaussian filters to calculate the spatiotemporal energy responses and extract statistical features from the responses as spatiotemporal features. Finally, we combine the spatial and spatiotemporal features to predict the quality of stereoscopic videos. The proposed model is evaluated on the NAMA3DS1-COSPAD1, SVQA and Waterloo IVC phase I databases. The experimental results show that the proposed model achieved competitive performance as compared with existing SVQA models.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123968154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ARO-DeepSFM: deep structure-from-motion with alternating recursive optimization","authors":"Rongcheng Cui, Haoyuan Huang","doi":"10.1117/12.2644363","DOIUrl":"https://doi.org/10.1117/12.2644363","url":null,"abstract":"Structure from Motion (SfM) is the cornerstone of 3D reconstruction and visualization of SLAM. Existing deep learning approaches formulate problems by restoring absolute pose ratios from two consecutive frames or predicting a depth map from a single image, both of which are unsuitable problems. In order to solve this maladaptation problem and further tap the potential of neural networks in SfM, this paper proposes a new optimization model for deep motion structure recovery based on recurrent neural networks. The model consists of two architectures based on depth and posture estimation of costs, and is constantly iteratively updated alternately to improve both systems. The neural optimizer designed here tracks historical information during iterations to minimize feature metric cost update depth and camera poses. Experiments show that the optimization model of deep motion structure recovery in this paper is superior to the previous method, effectively reducing the cost of feature-metric, while refining depth and poses.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125872692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of artificial neural networks in recognizing carrier based on the color of raspberry powders obtained in the spray-drying process","authors":"K. Przybył, J. Wawrzyniak, K. Samborska, Ł. Gierz, K. Koszela, M. Szychta","doi":"10.1117/12.2645926","DOIUrl":"https://doi.org/10.1117/12.2645926","url":null,"abstract":"Fruit juices and vegetable and fruit juices are the products, which provide our bodies with a lot of valuable and nutritional ingredients and play a major role in prevention of numerous illnesses. Raspberries are the valuable source of bioactive compounds. As part of preserving food, whose main aim is to extend stability of products obtained only in season, the researchers took advantage of spray drying technique. In the research part of the study, research samples were prepared in the form of raspberry powders obtained from the process of dehumidified spray drying. Because of the research, a neural model was made, which supported the evaluation of the quality of detecting powder samples based on their color. The devised neural network reached classification accuracy at 0.924.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"150 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130960427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of stereo matching algorithm based on Xavier edge computing platform","authors":"Shuting Wang, Chao Xu","doi":"10.1117/12.2644383","DOIUrl":"https://doi.org/10.1117/12.2644383","url":null,"abstract":"In view of the existing high-precision stereo matching based on deep learning which network structure is complex, and it is difficult to deploy and run in real time on edge platform. An improved stereo matching algorithm based on RTStereoNet is proposed. Firstly, the channel attention mechanism is introduced in the matching cost aggregation stage of RTStereoNet, so that the network can adaptively enhance the extraction of effective information and reduce the ambiguity of matching. Secondly, in the disparity refinement stage of RTStereoNet, the color image is introduced to compensate for the loss of details caused by the large-scale downsampling of the network, and a lightweight disparity refinement module is constructed to expand the receptive field of the network. In addition, based on Jetson Xavier NX edge computing module, a special edge computing platform is constructed, with the help of TensorRT inference framework, the calculation support problem of special operators is solved through CUDA programming, and achieved deployment acceleration on the platform for both models before and after the improvement. The results show that after the accelerated deployment, the inference speed of the improved model can reach 30 fps on the KITTI2015 test set, and the improved model has higher accuracy than the original model.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128844782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FSC-UNet: a lightweight medical image segmentation algorithm fused with skip connections","authors":"Yixin Chen, Jianjun Zhang, Xulin Zong, Zhipeng Zhao, Hanqing Liu, Ruichun Tang, Peishun Liu, Jinyu Wang","doi":"10.1117/12.2644360","DOIUrl":"https://doi.org/10.1117/12.2644360","url":null,"abstract":"In order to study the effect of skip connections to segmentation performance in encoder and decoder networks, in this paper, we improve the skip connections of U-Net model and adopt the method of sub-module fusion connection. We fuse the high and low layers of the encoder by multi-head attention. Fusion is performed separately, and the fusion result is connected to the decoder. Considering that different input images have different effects to model training due to factors such as noise, we set the threshold by calculating the Euclidean distance between the image and the mask during training, so that different images use different skip connection methods. Experiments on Cell nuclei, Synapse, Heart, Chaos datasets show that FSC-UNet algorithm this paper proposed has better results than existing algorithms.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126672535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-reference video quality assessment using data dimensionality reduction and attention-based pooling","authors":"Zhiwei Wang, Linjing Lai","doi":"10.1117/12.2643807","DOIUrl":"https://doi.org/10.1117/12.2643807","url":null,"abstract":"This paper proposes a new end-to-end no-reference (NR) video quality assessment (VQA) algorithm that makes use of dimensionality reduction and attention-based pooling. Firstly, the dataset is expanded through data enhancement based on frame sampling. Secondly, the cropped video blocks are input into the trainable data dimensionality reduction module which adopts 3D convolution to reduce the dimension of the data. Then, the dimensionality reduced data is input into the backbone of the algorithm to extract spatial features. The extracted features are pooled through attention-based pooling. Finally, the pooled features are regressed to the quality score through the full connection layer. Experimental results show that the proposed algorithm has achieved competitive performance on the LIVE, LIVE Mobile and CVD2014 datasets, and has low complexity.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124058517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-friendly fast rate-distortion optimized quantization algorithm for AVS3","authors":"Jinchang Xu, Guoqing Xiang, Yunyao Yan, Yingbo Wen, Xiaofeng Huang, Peng Zhang, Wei Yan","doi":"10.1117/12.2643000","DOIUrl":"https://doi.org/10.1117/12.2643000","url":null,"abstract":"Rate-distortion optimized quantization (RDOQ) is an important technique in the video coding standard, which effectively improves encoding efficiency. However, the large compute complexity and the strong data dependency in the RDOQ calculation process limit the real-time encoding in hardware design. In this paper, a fast RDOQ algorithm is proposed, which includes the RDOQ skip algorithm and the optimized rate estimation algorithm. Firstly, by detecting the Pseudo all-zero block (PZB) in advance, some unnecessary RDOQ processes are skipped, thereby reducing the computational complexity. Secondly, by optimizing the elements used in rate estimation of the RDOQ process, the strong data dependency of the process is alleviated, which allows RDOQ to be executed in parallel. Experimental results show that the proposed algorithm reduces 27.6% and 30.6% encoding time with only average 0.3% and 0.1% BD-rate performance loss under low delay P and random access configurations on the HPM-4.0.1 of AVS3, respectively.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124317233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Haze removal using a hybrid convolutional sparse representation model","authors":"Ye Cai, Lan Luo, Hongxia Gao, Shicheng Niu, Weipeng Yang, Tian Qi, Guoheng Liang","doi":"10.1117/12.2643362","DOIUrl":"https://doi.org/10.1117/12.2643362","url":null,"abstract":"Haze removal is a challenging task in image recovery, because hazy images are always degraded by turbid media in atmosphere, showing limited visibility and low contrast. Analysis Sparse Representation (ASR) and Synthesis Sparse Representation (SSR) has been widely used to recover degraded images. But there are always unexpected noise and details loss in the recovered images, as they take relatively less account of the images’ inherent coherence between image patches. Thus, in this paper, we propose a new haze removal method based on hybrid convolutional sparse representation, with consideration of the adjacent relationship by convolution and superposition. To integrate optical model into a convolutional sparse framework, we separate transmission map by transforming it into logarithm domain. And then a structure-based constraint on transmission map is proposed to maintain piece-wise smoothness and reduce the influence brought by pseudo depth abrupt edges. Experiment results demonstrate that the proposed method can restore fine structure of hazy images and suppress boosted noise.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133529156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-guided feature fusion network for crowd counting","authors":"Qing He, Qianqian Yang, Yinfeng Xia, Sifan Peng, B. Yin","doi":"10.1117/12.2643005","DOIUrl":"https://doi.org/10.1117/12.2643005","url":null,"abstract":"How to solve the scale variation and background interference faced by crowd counting algorithms in practical applications is still an open problem. In this paper, to tackle the above problems, we propose the Attention-guided Feature Fusion Network (AFFNet) to learn the mapping between the crowd image and density map. In this network, the Channel-attentive Receptive Field Block (CRFB) is constructed by parallel convolutional layers with different expansion rates to extract multi-scale features. By adopting attention masks generated by high-level features to adjust low-level features, the Feature Fusion Module (FFM) can alleviate the background interference problem at the feature level. In addition, the Double Branch Module (DBM) generates a density estimation map, which further erases the background interference problem at the density level. Extensive experiments conducted on several challenging benchmark datasets including ShanghaiTech, UCF-QNRF and JHU-CROWD++ demonstrate our proposed method is superior to the state-of-the-art approaches.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128286050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}