{"title":"Face tampering detection based on spatiotemporal attention residual network","authors":"Z. Cai, Weimin Wei, Fanxing Meng, Changan Liu","doi":"10.1117/12.2644654","DOIUrl":"https://doi.org/10.1117/12.2644654","url":null,"abstract":"Fake technology has evolved to the point where fake faces are increasingly difficult to distinguish from real ones. If the forged face videos spread wildly on social media, social unrest or personal reputation damage may lead to social unrest. A face tampering detection method (RALNet) with spatiotemporal attention residual network is designed to reduce the misuse of face data due to malicious dissemination. Firstly, we propose a process to extract video face data, which reduces the interference of irrelevant information and improves the utilization of data processing. Then, based on the characteristics of incoherence and inconsistency in spatial and temporal information of tampered videos, the spatial domain features and temporal domain features of the target face video are extracted by introducing an attention mechanism of residual network and long short-term memory network to classify the targets as true or fake. The experimental results show that the method can effectively detect whether the face data is tampered, and its detection accuracy is better than other methods. In addition, it also achieves good performance in terms of recall, precision, and F1 score.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"377 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115174057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hardware architecture of skip/direct mode for AVS3","authors":"Yingbo Wen, Guoqing Xiang, Yunyao Yan, Xizhong Zhu, Xiaofeng Huang, Peng Zhang, Wei Yan","doi":"10.1117/12.2643010","DOIUrl":"https://doi.org/10.1117/12.2643010","url":null,"abstract":"Skip/direct mode is one of the inter prediction modes in video coding, which achieves a high coding performance. In Audio and Video coding Standard-3(AVS3), skip/direct has improved more performance with more candidate modes. The candidate mode list is generated by numerous prediction directions with corresponded predicted motion vectors. However, it will result in higher computation complexities and challenges to parallel computation, especially for the hardware implementation. For resolving the problem, we propose a hardware architecture of skip/direct mode with a fast motion vector prediction (MVP) algorithm in this paper. Our architecture is designed with efficient pipeline schedules. And the fast MVP algorithm can reduce the number of MVP candidates efficiently. The fast MVP method is introduced by setting a search window, some unnecessary MVP are skipped, thereby reducing the computational complexity firstly. Then the proposed hardware architecture is given with efficient pipeline schedules in detail. The experimental results show that our architecture is able to meet the requirement of 3840x2160@60FPS with only 0.48% and 0.42% BD-Rate increase under the low delay P (LDP) and random access (RA) configurations, respectively.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123411919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-reference stereoscopic video quality assessment based on Tchebichef moment","authors":"Yuxin Chen, Ming-Chang Wen","doi":"10.1117/12.2644706","DOIUrl":"https://doi.org/10.1117/12.2644706","url":null,"abstract":"We propose a no-reference (NR) stereoscopic video quality assessment (SVQA) model based on Tchebichef moment in this paper. Specifically, we extract keyframes according to mutual information between adjacent frames, and then the extracted keyframes are segmented to patches to calculate low-order Tchebichef moments. Since the strong description ability of Tchebichef moment, and different order of Tchebichef moment can represent independent features with minimal information redundancy, we extract statistical features of Tchebichef moment on computed patches as spatial features. Considering the influence of distortions in spatiotemporal domain to video quality, we use the three-dimensional derivative of Gaussian filters to calculate the spatiotemporal energy responses and extract statistical features from the responses as spatiotemporal features. Finally, we combine the spatial and spatiotemporal features to predict the quality of stereoscopic videos. The proposed model is evaluated on the NAMA3DS1-COSPAD1, SVQA and Waterloo IVC phase I databases. The experimental results show that the proposed model achieved competitive performance as compared with existing SVQA models.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123968154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-friendly fast rate-distortion optimized quantization algorithm for AVS3","authors":"Jinchang Xu, Guoqing Xiang, Yunyao Yan, Yingbo Wen, Xiaofeng Huang, Peng Zhang, Wei Yan","doi":"10.1117/12.2643000","DOIUrl":"https://doi.org/10.1117/12.2643000","url":null,"abstract":"Rate-distortion optimized quantization (RDOQ) is an important technique in the video coding standard, which effectively improves encoding efficiency. However, the large compute complexity and the strong data dependency in the RDOQ calculation process limit the real-time encoding in hardware design. In this paper, a fast RDOQ algorithm is proposed, which includes the RDOQ skip algorithm and the optimized rate estimation algorithm. Firstly, by detecting the Pseudo all-zero block (PZB) in advance, some unnecessary RDOQ processes are skipped, thereby reducing the computational complexity. Secondly, by optimizing the elements used in rate estimation of the RDOQ process, the strong data dependency of the process is alleviated, which allows RDOQ to be executed in parallel. Experimental results show that the proposed algorithm reduces 27.6% and 30.6% encoding time with only average 0.3% and 0.1% BD-rate performance loss under low delay P and random access configurations on the HPM-4.0.1 of AVS3, respectively.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124317233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind image quality assessment based on transformer","authors":"Linxin Li, Chu Chen, Naixuan Zhao","doi":"10.1117/12.2643493","DOIUrl":"https://doi.org/10.1117/12.2643493","url":null,"abstract":"Transformer has achieved milestones in natural language processing (NLP). Due to its excellent global and remote semantic information interaction performance, it has gradually been applied in vision tasks. In this paper, we propose PTIQ, which is a pure Transformer structure for Image Quality Assessment. Specifically, we use Swin Transformer Blocks as backbone to extract image features. The extracted feature vectors after extra state embedding and position embedding are fed into the original transformer encoder. Then, the output is passed to the MLP head to predict quality score. Experimental results demonstrate that the proposed architecture achieves outstanding performance.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124943319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-reference video quality assessment using data dimensionality reduction and attention-based pooling","authors":"Zhiwei Wang, Linjing Lai","doi":"10.1117/12.2643807","DOIUrl":"https://doi.org/10.1117/12.2643807","url":null,"abstract":"This paper proposes a new end-to-end no-reference (NR) video quality assessment (VQA) algorithm that makes use of dimensionality reduction and attention-based pooling. Firstly, the dataset is expanded through data enhancement based on frame sampling. Secondly, the cropped video blocks are input into the trainable data dimensionality reduction module which adopts 3D convolution to reduce the dimension of the data. Then, the dimensionality reduced data is input into the backbone of the algorithm to extract spatial features. The extracted features are pooled through attention-based pooling. Finally, the pooled features are regressed to the quality score through the full connection layer. Experimental results show that the proposed algorithm has achieved competitive performance on the LIVE, LIVE Mobile and CVD2014 datasets, and has low complexity.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124058517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of artificial neural networks in recognizing carrier based on the color of raspberry powders obtained in the spray-drying process","authors":"K. Przybył, J. Wawrzyniak, K. Samborska, Ł. Gierz, K. Koszela, M. Szychta","doi":"10.1117/12.2645926","DOIUrl":"https://doi.org/10.1117/12.2645926","url":null,"abstract":"Fruit juices and vegetable and fruit juices are the products, which provide our bodies with a lot of valuable and nutritional ingredients and play a major role in prevention of numerous illnesses. Raspberries are the valuable source of bioactive compounds. As part of preserving food, whose main aim is to extend stability of products obtained only in season, the researchers took advantage of spray drying technique. In the research part of the study, research samples were prepared in the form of raspberry powders obtained from the process of dehumidified spray drying. Because of the research, a neural model was made, which supported the evaluation of the quality of detecting powder samples based on their color. The devised neural network reached classification accuracy at 0.924.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"150 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130960427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-guided feature fusion network for crowd counting","authors":"Qing He, Qianqian Yang, Yinfeng Xia, Sifan Peng, B. Yin","doi":"10.1117/12.2643005","DOIUrl":"https://doi.org/10.1117/12.2643005","url":null,"abstract":"How to solve the scale variation and background interference faced by crowd counting algorithms in practical applications is still an open problem. In this paper, to tackle the above problems, we propose the Attention-guided Feature Fusion Network (AFFNet) to learn the mapping between the crowd image and density map. In this network, the Channel-attentive Receptive Field Block (CRFB) is constructed by parallel convolutional layers with different expansion rates to extract multi-scale features. By adopting attention masks generated by high-level features to adjust low-level features, the Feature Fusion Module (FFM) can alleviate the background interference problem at the feature level. In addition, the Double Branch Module (DBM) generates a density estimation map, which further erases the background interference problem at the density level. Extensive experiments conducted on several challenging benchmark datasets including ShanghaiTech, UCF-QNRF and JHU-CROWD++ demonstrate our proposed method is superior to the state-of-the-art approaches.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128286050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of stereo matching algorithm based on Xavier edge computing platform","authors":"Shuting Wang, Chao Xu","doi":"10.1117/12.2644383","DOIUrl":"https://doi.org/10.1117/12.2644383","url":null,"abstract":"In view of the existing high-precision stereo matching based on deep learning which network structure is complex, and it is difficult to deploy and run in real time on edge platform. An improved stereo matching algorithm based on RTStereoNet is proposed. Firstly, the channel attention mechanism is introduced in the matching cost aggregation stage of RTStereoNet, so that the network can adaptively enhance the extraction of effective information and reduce the ambiguity of matching. Secondly, in the disparity refinement stage of RTStereoNet, the color image is introduced to compensate for the loss of details caused by the large-scale downsampling of the network, and a lightweight disparity refinement module is constructed to expand the receptive field of the network. In addition, based on Jetson Xavier NX edge computing module, a special edge computing platform is constructed, with the help of TensorRT inference framework, the calculation support problem of special operators is solved through CUDA programming, and achieved deployment acceleration on the platform for both models before and after the improvement. The results show that after the accelerated deployment, the inference speed of the improved model can reach 30 fps on the KITTI2015 test set, and the improved model has higher accuracy than the original model.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128844782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Alzheimer’s disease from 4D fMRI using hybrid 3DCNN and GRU networks","authors":"Yifan Cao, Meili Lu, Jiajun Fu, Zhaohua Guo, Zicheng Gao","doi":"10.1117/12.2644454","DOIUrl":"https://doi.org/10.1117/12.2644454","url":null,"abstract":"In recently years, motivated by the excellent performance in automatic feature extraction and complex patterns detecting from raw data, recently, deep learning technologies have been widely used in analyzing fMRI data for Alzheimer’s disease classification. However, most current studies did not take full advantage of the temporal and spatial features of fMRI, which may result in ignoring some important information and influencing classification performance. In this paper, we propose a novel approach based on deep learning to learn temporal and spatial features of 4D fMRI for Alzheimer’s disease classification. This model is composed of 3D Convolutional Neural Network(3DCNN) and recurrent neural network. Experimental results demonstrated that the proposed approach could discriminate Alzheimer’s patients from healthy controls with a high accuracy rate.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117122171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}