International Conference on Digital Image Processing: Latest Publications

Face tampering detection based on spatiotemporal attention residual network
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2644654
Z. Cai, Weimin Wei, Fanxing Meng, Changan Liu
Abstract: Face forgery technology has evolved to the point where fake faces are increasingly difficult to distinguish from real ones, and forged face videos spreading widely on social media can cause personal reputation damage or social unrest. A face tampering detection method with a spatiotemporal attention residual network (RALNet) is designed to reduce the misuse of face data through malicious dissemination. First, we propose a pipeline to extract face data from video, which reduces the interference of irrelevant information and improves data-processing efficiency. Then, exploiting the spatial and temporal incoherence and inconsistency of tampered videos, the spatial- and temporal-domain features of the target face video are extracted by a residual network with an attention mechanism and a long short-term memory network, and the target is classified as real or fake. Experimental results show that the method effectively detects whether face data have been tampered with, with higher detection accuracy than other methods, and it also performs well in terms of recall, precision, and F1 score.
Citations: 0
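The paper does not include code; as a rough illustration of the temporal side of such an attention mechanism, the sketch below pools per-frame face features with softmax attention weights. The scoring vector `w` is a hypothetical stand-in for what the real network learns.

```python
import math

def attention_pool(frame_feats, w):
    """Attention-weighted temporal pooling of per-frame face features.
    frame_feats: list of T feature vectors (each a list of length D).
    w: scoring vector of length D (learned in practice; toy here).
    Returns a single D-dim video-level feature."""
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in frame_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]       # numerically stable softmax
    total = sum(exps)
    alpha = [e / total for e in exps]              # temporal attention weights
    D = len(frame_feats[0])
    return [sum(a * f[d] for a, f in zip(alpha, frame_feats)) for d in range(D)]

feats = [[0.1, 0.2], [0.4, 0.0], [0.3, 0.3]]       # 3 frames, 2-dim features
pooled = attention_pool(feats, [1.0, 1.0])
print(len(pooled))  # 2
```

The pooled vector is a convex combination of the frame features, so frames the scorer finds suspicious dominate the video-level representation.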
A hardware architecture of skip/direct mode for AVS3
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2643010
Yingbo Wen, Guoqing Xiang, Yunyao Yan, Xizhong Zhu, Xiaofeng Huang, Peng Zhang, Wei Yan
Abstract: Skip/direct mode is one of the inter-prediction modes in video coding and achieves high coding performance. In Audio and Video coding Standard 3 (AVS3), skip/direct mode gains further performance from a larger set of candidate modes. The candidate mode list is generated from numerous prediction directions with corresponding predicted motion vectors. However, this results in higher computational complexity and challenges for parallel computation, especially in hardware implementations. To resolve this problem, we propose a hardware architecture for skip/direct mode with a fast motion vector prediction (MVP) algorithm. The architecture is designed with efficient pipeline schedules, and the fast MVP algorithm efficiently reduces the number of MVP candidates: by setting a search window, unnecessary MVP candidates are skipped, which lowers computational complexity. The proposed hardware architecture and its pipeline schedules are then described in detail. Experimental results show that our architecture meets the requirement of 3840x2160@60FPS with only 0.48% and 0.42% BD-rate increase under the low-delay P (LDP) and random-access (RA) configurations, respectively.
Citations: 0
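As a toy illustration of the search-window idea for reducing MVP candidates, the sketch below drops candidates that fall outside a square window around a reference motion vector and also removes duplicates. The exact pruning criterion in the paper may differ.

```python
def prune_mvp_candidates(candidates, center, window):
    """Keep only MVP candidates whose components lie within a square
    search window around a reference MV; duplicates are also dropped.
    candidates: list of (x, y) motion vectors; window: half-size of
    the window. (Illustrative rule, not the paper's exact criterion.)"""
    kept, seen = [], set()
    for mv in candidates:
        if mv in seen:
            continue
        if abs(mv[0] - center[0]) <= window and abs(mv[1] - center[1]) <= window:
            kept.append(mv)
            seen.add(mv)
    return kept

cands = [(0, 0), (1, 2), (9, 9), (1, 2), (-3, 1)]
pruned = prune_mvp_candidates(cands, (0, 0), 4)
print(pruned)  # [(0, 0), (1, 2), (-3, 1)]
```

Shrinking the candidate list this way is what makes the per-candidate rate-distortion checks cheap enough to pipeline in hardware.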
No-reference stereoscopic video quality assessment based on Tchebichef moment
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2644706
Yuxin Chen, Ming-Chang Wen
Abstract: We propose a no-reference (NR) stereoscopic video quality assessment (SVQA) model based on the Tchebichef moment. Specifically, we extract keyframes according to the mutual information between adjacent frames, and the extracted keyframes are then segmented into patches to compute low-order Tchebichef moments. Because the Tchebichef moment has strong descriptive ability, and different orders represent independent features with minimal information redundancy, we extract statistical features of the Tchebichef moments of the patches as spatial features. Considering the influence of spatiotemporal distortions on video quality, we use three-dimensional derivative-of-Gaussian filters to compute spatiotemporal energy responses and extract statistical features from those responses as spatiotemporal features. Finally, we combine the spatial and spatiotemporal features to predict the quality of stereoscopic videos. The proposed model is evaluated on the NAMA3DS1-COSPAD1, SVQA, and Waterloo IVC Phase I databases. Experimental results show that the proposed model achieves competitive performance compared with existing SVQA models.
Citations: 0
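The keyframe-selection step relies on mutual information between adjacent frames: a frame that shares little information with its predecessor signals a content change worth keeping. A minimal histogram-based sketch of that measure, assuming 8-bit grayscale pixels (the patching and Tchebichef-moment stages are not shown):

```python
import math

def mutual_information(a, b, bins=8, lo=0, hi=256):
    """Mutual information between two frames given as flat pixel lists,
    estimated from a joint intensity histogram. Low MI against the
    previous frame marks a candidate keyframe."""
    w = (hi - lo) / bins
    joint = {}
    for x, y in zip(a, b):
        k = (int((x - lo) // w), int((y - lo) // w))
        joint[k] = joint.get(k, 0) + 1
    n = len(a)
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c
    mi = 0.0
    for (i, j), c in joint.items():
        p = c / n
        mi += p * math.log(p / ((px[i] / n) * (py[j] / n)))
    return mi

frame1 = [10, 10, 200, 200, 10, 200]
mi_same = mutual_information(frame1, frame1)      # identical frames
mi_flat = mutual_information(frame1, [10] * 6)    # unrelated flat frame
print(mi_same > mi_flat)  # True
```

Identical frames share maximal information, while a statistically independent frame yields MI near zero, which is exactly the drop the keyframe selector looks for.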
Hardware-friendly fast rate-distortion optimized quantization algorithm for AVS3
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2643000
Jinchang Xu, Guoqing Xiang, Yunyao Yan, Yingbo Wen, Xiaofeng Huang, Peng Zhang, Wei Yan
Abstract: Rate-distortion optimized quantization (RDOQ) is an important technique in video coding standards that effectively improves encoding efficiency. However, the high computational complexity and strong data dependency of the RDOQ calculation limit real-time encoding in hardware designs. In this paper, a fast RDOQ algorithm is proposed, comprising an RDOQ skip algorithm and an optimized rate-estimation algorithm. First, by detecting pseudo all-zero blocks (PZBs) in advance, some unnecessary RDOQ processing is skipped, which reduces computational complexity. Second, by optimizing the elements used in the rate estimation of the RDOQ process, the strong data dependency is alleviated, allowing RDOQ to be executed in parallel. Experimental results show that the proposed algorithm reduces encoding time by 27.6% and 30.6% with only 0.3% and 0.1% average BD-rate loss under the low-delay P and random-access configurations, respectively, on HPM-4.0.1 of AVS3.
Citations: 0
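The skip idea can be illustrated with a dead-zone scalar quantizer: if every coefficient in a block would quantize to zero anyway, running the full RDOQ pass for that block is wasted work. The threshold below follows directly from the dead-zone formula; the paper derives its own PZB detection condition.

```python
def quantize(c, qstep, deadzone=0.5):
    """Dead-zone scalar quantizer for one transform coefficient."""
    level = int(abs(c) / qstep + deadzone)
    return level if c >= 0 else -level

def is_pseudo_all_zero(coeffs, qstep, deadzone=0.5):
    """True when every coefficient quantizes to zero, so the costly
    RDOQ pass for this block can be skipped entirely.
    (Illustrative criterion, not the paper's exact PZB condition.)"""
    return all(abs(c) / qstep + deadzone < 1 for c in coeffs)

block = [0.4, -0.3, 0.2, 0.1]
skip = is_pseudo_all_zero(block, qstep=1.0)
levels = [quantize(c, 1.0) for c in block]
print(skip, levels)  # True [0, 0, 0, 0]
print(is_pseudo_all_zero([2.0, 0.1], qstep=1.0))  # False
```

The predicate is consistent with the quantizer: a coefficient maps to level zero exactly when `abs(c)/qstep + deadzone < 1`, so skipping never changes the encoded output for a detected block.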
Blind image quality assessment based on Transformer
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2643493
Linxin Li, Chu Chen, Naixuan Zhao
Abstract: The Transformer has achieved milestones in natural language processing (NLP). Owing to its excellent global and long-range semantic information interaction, it has gradually been applied to vision tasks. In this paper, we propose PTIQ, a pure-Transformer architecture for image quality assessment. Specifically, we use Swin Transformer blocks as the backbone to extract image features. The extracted feature vectors, after extra state embedding and position embedding, are fed into a standard Transformer encoder, and the output is passed to an MLP head to predict the quality score. Experimental results demonstrate that the proposed architecture achieves outstanding performance.
Citations: 0
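The first step in any pure-Transformer vision model is turning an image into a token sequence. A minimal patch-embedding sketch, with a toy fixed projection matrix standing in for the learned one (PTIQ itself uses Swin Transformer blocks rather than this plain scheme):

```python
def patch_embed(image, patch, proj):
    """Split an H x W image (list of rows) into non-overlapping
    patch x patch blocks and project each flattened block with a set of
    D projection columns (each of length patch*patch), producing the
    token sequence a Transformer encoder would consume."""
    H, W = len(image), len(image[0])
    tokens = []
    for r in range(0, H - patch + 1, patch):
        for c in range(0, W - patch + 1, patch):
            flat = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            tokens.append([sum(f * p for f, p in zip(flat, col)) for col in proj])
    return tokens

img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 ramp image
proj = [[0.1] * 4 for _ in range(3)]   # 3 output dims, patch*patch = 4 inputs
tokens = patch_embed(img, 2, proj)
print(len(tokens), len(tokens[0]))  # 4 3
```

A 4x4 image with 2x2 patches yields four tokens; in a full model these would receive position embeddings and then pass through the encoder and MLP head.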
No-reference video quality assessment using data dimensionality reduction and attention-based pooling
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2643807
Zhiwei Wang, Linjing Lai
Abstract: This paper proposes a new end-to-end no-reference (NR) video quality assessment (VQA) algorithm that makes use of dimensionality reduction and attention-based pooling. First, the dataset is expanded through data augmentation based on frame sampling. Second, cropped video blocks are fed into a trainable dimensionality-reduction module that uses 3D convolution to reduce the dimensionality of the data. The reduced data are then fed into the backbone of the algorithm to extract spatial features, which are pooled with attention-based pooling. Finally, the pooled features are regressed to a quality score through a fully connected layer. Experimental results show that the proposed algorithm achieves competitive performance on the LIVE, LIVE Mobile, and CVD2014 datasets with low complexity.
Citations: 0
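To see how a strided 3D convolution shrinks a video block along time, height, and width at once, here is the simplest possible instance: a uniform k x k x k kernel with stride k, which reduces to average pooling. The paper's module learns its kernels instead of fixing them.

```python
def reduce3d(vol, k):
    """Valid 3D convolution of a T x H x W block (nested lists) with a
    uniform k*k*k kernel and stride k -- each axis shrinks by a factor
    of k. With a uniform kernel this is average pooling, the simplest
    case of a learned 3D-conv dimensionality reduction."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    n = k ** 3
    return [[[sum(vol[t + i][r + j][c + l]
                  for i in range(k) for j in range(k) for l in range(k)) / n
              for c in range(0, W - k + 1, k)]
             for r in range(0, H - k + 1, k)]
            for t in range(0, T - k + 1, k)]

vol = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]  # 4x4x4 block of ones
out = reduce3d(vol, 2)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
```

An eightfold reduction in voxels per application is what keeps the downstream 2D backbone cheap enough for the paper's low-complexity claim.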
Application of artificial neural networks in recognizing carrier based on the color of raspberry powders obtained in the spray-drying process
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2645926
K. Przybył, J. Wawrzyniak, K. Samborska, Ł. Gierz, K. Koszela, M. Szychta
Abstract: Fruit juices and vegetable-fruit juices provide our bodies with many valuable nutritional ingredients and play a major role in the prevention of numerous illnesses, and raspberries are a valuable source of bioactive compounds. As part of food preservation, whose main aim is to extend the stability of products available only in season, the researchers used the spray-drying technique. In the experimental part of the study, samples were prepared in the form of raspberry powders obtained from dehumidified spray drying. A neural model was built to support the evaluation of carrier detection in powder samples based on their color. The devised neural network reached a classification accuracy of 0.924.
Citations: 1
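To make the color-based carrier recognition concrete without the neural network, a nearest-centroid classifier over mean RGB values captures the same idea in miniature. The carrier names and centroid colors below are invented for illustration; the paper's actual classes and neural model differ.

```python
def nearest_centroid(rgb, centroids):
    """Assign a powder sample's mean RGB color to the nearest carrier
    centroid by squared Euclidean distance -- a minimal stand-in for
    the paper's neural classifier (centroids here are hypothetical)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda name: d2(rgb, centroids[name]))

carriers = {"carrier_A": (230, 180, 190), "carrier_B": (210, 150, 165)}
label = nearest_centroid((228, 176, 188), carriers)
print(label)  # carrier_A
```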
Attention-guided feature fusion network for crowd counting
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2643005
Qing He, Qianqian Yang, Yinfeng Xia, Sifan Peng, B. Yin
Abstract: How to handle the scale variation and background interference faced by crowd-counting algorithms in practical applications is still an open problem. In this paper, to tackle these problems, we propose the Attention-guided Feature Fusion Network (AFFNet) to learn the mapping between a crowd image and its density map. In this network, the Channel-attentive Receptive Field Block (CRFB) is constructed from parallel convolutional layers with different dilation rates to extract multi-scale features. By adopting attention masks generated from high-level features to adjust low-level features, the Feature Fusion Module (FFM) alleviates the background-interference problem at the feature level. In addition, the Double Branch Module (DBM) generates a density estimation map that further suppresses background interference at the density level. Extensive experiments on several challenging benchmark datasets, including ShanghaiTech, UCF-QNRF, and JHU-CROWD++, demonstrate that our proposed method is superior to state-of-the-art approaches.
Citations: 0
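The core of attention-guided fusion is a mask from high-level features gating low-level features element-wise. A single-channel sketch of that gating (real feature maps are multi-channel and the mask path is learned, so this only shows the mechanism):

```python
import math

def fuse(low, high):
    """Attention-guided fusion: a sigmoid mask derived from the
    high-level feature map gates the low-level map element-wise,
    suppressing background responses while keeping crowd regions."""
    return [[l * (1 / (1 + math.exp(-h))) for l, h in zip(lr, hr)]
            for lr, hr in zip(low, high)]

low = [[1.0, 1.0], [1.0, 1.0]]           # uniform low-level response
high = [[8.0, -8.0], [8.0, -8.0]]        # strong crowd evidence left, background right
fused = fuse(low, high)
print(fused[0][0] > 0.99, fused[0][1] < 0.01)  # True True
```

The low-level detail survives where high-level semantics agree there is a crowd, and is damped toward zero over background, which is the feature-level cleanup the FFM performs.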
Implementation of stereo matching algorithm based on Xavier edge computing platform
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2644383
Shuting Wang, Chao Xu
Abstract: Existing high-precision stereo matching based on deep learning relies on complex network structures that are difficult to deploy and run in real time on edge platforms. An improved stereo matching algorithm based on RTStereoNet is therefore proposed. First, a channel attention mechanism is introduced in the matching-cost aggregation stage of RTStereoNet, so that the network can adaptively enhance the extraction of effective information and reduce matching ambiguity. Second, in the disparity refinement stage, the color image is introduced to compensate for the loss of detail caused by the network's large-scale downsampling, and a lightweight disparity refinement module is constructed to expand the network's receptive field. In addition, a dedicated edge computing platform is built on the Jetson Xavier NX module; with the TensorRT inference framework, computational support for special operators is provided through CUDA programming, and accelerated deployment is achieved on the platform for the models both before and after the improvement. The results show that after accelerated deployment, the inference speed of the improved model reaches 30 fps on the KITTI2015 test set, and the improved model has higher accuracy than the original model.
Citations: 0
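A channel attention mechanism of the squeeze-and-excitation kind can be sketched in a few lines: global-average-pool each channel, derive a gate, and rescale the channel. The per-channel scalar weights below stand in for the small MLP a real implementation would learn.

```python
import math

def channel_attention(feat, w):
    """Squeeze-and-excitation style channel attention: global average
    pool each channel map (squeeze), pass through a per-channel weight
    and sigmoid (excite, hypothetical stand-in for a learned MLP),
    then rescale the channel. feat: list of 2D channel maps."""
    gates = []
    for ch, wc in zip(feat, w):
        avg = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))  # squeeze
        gates.append(1 / (1 + math.exp(-wc * avg)))                 # excite
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]

feat = [[[1.0, 1.0]], [[1.0, 1.0]]]        # 2 channels, 1x2 maps
out = channel_attention(feat, [5.0, -5.0])
print(out[0][0][0] > out[1][0][0])  # True: informative channel amplified
```

In the matching-cost aggregation stage this lets the network emphasize cost-volume channels that discriminate well and suppress noisy ones, which is the stated source of the ambiguity reduction.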
Identifying Alzheimer’s disease from 4D fMRI using hybrid 3DCNN and GRU networks
International Conference on Digital Image Processing, Pub Date: 2022-10-12, DOI: 10.1117/12.2644454
Yifan Cao, Meili Lu, Jiajun Fu, Zhaohua Guo, Zicheng Gao
Abstract: In recent years, motivated by their excellent performance in automatic feature extraction and in detecting complex patterns in raw data, deep learning technologies have been widely used to analyze fMRI data for Alzheimer’s disease classification. However, most current studies do not take full advantage of the temporal and spatial features of fMRI, which may ignore important information and degrade classification performance. In this paper, we propose a novel deep learning approach that learns temporal and spatial features of 4D fMRI for Alzheimer’s disease classification. The model is composed of a 3D convolutional neural network (3DCNN) and a recurrent neural network. Experimental results demonstrate that the proposed approach can discriminate Alzheimer’s patients from healthy controls with high accuracy.
Citations: 0
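The recurrent half of such a hybrid model consumes one 3DCNN feature per fMRI volume and carries a hidden state across time. A scalar GRU-cell step shows the recurrence; the weights are toy values standing in for learned parameters, and real features are vectors rather than scalars.

```python
import math

def gru_step(h, x, Wz, Wr, Wh):
    """One step of a scalar GRU cell over a sequence of per-volume
    features. h: previous hidden state; x: current input. Wz, Wr, Wh:
    (input, hidden) weight pairs. The update gate z blends the old
    state with the tanh candidate state."""
    sig = lambda v: 1 / (1 + math.exp(-v))
    z = sig(Wz[0] * x + Wz[1] * h)                     # update gate
    r = sig(Wr[0] * x + Wr[1] * h)                     # reset gate
    h_tilde = math.tanh(Wh[0] * x + Wh[1] * (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

h = 0.0
for x in [0.5, -0.2, 0.9]:                 # per-volume 3DCNN features over time
    h = gru_step(h, x, (1.0, 0.5), (1.0, 0.5), (1.0, 0.5))
print(-1.0 < h < 1.0)  # True: hidden state stays tanh-bounded
```

The final hidden state summarizes the whole scan's temporal dynamics and would feed a classification head in the full model.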