{"title":"Predicting Human Perception of Scene Complexity","authors":"C. Kyle-Davidson, A. Bors, K. Evans","doi":"10.1109/ICIP46576.2022.9897953","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897953","url":null,"abstract":"It is apparent that humans are intrinsically capable of determining the degree of complexity present in an image; but it is unclear which regions in that image lead humans towards evaluating an image as complex or simple. Here, we develop a novel deep learning model for predicting human perception of the complexity of natural scene images in order to address these problems. For a given image, our approach, ComplexityNet, can generate both single-score complexity ratings and two-dimensional per-pixel complexity maps. These complexity maps indicate the regions of scenes that humans find to be complex, or simple. Drawing on work in the cognitive sciences we integrate metrics for scene clutter and scene symmetry, and conclude that the proposed metrics do indeed boost neural network performance when predicting complexity.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116491947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Trajectory-Conditioned Relations to Predict Pedestrian Crossing Behavior","authors":"Chenchao Zhou, Ghassan AlRegib, Armin Parchami, Kunjan Singh","doi":"10.1109/ICIP46576.2022.9897655","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897655","url":null,"abstract":"In smart transportation, intelligent systems avoid potential collisions by predicting the intent of traffic agents, especially pedestrians. Pedestrian intent, defined as future action, e.g., start crossing, can be dependent on traffic surroundings. In this paper, we develop a framework to incorporate such dependency given observed pedestrian trajectory and scene frames. Our framework first encodes regional joint information between a pedestrian and surroundings over time into feature-map vectors. The global relation representations are then extracted from pairwise feature-map vectors to estimate intent with past trajectory condition. We evaluate our approach on two public datasets and compare against two state-of-the-art approaches. The experimental results demonstrate that our method helps to inform potential risks during crossing events with 0.04 improvement in F1-score on JAAD dataset and 0.01 improvement in recall on PIE dataset. Furthermore, we conduct ablation experiments to confirm the contribution of the relation extraction in our framework.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123482350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Self-Supervised Method for Infrared and Visible Image Fusion","authors":"Xiaopeng Lin, Guanxing Zhou, Weihong Zeng, Xiaotong Tu, Yue Huang, Xinghao Ding","doi":"10.1109/ICIP46576.2022.9897731","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897731","url":null,"abstract":"Infrared and visible image fusion (IVIF) plays important roles in many applications. Since there is no ground-truth, the fusion performance measurement is a difficult but important problem for the task. Previous unsupervised deep learning based fusion methods depend on a hand-crafted loss function to define the distance between the fused image and two types of source images, which still cannot well preserve the vital information in the fused images. To address these issues, we propose an image fusion performance measurement between the fused image and the decomposition of the fused image. A novel self-supervised network for infrared and visible image fusion is designed to preserve the vital information of source images by narrowing the distance between the source images and the decomposed ones. Extensive experimental results demonstrate that our proposed measurement has the ability in improving the performance of backbone network in both subjective and objective evaluations.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122052427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind Video Quality Assessment via Space-Time Slice Statistics","authors":"Qi Zheng, Zhengzhong Tu, Zhijian Hao, Xiaoyang Zeng, A. Bovik, Yibo Fan","doi":"10.1109/ICIP46576.2022.9897565","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897565","url":null,"abstract":"User-generated contents (UGC) have gained increased attention in the video quality community recently. Perceptual video quality assessment (VQA) of UGC videos is of great significance for content providers to monitor, process, and deliver massive numbers of UGC videos. Blind video quality prediction of UGC videos is challenging since complex mixtures of spatial and temporal distortions contribute to the overall perceptual quality. In this paper, we develop a simple, effective, and efficient blind VQA framework (STS-QA) based on the statistical analysis of space-time slices (STS) of videos. Specifically, we extract spatio-temporal statistical features along different orientations of video STS, that capture directional global motion, then train a shallow quality predictor. The proposed framework can be used to easily extend any existing video/image quality model to account for temporal or motion regularities. Our experimental results on three publicly available UGC databases demonstrate that our proposed STS-QA model can significantly boost prediction performance compared to baselines. The code will be released at: https://github.com/uniqzheng/STS_BVQA.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122060690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Soil Properties from Hyperspectral Satellite Images","authors":"R. Kuzu, F. Albrecht, Caroline Arnold, Roshni Kamath, Kai Konen","doi":"10.1109/ICIP46576.2022.9897254","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897254","url":null,"abstract":"The AI4EO Hyperview challenge seeks machine learning methods that predict agriculturally relevant soil parameters (K, Mg, P2O5, pH) from airborne hyperspectral images. We present a hybrid model fusing Random Forest and K-nearest neighbor regressors that exploit the average spectral reflectance, as well as derived features such as gradients, wavelet coefficients, and Fourier transforms. The solution is computationally lightweight and improves upon the challenge baseline by 21.9%, with the first place on the public leaderboard. In addition, we discuss neural network architectures and potential future improvements.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124678686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HDR-TOF: HDR Time-of-Flight Imaging via Modulo Acquisition","authors":"Gal Shtendel, A. Bhandari","doi":"10.1109/ICIP46576.2022.9897552","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897552","url":null,"abstract":"Time-of-Flight (ToF) imagers, e.g. Microsoft Kinect, are active devices that offer a portable, efficient and a consumer-grade solution to three dimensional imaging problems. As the name suggests, in ToF imaging, back scattered light from an active illumination source (typically a sinusoid) is used to measure the ToF, thus resulting in depth information. Despite its prevalence in applications such as autonomous navigation and scientific imaging, current ToF sensors are limited in their dynamic range. Computational imaging solutions enabling high dynamic range (HDR) ToF imaging are largely unexplored. We take a step in this direction by proposing a novel architecture for HDR ToF imaging; we combine ToF imaging with the recently introduced Unlimited Sensing Framework. By considering modulo sampling at each ToF pixel, HDR signals are folded back in the conventional dynamic range. Our work offers a single-shot solution for HDR ToF imaging. We report a sampling density criterion that guarantees inversion of modulo non-linearity. Furthermore, we also present a new algorithm for ToF recovery that circumvents the need for unfolding of modulo samples. Numerical examples based on the Stanford 3D Scanning Repository highlight the merits of our approach, thus paving a path for a novel imaging architecture.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124806548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting the Efficiency of UGC Video Quality Assessment","authors":"Yilin Wang, Joong Gon Yim, N. Birkbeck, Junjie Ke, Hossein Talebi, Xi Chen, Feng Yang, Balu Adsumilli","doi":"10.1109/ICIP46576.2022.9897401","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897401","url":null,"abstract":"UGC video quality assessment (UGC-VQA) is a challenging research topic due to the high video diversity and limited public UGC quality datasets. State-of-the-art (SOTA) UGC quality models tend to use high complexity models, and rarely discuss the trade-off among complexity, accuracy, and generalizability. We propose a new perspective on UGC-VQA, and show that model complexity may not be critical to the performance, whereas a more diverse dataset is essential to train a better model. We illustrate this by using a light weight model, UVQ-lite, which has higher efficiency and better generalizability (less overfitting) than baseline SOTA models. We also propose a new way to analyze the sufficiency of the training set, by leveraging UVQ’s comprehensive features. Our results motivate a new perspective about the future of UGC-VQA research, which we believe is headed toward more efficient models and more diverse datasets.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129456173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-Temporal Parallelization Scheme for HEVC Encoding on Multi-Computer Systems","authors":"Alexandre Mercat, Sami Ahovainio, Jarno Vanne","doi":"10.1109/ICIP46576.2022.9897316","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897316","url":null,"abstract":"High Efficiency Video Coding (HEVC) sets the scene for economic video transmission and storage, but its inherent computational complexity calls for efficient parallelization techniques. This paper introduces and compares three different parallelization strategies for HEVC encoding on multi-computer systems: 1) spatial parallelization scheme, where input video frames are divided into slices and distributed among available computers; 2) temporal parallelization scheme, where input video is distributed among computers in groups of consecutive frames; 3) spatio-temporal parallelization scheme that combines the proposed spatial and temporal approaches. All these three schemes were benchmarked as part of the practical Kvazaar open-source HEVC encoder. Our experimental results on 2–5 computer configurations show that using the spatial scheme gives 1.65×–2.90× speedup at the cost of 4.16%–13.09% bitrate loss over a single-computer setup. The respective speedup with temporal parallelization is 1.86×–3.26× without any coding overhead. The spatio-temporal scheme with 2 slices was shown to offer the best load-balancing with 1.81×–3.55× speedups and a constant coding loss of 4.16%.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128455503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Channel-Wise Bit Allocation for Deep Visual Feature Quantization","authors":"Wei Wang, Zhuo Chen, Zhe Wang, Jie Lin, Long Xu, Weisi Lin","doi":"10.1109/ICIP46576.2022.9897325","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897325","url":null,"abstract":"Intermediate deep visual feature compression and transmission is an emerging research topic, which enables a good balance among computing load, bandwidth usage and generalization ability for AI-based visual analysis in edge-cloud collaboration. Quantization and the corresponding rate-distortion optimization are the key techniques in deep feature compression. In this paper, by exploring the feature statistics and a greedy iterative algorithm, we propose a channel-wise bit allocation method for deep feature quantization optimizing for network output error. Given the limited rate and computational power, the proposed method can quantize features with small information loss. Moreover, the method also provides the option to handle the trade-offs between computational cost and quantization performance. Experimental results on ResNet and VGGNet features demonstrate the effectiveness of the proposed bit allocation method.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128556358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Detail Injection-Based Feature Pyramid Network for Pan-Sharpening","authors":"Yi Sun, Yuanlin Zhang, Yuan Yuan","doi":"10.1109/ICIP46576.2022.9897212","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897212","url":null,"abstract":"Many remarkable works have been proposed to deal with distortions problems in image fusion to date. However, the spectral distortion and the spatial distortion cannot always be well addressed at the same time. To deal with this, we propose an Adaptive Feature Pyramid Network (AFPN) to efficiently embed an Adaptive Detail Injection (ADI) module at different scales. Feature-domain injection gains are proposed in the ADI module to adaptively modulate spatial information and guide a refined detail injection. Furthermore, we propose a texture loss function to further guide our model to learn detail perception in each band. Experiments on QuickBird and GaoFen-1 datasets show that our method achieves superior performance and produces visually pleasing fusion images. Our code is available at https://github.com/yisun98/AFPN.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128189673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}