State of the Art on Deep Learning-enhanced Rendering Methods
Qi Wang, Zhihua Zhong, Yuchi Huo, Hujun Bao, Rui Wang
Machine Intelligence Research. DOI: https://doi.org/10.1007/s11633-022-1400-x. Published 2023-11-09.

Transmission Line Insulator Defect Detection Based on Swin Transformer and Context
Yu Xi, Ke Zhou, Ling-Wen Meng, Bo Chen, Hao-Min Chen, Jing-Yi Zhang
Machine Intelligence Research. DOI: https://doi.org/10.1007/s11633-022-1355-y. Published 2023-09-15.
Abstract: Insulators are important components of power transmission lines; once one fails, it may cause a large-scale blackout and other hidden dangers. Because insulator images are large and their backgrounds complex, detecting small defect objects is challenging. We build on the two-stage Faster R-CNN (region-based convolutional neural network) detector. First, we use a hierarchical Swin Transformer with shifted windows, instead of ResNet, as the feature extraction network to obtain more discriminative features, and then design a deformable receptive field block that encodes global and local context to capture key clues for detecting objects in complex backgrounds. Finally, we propose a filling data augmentation method to address the shortage of defect samples, adding more insulator defect images with varied backgrounds to the training set to improve the robustness of the model. As a result, recall increases from 89.5% to 92.1% and average precision increases from 81.0% to 87.1%. To further demonstrate the superiority of the proposed algorithm, we also test the model on the public Pascal visual object classes (VOC) dataset, where it likewise yields outstanding results.

{"title":"YOLO-CORE: Contour Regression for Efficient Instance Segmentation","authors":"Haoliang Liu, Wei Xiong, Yu Zhang","doi":"10.1007/s11633-022-1379-3","DOIUrl":"https://doi.org/10.1007/s11633-022-1379-3","url":null,"abstract":"Instance segmentation has drawn mounting attention due to its significant utility. However, high computational costs have been widely acknowledged in this domain, as the instance mask is generally achieved by pixel-level labeling. In this paper, we present a conceptually efficient contour regression network based on the you only look once (YOLO) architecture named YOLO-CORE for instance segmentation. The mask of the instance is efficiently acquired by explicit and direct contour regression using our designed multi-order constraint consisting of a polar distance loss and a sector loss. Our proposed YOLO-CORE yields impressive segmentation performance in terms of both accuracy and speed. It achieves 57.9% AP@0.5 with 47 FPS (frames per second) on the semantic boundaries dataset (SBD) and 51.1% AP@0.5 with 46 FPS on the COCO dataset. The superior performance achieved by our method with explicit contour regression suggests a new technique line in the YOLO-based image understanding field. Moreover, our instance segmentation design can be flexibly integrated into existing deep detectors with negligible computation cost (65.86 BFLOPs (billion float operations per second) to 66.15 BFLOPs with the YOLOv3 detector).","PeriodicalId":29727,"journal":{"name":"Machine Intelligence Research","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135396713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis","authors":"Zhang, Kai, Li, Yawei, Liang, Jingyun, Cao, Jiezhang, Zhang, Yulun, Tang, Hao, Timofte, Radu, Van Gool, Luc","doi":"10.1007/s11633-023-1466-0","DOIUrl":"https://doi.org/10.1007/s11633-023-1466-0","url":null,"abstract":"While recent years have witnessed a dramatic upsurge of exploiting deep neural networks toward solving image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved. In this paper, we attempt to solve this problem from the perspective of network architecture design and training data synthesis. Specifically, for the network architecture design, we propose a swin-conv block to incorporate the local modeling ability of residual convolutional layer and non-local modeling ability of swin transformer block, and then plug it as the main building block into the widely-used image-to-image translation UNet architecture. For the training data synthesis, we design a practical noise degradation model which takes into consideration different kinds of noise (including Gaussian, Poisson, speckle, JPEG compression, and processed camera sensor noises) and resizing, and also involves a random shuffle strategy and a double degradation strategy. Extensive experiments on AGWN removal and real image denoising demonstrate that the new network architecture design achieves state-of-the-art performance and the new degradation model can help to significantly improve the practicability. We believe our work can provide useful insights into current denoising research.","PeriodicalId":29727,"journal":{"name":"Machine Intelligence Research","volume":"175 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135353684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation","authors":"Zhenyu Li, Zehui Chen, Xianming Liu, Junjun Jiang","doi":"10.1007/s11633-023-1458-0","DOIUrl":"https://doi.org/10.1007/s11633-023-1458-0","url":null,"abstract":"Abstract This paper aims to address the problem of supervised monocular depth estimation. We start with a meticulous pilot study to demonstrate that the long-range correlation is essential for accurate depth estimation. Moreover, the Transformer and convolution are good at long-range and close-range depth estimation, respectively. Therefore, we propose to adopt a parallel encoder architecture consisting of a Transformer branch and a convolution branch. The former can model global context with the effective attention mechanism and the latter aims to preserve the local information as the Transformer lacks the spatial inductive bias in modeling such contents. However, independent branches lead to a shortage of connections between features. To bridge this gap, we design a hierarchical aggregation and heterogeneous interaction module to enhance the Transformer features and model the affinity between the heterogeneous features in a set-to-set translation manner. Due to the unbearable memory cost introduced by the global attention on high-resolution feature maps, we adopt the deformable scheme to reduce the complexity. Extensive experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins. The effectiveness of each proposed module is elaborately evaluated through meticulous and intensive ablation studies.","PeriodicalId":29727,"journal":{"name":"Machine Intelligence Research","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134990420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}