Mehmet N. Akcay, Burak Kara, Saba Ahsan, A. Begen, I. Curcio, Emre B. Aksu
{"title":"Head-Motion-Aware Viewport Margins for Improving User Experience in Immersive Video","authors":"Mehmet N. Akcay, Burak Kara, Saba Ahsan, A. Begen, I. Curcio, Emre B. Aksu","doi":"10.1145/3469877.3490573","DOIUrl":"https://doi.org/10.1145/3469877.3490573","url":null,"abstract":"Viewport-dependent delivery (VDD) is a technique to save network resources during the transmission of immersive videos. However, it results in a non-zero motion-to-high-quality delay (MTHQD), which is the delta time from the moment where the current viewport has at least one low-quality tile to when all the tiles in the new viewport are rendered in high quality. MTHQD is an important metric in the evaluation of the VDD systems. This paper improves an earlier concept called viewport margins by introducing head-motion awareness. The primary benefit of this improvement is the reduction (up to 64%) in the average MTHQD.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121361126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Decompose and Restore Low-light Images with Wavelet Transform","authors":"Pengju Zhang, Chaofan Zhang, Zheng Rong, Yihong Wu","doi":"10.1145/3469877.3490622","DOIUrl":"https://doi.org/10.1145/3469877.3490622","url":null,"abstract":"Low-light images often suffer from low visibility and various noise. Most existing low-light image enhancement methods often amplify noise when enhancing low-light images, due to the neglect of separating valuable image information and noise. In this paper, we propose a novel wavelet-based attention network, where wavelet transform is integrated into attention learning for joint low-light enhancement and noise suppression. Particularly, the proposed wavelet-based attention network includes a Decomposition-Net, an Enhancement-Net and a Restoration-Net. In Decomposition-Net, to benefit denoising, wavelet transform layers are designed for separating noise and global content information into different frequency features. Furthermore, an attention-based strategy is introduced to progressively select suitable frequency features for accurately restoring illumination and reflectance according to Retinex theory. In addition, Enhancement-Net is introduced for further removing degradations in reflectance and adjusting illumination, while Restoration-Net employs conditional adversarial learning to adversarially improve the visual quality of final restored results based on enhanced illumination and reflectance. Extensive experiments on several public datasets demonstrate that the proposed method achieves more pleasing results than state-of-the-art methods.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"55 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132090962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yalu Cheng, Pengchong Qiao, Hong-Ju He, Guoli Song, Jie Chen
{"title":"Hard-Boundary Attention Network for Nuclei Instance Segmentation","authors":"Yalu Cheng, Pengchong Qiao, Hong-Ju He, Guoli Song, Jie Chen","doi":"10.1145/3469877.3490602","DOIUrl":"https://doi.org/10.1145/3469877.3490602","url":null,"abstract":"Image segmentation plays an important role in medical image analysis, and accurate segmentation of nuclei is especially crucial to clinical diagnosis. However, existing methods fail to segment dense nuclei due to the hard-boundary, which has a texture similar to the nucleus interior. To this end, we propose a Hard-Boundary Attention Network (HBANet) for nuclei instance segmentation. Specifically, we propose a Background Weaken Module (BWM) to weaken the attention of our model to the nucleus background by integrating low-level features into high-level features. To improve the robustness of the model to the hard-boundary of nuclei, we further design a Gradient-based boundary adaptive Strategy (GS) which generates boundary-weakened data for model training in an adversarial manner. We conduct extensive experiments on the MoNuSeg and CPM-17 datasets, and experimental results show that our HBANet outperforms the state-of-the-art methods.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114408864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuguang Zhao, Bingzhi Chen, Zheng Zhang, Guangming Lu
{"title":"An Embarrassingly Simple Approach to Discrete Supervised Hashing","authors":"Shuguang Zhao, Bingzhi Chen, Zheng Zhang, Guangming Lu","doi":"10.1145/3469877.3493595","DOIUrl":"https://doi.org/10.1145/3469877.3493595","url":null,"abstract":"Prior hashing works typically learn a projection function from high-dimensional visual feature space to low-dimensional latent space. However, such a projection function still faces several crucial bottlenecks: 1) information loss and coding redundancy are inevitable; 2) the available information of semantic labels is not well-explored; 3) the learned latent embedding lacks explicit semantic meaning. To overcome these limitations, we propose a novel supervised Discrete Auto-Encoder Hashing (DAEH) framework, in which a linear auto-encoder can effectively project the semantic labels of images into a latent representation space. Instead of using the visual feature projection, the proposed DAEH framework skillfully explores the semantic information of supervised labels to refine the latent feature embedding and further optimizes the hashing function. Meanwhile, we reformulate the objective and relax the discrete constraints for the binary optimization problem. Extensive experiments on Caltech-256, CIFAR-10, and MNIST datasets demonstrate that our method can outperform the state-of-the-art hashing baselines.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122494134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo
{"title":"Language Based Image Quality Assessment","authors":"L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo","doi":"10.1145/3469877.3490605","DOIUrl":"https://doi.org/10.1145/3469877.3490605","url":null,"abstract":"Evaluation of generative models, in the visual domain, is often performed providing anecdotal results to the reader. In the case of image enhancement, reference images are usually available. Nonetheless, using signal based metrics often leads to counterintuitive results: highly natural crisp images may obtain worse scores than blurry ones. On the other hand, blind reference image assessment may rank images reconstructed with GANs higher than the original undistorted images. To avoid time consuming human based image assessment, semantic computer vision tasks may be exploited instead [9, 25, 33]. In this paper we advocate the use of language generation tasks to evaluate the quality of restored images. We show experimentally that image captioning, used as a downstream task, may serve as a method to score image quality. Captioning scores are better aligned with human rankings with respect to signal based metrics or no-reference image quality metrics. We show insights on how the corruption, by artifacts, of local image structure may steer image captions in the wrong direction.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116426881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao Zhang, Qi Zhang, P. Nguyen, Victor C. S. Lee, Antoni B. Chan
{"title":"Chinese White Dolphin Detection in the Wild","authors":"Hao Zhang, Qi Zhang, P. Nguyen, Victor C. S. Lee, Antoni B. Chan","doi":"10.1145/3469877.3490574","DOIUrl":"https://doi.org/10.1145/3469877.3490574","url":null,"abstract":"For ecological protection of the ocean, biologists usually conduct line-transect vessel surveys to measure sea species’ population density within their habitat (such as dolphins). However, sea species observation via vessel surveys consumes a lot of manpower resources and is more challenging compared to observing common objects, due to the scarcity of the object in the wild, tiny-size of the objects, and similar-sized distracter objects (e.g., floating trash). To reduce the human experts’ workload and improve the observation accuracy, in this paper, we develop a practical system to detect Chinese White Dolphins in the wild automatically. First, we construct a dataset named Dolphin-14k with more than 2.6k dolphin instances. To improve the dataset annotation efficiency caused by the rarity of dolphins, we design an interactive dolphin box annotation strategy to annotate sparse dolphin instances in long videos efficiently. Second, we compare the performance and efficiency of three off-the-shelf object detection algorithms, including Faster-RCNN, FCOS, and YoloV5, on the Dolphin-14k dataset and pick YoloV5 as the detector, where a new category (Distracter) is added to the model training to reject the false positives. Finally, we incorporate the dolphin detector into a system prototype, which detects dolphins in video frames at 100.99 FPS per GPU with high accuracy (i.e., 90.95 mAP@0.5).","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126273047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design","authors":"Hao Liu, Qian Wang, Xiaotong Hu","doi":"10.1145/3469877.3497694","DOIUrl":"https://doi.org/10.1145/3469877.3497694","url":null,"abstract":"In medicinal chemistry programs, it is key to design and make compounds that are efficacious and safe. In this study, we developed a new deep reinforcement learning-based molecular generation method. Chemical space is impractically large, and many existing generation models produce molecules that lack effectiveness and novelty and have unsatisfactory molecular properties. Our proposed method, DeepRLDS, which integrates a transformer network, balanced binary tree search and docking simulation based on super large-scale supercomputing, can solve these problems well. Experiments show that more than 96% of the generated molecules are chemically valid, 99% of the generated molecules are chemically novel, and the generated molecules have satisfactory molecular properties and possess a broader chemical space distribution.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127348166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization","authors":"Haopeng Xie, Liang Xiao, Huicong Wu","doi":"10.1145/3469877.3490608","DOIUrl":"https://doi.org/10.1145/3469877.3490608","url":null,"abstract":"Video jitter is an uncomfortable product of irregular lens motion in time sequence. How to extract motion state information in a period of continuous video frames is a major issue for video stabilization. In this paper, we propose a novel sequence model, Intra- and Inter-frame Iterative Temporal Convolutional Networks (I3TC-Net), which alternately transfers the spatial-temporal correlation of motion within and between frames. We hypothesize that the motion state information can be represented by transmission states. Specifically, we employ a combination of Convolutional Long Short-Term Memory (ConvLSTM) and an embedded encoder-decoder to generate the latent stable frame, which is used to update transmission states iteratively and learn a global homography transformation effectively for each unstable frame to generate the corresponding stabilized result along the time axis. Furthermore, we create a video dataset to solve the lack of stable data and improve the training effect. Experimental results show that our method outperforms state-of-the-art results on publicly available videos, such as a 5.4-point improvement in stability score. The project page is available at https://github.com/root2022IIITC/IIITC.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"42 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130679449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentially Private Learning with Grouped Gradient Clipping","authors":"Haolin Liu, Chenyu Li, Bochao Liu, Pengju Wang, Shiming Ge, Weiping Wang","doi":"10.1145/3469877.3490594","DOIUrl":"https://doi.org/10.1145/3469877.3490594","url":null,"abstract":"While deep learning has proved successful in many critical tasks by training models from large-scale data, some private information within can be recovered from the released models, leading to the leakage of privacy. To address this problem, this paper presents a differentially private deep learning paradigm to train private models. In the approach, we propose and incorporate a simple operation termed grouped gradient clipping to modulate the gradient weights. We also incorporate the smooth sensitivity mechanism into the differentially private deep learning paradigm, which bounds the added Gaussian noise. In this way, the resulting model can simultaneously provide strong privacy protection and avoid accuracy degradation, providing a good trade-off between privacy and performance. The theoretical advantages of grouped gradient clipping are well analyzed. Extensive evaluations on popular benchmarks and comparisons with 11 state-of-the-art methods clearly demonstrate the effectiveness and generalizability of our approach.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131369552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanru Jiang, Chengyu Zheng, Zhao-Hui Wang, Rui Wang, Min Ye, Chenglong Wang, Ning Song, Jie Nie
{"title":"Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images","authors":"Yanru Jiang, Chengyu Zheng, Zhao-Hui Wang, Rui Wang, Min Ye, Chenglong Wang, Ning Song, Jie Nie","doi":"10.1145/3469877.3497699","DOIUrl":"https://doi.org/10.1145/3469877.3497699","url":null,"abstract":"The accuracy of the semantic segmentation results of ships is of great significance to coastline navigation, resource management, and territorial protection. Although the ship semantic segmentation method based on deep learning has made great progress, there is still the problem of not exploring the correlation between the targets. In order to avoid the above problems, this paper designed a multi-scale graph convolutional network and dynamic iterative class loss for ship segmentation in remote sensing images to generate more accurate segmentation results. Based on DeepLabv3+, our network uses deep convolutional networks and atrous convolutions for multi-scale feature extraction. In particular, for multi-scale semantic features, we propose to construct a Multi-Scale Graph Convolution Network (MSGCN) to introduce semantic correlation information for pixel feature learning by GCN, which enhances the segmentation result of ship objects. In addition, we propose a Dynamic Iterative Class Loss (DICL) based on iterative batch-wise class rectification instead of pre-computing the fixed weights over the whole dataset, which solves the problem of imbalance between positive and negative samples. We compared the proposed algorithm with the most advanced deep learning target detection methods and ship detection methods and proved the superiority of our method. On a High-Resolution SAR Images Dataset [1], ship detection and instance segmentation can be implemented well.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"98 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113983351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}