{"title":"A New Approach using Characteristic Video Signals to Improve the Stability of Manufacturing Processes","authors":"Frederic Ringsleben, Maik Benndorf, T. Haenselmann, R. Boiger, Manfred Mücke, M. Fehr, Dirk Motthes","doi":"10.1109/DICTA.2018.8615860","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615860","url":null,"abstract":"Observing production processes is a typical task for sensors in industrial environments. This paper deals with the use of camera systems as a sensor array to compare similar production processes with one another. The aim is to detect anomalies in production processes, such as the motion of robots or the flow of liquids. Since the comparison of high-resolution and long videos is very resource-intensive, we propose clustering the video into areas and shots. Therefore, we suggest interpreting each pixel of a video as a signal varying in time. In order to do that without any background knowledge and to be useful for any production environment with motion involved, we use an unsupervised clustering procedure. We show three different preprocessing approaches to avoid faulty clustering of static image areas and those relevant for the production and finally compare the results.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131646013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blur Kernel Estimation Model with Combined Constraints for Blind Image Deblurring","authors":"Ying Liao, Weihong Li, Jinkai Cui, W. Gong","doi":"10.1109/DICTA.2018.8615815","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615815","url":null,"abstract":"This paper proposes a blur kernel estimation model based on combined constraints involving both image and blur kernel constraints for blind image deblurring. We adopt L0 regularization term for constraining image gradient and dark channel of image gradient to protect image strong edges and suppress noise in image, and use L2 regularization term as hybrid constraints for blur kernel and its gradient to preserve blur kernel's sparsity and continuity respectively. In combined constraints, the constrained dark channel of image gradient, which is a dark channel prior, can also effectively help blind image deblurring in various scenarios, such as natural, face and text images. Moreover, we introduce a half-quadratic splitting optimization algorithm for solving the proposed model. We conduct extensive experiments and results demonstrate that the proposed method can better estimate blur kernel and achieve better visual quality of image deblurring on both synthetic and real-life blurred images.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130608336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Caption Generator with Novel Object Injection","authors":"Mirza Muhammad Ali Baig, Mian Ihtisham Shah, Muhammad Abdullah Wajahat, Nauman Zafar, Omar Arif","doi":"10.1109/DICTA.2018.8615810","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615810","url":null,"abstract":"Image captioning is a field within artificial intelligence that is progressing rapidly and it has a lot of potentials. A major problem when working in this field is the limited amount of data that is available to us as is. The only dataset considered suitable enough for the task is the Microsoft: Common Objects in Context (MSCOCO) dataset, which contains about 120,000 training images. This covers about 80 object classes, which is an insufficient amount if we want to create robust solutions that aren't limited to the constraints of the data at hand. In order to overcome this problem, we propose a solution that incorporates Zero-Shot Learning concepts in order to identify unknown objects and classes by using semantic word embeddings and existing state-of-the-art object identification algorithms. Our proposed model, Image Captioning using Novel Word Injection, uses a pre-trained caption generator and works on the output of the generator to inject objects that are not present in the dataset into the caption. We evaluate the model on standardized metrics, namely, BLEU, CIDEr and ROUGE-L. The results, qualitatively and quantitatively, outperform the underlying model.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120848108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lane Detection Under Adverse Conditions Based on Dual Color Space","authors":"Nima Zarbakht, J. Zou","doi":"10.1109/DICTA.2018.8615785","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615785","url":null,"abstract":"A high level of situational awareness is essential to an advanced driver assistance system. One of the most important duties of such a system is the detection of lane markings on the road and to distinguish them from the road and other objects such as shadows, traffic, etc. A robust lane detection algorithm is critical to a lane departure warning system. It must determine the relative lane position reliably and rapidly using captured images. The available literature provides some methods to solve problems associated with adverse conditions such as precipitation, glare and blurred lane markings. However, the reliability of these methods can be adversely affected by the lighting conditions. In this paper, a new method is proposed that combines two distinct color spaces to reduce interference in a pre-processing step. The method is adaptive to different lighting situations. The directional gradient is used to detect the lane marking edges. The method can detect lane markings with different complexities imposed by shadows, rain, reflection, strong sources of light such as headlights and tail lights.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129661082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MS-GAN: GAN-Based Semantic Segmentation of Multiple Sclerosis Lesions in Brain Magnetic Resonance Imaging","authors":"C. Zhang, Yang Song, Sidong Liu, S. Lill, Chenyu Wang, Zihao Tang, Yuyi You, Yang Gao, A. Klistorner, M. Barnett, Weidong (Tom) Cai","doi":"10.1109/DICTA.2018.8615771","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615771","url":null,"abstract":"Automated segmentation of multiple sclerosis (MS) lesions in brain imaging is challenging due to the high variability in lesion characteristics. Based on the generative adversarial network (GAN), we propose a semantic segmentation framework MS-GAN to localize MS lesions in multimodal brain magnetic resonance imaging (MRI), which consists of one multimodal encoder-decoder generator G and multiple discriminators D corresponding to the multiple input modalities. For the design of the generator, we adopt an encoder-decoder deep learning architecture with bypass of spatial information from encoder to the corresponding decoder, which helps to reduce the network parameters while improving the localization performance. Our generator is also designed to integrate multimodal imaging data in end-to-end learning with multi-path encoding and cross-modality fusion. An additional classification-related constraint is proposed for the adversarial training process of the GAN model, with the aim of alleviating the hard-to-converge issue in classification-based image-to-image translation problems. For evaluation, we collected a database of 126 cases from patients with relapsing MS. We also experimented with other semantic segmentation models as well as patch-based deep learning methods for performance comparison. The results show that our method provides more accurate segmentation than the state-of-the-art techniques.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130664531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Restoration Based on Deep Convolutional Network in Wavefront Coding Imaging System","authors":"Haoyuan Du, Liquan Dong, Ming Liu, Yuejin Zhao, W. Jia, Xiaohua Liu, Mei Hui, Lingqin Kong, Q. Hao","doi":"10.1109/DICTA.2018.8615824","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615824","url":null,"abstract":"Wavefront coding (WFC) is a prosperous technology for extending depth of field (DOF) in the incoherent imaging system. Digital recovery of the WFC technique is a classical ill-conditioned problem by removing the blurring effect and suppressing the noise. Traditional approaches relying on image heuristics suffer from high frequency noise amplification and processing artifacts. This paper investigates a general framework of neural networks for restoring images in WFC. To our knowledge, this is the first attempt for applying convolutional networks in WFC. The blur and additive noise are considered simultaneously. Two solutions respectively exploiting fully convolutional networks (FCN) and conditional Generative Adversarial Networks (CGAN) are presented. The FCN based on minimizing the mean squared reconstruction error (MSE) in pixel space gets high PSNR. On the other side, the CGAN based on perceptual loss optimization criterion retrieves more textures. We conduct comparison experiments to demonstrate the performance at different noise levels from the training configuration. We also reveal the image quality on non-natural test target image and defocused situation. The results indicate that the proposed networks outperform traditional approaches for restoring high frequency details and suppressing noise effectively.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133891562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Absolute and Relative Pose Estimation of a Multi-View Camera System using 2D-3D Line Pairs and Vertical Direction","authors":"Hichem Abdellali, Z. Kato","doi":"10.1109/DICTA.2018.8615792","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615792","url":null,"abstract":"We propose a new algorithm for estimating the absolute and relative pose of a multi-view camera system. The algorithm relies on two solvers: a direct solver using a minimal set of 6 line pairs and a least squares solver which uses all inlier 2D-3D line pairs. The algorithm have been validated on a large synthetic dataset, experimental results confirm the stable and real-time performance under realistic noise on the line parameters as well as on the vertical direction. Furthermore, the algorithm performs well on real data with less then half degree rotation error and less than 25 cm translation error.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133297758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Analytics for Train Crowd Estimation","authors":"Choon Giap Goh, Wee Han Lim, Justus Chua, I. Atmosukarto","doi":"10.1109/DICTA.2018.8615794","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615794","url":null,"abstract":"Overcrowding is a common problem faced by train commuters in many countries. While waiting for the train at the stations, commuters tend to cluster and queue at doors that are closest to escalators and elevators that lead towards the station entrances and exits. This scenario results in trains not being fully utilized in terms of their capacity. As cabins with certain door positions tend to be more crowded than the rest of the cabins. The objective of this paper is to provide a methodology to estimate the crowd density within cabins of incoming trains, while leveraging on the existing train CCTV infrastructures. Providing the train cabin density information to commuters who are waiting for the incoming train allows the commuters to better select which cabin to board based on the provided density information. This will facilitate a better commuting experience without incurring a high cost for the train operator. To achieve this objective, we have adopted the usage of deep convolutional neural networks to analyze the footage from the existing security camera inside the trains and classify the images frames based the crowd level of train cabins. Three different experiments were conducted to train and test different convolutional neural network models. All models are able to make classification with an accuracy rate of over 90%.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130177876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Military Vehicle Detection from Low-Altitude Aerial Images","authors":"F. Kamran, M. Shahzad, F. Shafait","doi":"10.1109/DICTA.2018.8615865","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615865","url":null,"abstract":"Detection and identification of military vehicles from aerial images is of great practical interest particularly for defense sector as it aids in predicting enemys move and hence, build early precautionary measures. Although due to advancement in the domain of self-driving cars, a vast literature of published algorithms exists that use the terrestrial data to solve the problem of vehicle detection in natural scenes. Directly translating these algorithms towards detection of both military and non-military vehicles in aerial images is not straight forward owing to high variability in scale, illumination and orientation together with articulations both in shape and structure. Moreover, unlike availability of terrestrial benchmark datasets such as Baidu Research Open-Access Dataset etc., there does not exist well-annotated datasets encompassing both military and non-military vehicles in aerial images which as a consequence limit the applicability of the state-of-the-art deep learning based object detection algorithms that have shown great success in the recent years. To this end, we have prepared a dataset of low-altitude aerial images that comprises of both real data (taken from military shows videos) and toy data (downloaded from YouTube videos). The dataset has been categorized into three main types, i.e., military vehicle, non-military vehicle and other non-vehicular objects. In total, there are 15,086 (11,733 toy and 3,353 real) vehicle images exhibiting a variety of different shapes, scales and orientations. To analyze the adequacy of the prepared dataset, we employed the state-of-the-art object detection algorithms to distinguish military and non-military vehicles. The experimental results show that the training of deep architectures using the customized/prepared dataset allows to recognize seven types of military and four types of non-military vehicles.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128540476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}