Street Object Detection from Synthesized and Processed Semantic Image: A Deep Learning Based Study

Parthaw Goswami, A. B. M. Aowlad Hossain
DOI: 10.1007/s44230-023-00043-1
Journal: Human-Centric Intelligent Systems
Published: 2023-09-20

Abstract

Semantic image synthesis plays an important role in the development of Advanced Driver Assistance Systems (ADAS). Street object detection may be erroneous during rain or when images from a vehicle's camera are blurred, which can cause serious accidents. Therefore, automatic and accurate street object detection is a demanding research area. In this paper, a deep learning based framework is proposed and investigated for street object detection from synthesized and processed semantic images. First, a Conditional Generative Adversarial Network (CGAN) is used to create realistic images. The brightness of the CGAN-generated images is then increased using a neural style transfer method. Furthermore, an Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) based image enhancement approach is used to improve the resolution of the style-transferred images. These processed images exhibit better clarity and high fidelity, which helps improve object detector performance. Finally, the synthesized and processed images were used as input to a Faster Region-based Convolutional Neural Network (Faster R-CNN) and a MobileNet Single Shot Detector (MobileNetSSDv2) model separately for object detection. The widely used Cityscapes dataset is used to investigate the performance of the proposed framework. The results show that the synthesized and processed input improves the performance of the detectors over the unprocessed counterpart. A comparison of the proposed detection framework with related state-of-the-art techniques is also satisfactory, with a mean average precision (mAP) of around 32.6%, whereas in most cases mAPs in the range of 20–28% are reported for this dataset.
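The abstract reports detector quality as mean average precision (mAP): for each object class, an average precision (AP) is computed from the precision-recall curve of confidence-ranked detections, and the mAP is the mean of the per-class APs. As a minimal sketch of that underlying calculation (the paper's exact interpolation scheme is not stated here; this uses the common all-point interpolation, and the function name and inputs are illustrative, not from the paper):

```python
def average_precision(tp_flags, num_gt):
    """AP for one class from confidence-sorted detections.

    tp_flags: list of booleans, one per detection in descending
              confidence order (True = matched a ground-truth box).
    num_gt:   total number of ground-truth boxes for this class.
    """
    tp = fp = 0
    points = []  # (recall, precision) after each detection
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    # Make the precision envelope monotonically non-increasing,
    # sweeping from the highest recall back to the lowest.
    for i in range(len(points) - 2, -1, -1):
        r, p = points[i]
        points[i] = (r, max(p, points[i + 1][1]))
    # Area under the interpolated precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for r, p in points:
        ap += p * (r - prev_r)
        prev_r = r
    return ap


def mean_average_precision(per_class_aps):
    """mAP is simply the mean of the per-class AP values."""
    return sum(per_class_aps) / len(per_class_aps)
```

For example, two ground-truth boxes and the ranked detections [TP, FP, TP] yield an AP of 5/6; averaging such APs over the Cityscapes classes would give the kind of mAP figure (e.g. ~32.6%) the paper reports.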