Sign Recognition - How well does Single Shot Multibox Detector sum up? A Quantitative Study
Manikandan Ravikiran
2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), published 2018-10-01
DOI: 10.1109/AIPR.2018.8707409 (https://doi.org/10.1109/AIPR.2018.8707409)
Citation count: 1
Abstract
Deep learning for traffic sign detection and recognition (TSDR) has been widely explored in recent years, owing to its state-of-the-art results and the availability of public datasets. Two families of detection architectures are currently being developed: single shot and region proposal based approaches. Although single shot methods seem adequate for traffic sign detection, very few works to date have investigated this hypothesis quantitatively, with most focusing on region proposal based architectures. Moreover, given the complexity of the TSDR task and the limited performance of region proposal based approaches, a quantitative study of the single shot method is warranted, which would in turn reveal its strengths and weaknesses for TSDR. In this paper, we revisit this topic through a quantitative evaluation of the state-of-the-art Single Shot Multibox Detector (SSD) on multiple standard benchmarks. More specifically, we try to quantify 1) the performance of SSD on multiple existing TSDR benchmarks, namely GTSDB, STSDB and BTSDB; 2) the generalization of SSD across datasets; 3) the impact of class overlap on SSD's performance; and 4) the performance of SSD with datasets synthetically generated from Wikipedia images. Through our study, we show that 1) SSD can reach an AUC above 0.92 for TSDR across standard benchmarks, and in the process we introduce new benchmarks for Romania (RTSDB) and Finland (FTSDB) in line with GTSDB; 2) an SSD model pretrained on GTSDB generalizes well to BTSDB and RTSDB, with an average AUC of 0.90, and comparatively less well to the Swedish and Finnish datasets. We find scale selection and information loss to be the primary reasons for the limited generalization. To address these issues, we propose a convex optimization-based scale selection and Skip SSD, an architecture based on the concept of feature reuse, leading to improved generalization. We also show that 3) an SSD model augmented with a small synthetically generated dataset produces close to state-of-the-art accuracy across GTSDB, STSDB and BTSDB; and 4) class overlap is indeed a challenging problem, even for SSD. Further, we present detailed experiments and summarize our practical findings for those interested in getting the most out of SSD for TSDR.
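To make the scale selection issue concrete, the following is a minimal sketch of the default linear scale schedule from the original SSD paper (Liu et al.), not the convex optimization-based selection proposed in this work, whose details are not given in the abstract. The function names, the choice of six feature maps, and the 300-pixel input side are assumptions made for illustration only.

```python
import math


def ssd_default_scales(num_maps, s_min=0.2, s_max=0.9):
    """Default box scales s_k, one per feature map, spaced linearly
    between s_min and s_max (fractions of the input image side).
    This is the baseline rule from the original SSD paper, used here
    only as an illustration."""
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Relative (width, height) of the default boxes at one scale.
    Each box keeps area scale**2 while its aspect ratio varies."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar))
            for ar in aspect_ratios]


# Hypothetical setup: six feature maps and a 300-pixel input side.
scales = ssd_default_scales(6)
smallest_side_px = scales[0] * 300

for k, s in enumerate(scales, start=1):
    print(f"feature map {k}: scale {s:.2f}, boxes {box_shapes(s)}")
print(f"smallest square default box side: {smallest_side_px:.0f} px")
```

Under these assumed defaults the smallest square default box covers about a fifth of the image side, so signs that occupy a much smaller fraction of the frame may be matched poorly. This is one plausible reading of why scale selection can limit generalization across datasets with different sign sizes.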