{"title":"实现视图无关车型识别中的边缘计算能力","authors":"","doi":"10.1016/j.ijtst.2023.03.007","DOIUrl":null,"url":null,"abstract":"<div><p>Vehicle model recognition (VMR) benefits the parking, surveillance, and tolling system by automatically identifying the exact make and model of the passing vehicles. Edge computing technology enables the roadside facilities and mobile cameras to achcieve VMR in real-time. Current work generally relies on a specific view of the vehicle or requires huge calculation capability to deploy the end-to-end deep learning network. This paper proposes a lightweight two-stage identification method based on object detection and image retrieval techniques, which empowers us the ability of recognizing the vehicle model from an arbitrary view. The first-stage model estimates the vehicle posture using object detection and similarity matching, which is cost-efficient and suitable to be programmed in the edge computing devices; the second-stage model retrieves the vehicle’s label from the dataset based on gradient boosting decision tree (GBDT) algorithm and VGGNet, which is flexible to the changing dataset. More than 8 000 vehicle images are labeled with their components’ information, such as headlights, windows, wheels, and logos. The YOLO network is employed to detect and localize the typical components of a vehicle. The vehicle postures are estimated by the spatial relationship between different segmented components. Due to the variety of the perspectives, a 7-dimensional vector is defined to represent the relative posture of the vehicle and screen out the images with a similar photographic perspective. Two algorithms are used to extract the features from each image patch: (1) the scale invariant feature transform (SIFT) combined with the bag-of-features (BoF) and (2) pre-trained deep neural network. The GBDT is applied to evaluate the weight of each component regarding its impact on VMR. 
The descriptors of each component are then aggregated to retrieve the best matching image from the database. The results showed its advantages in terms of accuracy (89.2%) and efficiency, demonstrating the vast potential of applying this method to large-scale vehicle model recognition.</p></div>","PeriodicalId":52282,"journal":{"name":"International Journal of Transportation Science and Technology","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S204604302300028X/pdfft?md5=ce6f5579d9069f7f5e9ff520676a8fd5&pid=1-s2.0-S204604302300028X-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Enabling edge computing ability in view-independent vehicle model recognition\",\"authors\":\"\",\"doi\":\"10.1016/j.ijtst.2023.03.007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Vehicle model recognition (VMR) benefits the parking, surveillance, and tolling system by automatically identifying the exact make and model of the passing vehicles. Edge computing technology enables the roadside facilities and mobile cameras to achcieve VMR in real-time. Current work generally relies on a specific view of the vehicle or requires huge calculation capability to deploy the end-to-end deep learning network. This paper proposes a lightweight two-stage identification method based on object detection and image retrieval techniques, which empowers us the ability of recognizing the vehicle model from an arbitrary view. The first-stage model estimates the vehicle posture using object detection and similarity matching, which is cost-efficient and suitable to be programmed in the edge computing devices; the second-stage model retrieves the vehicle’s label from the dataset based on gradient boosting decision tree (GBDT) algorithm and VGGNet, which is flexible to the changing dataset. 
More than 8 000 vehicle images are labeled with their components’ information, such as headlights, windows, wheels, and logos. The YOLO network is employed to detect and localize the typical components of a vehicle. The vehicle postures are estimated by the spatial relationship between different segmented components. Due to the variety of the perspectives, a 7-dimensional vector is defined to represent the relative posture of the vehicle and screen out the images with a similar photographic perspective. Two algorithms are used to extract the features from each image patch: (1) the scale invariant feature transform (SIFT) combined with the bag-of-features (BoF) and (2) pre-trained deep neural network. The GBDT is applied to evaluate the weight of each component regarding its impact on VMR. The descriptors of each component are then aggregated to retrieve the best matching image from the database. The results showed its advantages in terms of accuracy (89.2%) and efficiency, demonstrating the vast potential of applying this method to large-scale vehicle model recognition.</p></div>\",\"PeriodicalId\":52282,\"journal\":{\"name\":\"International Journal of Transportation Science and Technology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S204604302300028X/pdfft?md5=ce6f5579d9069f7f5e9ff520676a8fd5&pid=1-s2.0-S204604302300028X-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Transportation Science and 
Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S204604302300028X\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"TRANSPORTATION\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Transportation Science and Technology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S204604302300028X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"TRANSPORTATION","Score":null,"Total":0}
Enabling edge computing ability in view-independent vehicle model recognition
Vehicle model recognition (VMR) benefits parking, surveillance, and tolling systems by automatically identifying the exact make and model of passing vehicles. Edge computing technology enables roadside facilities and mobile cameras to achieve VMR in real time. Current work generally relies on a specific view of the vehicle or requires substantial computational capacity to deploy end-to-end deep learning networks. This paper proposes a lightweight two-stage identification method based on object detection and image retrieval techniques, which enables recognition of the vehicle model from an arbitrary view. The first-stage model estimates the vehicle posture using object detection and similarity matching, which is cost-efficient and suitable for deployment on edge computing devices; the second-stage model retrieves the vehicle's label from the dataset using the gradient boosting decision tree (GBDT) algorithm and VGGNet, which adapts flexibly to a changing dataset. More than 8,000 vehicle images are labeled with component information, such as headlights, windows, wheels, and logos. The YOLO network is employed to detect and localize the typical components of a vehicle. Vehicle postures are estimated from the spatial relationships between the segmented components. To handle the variety of perspectives, a 7-dimensional vector is defined to represent the relative posture of the vehicle and to screen out images with a similar photographic perspective. Two algorithms are used to extract features from each image patch: (1) the scale-invariant feature transform (SIFT) combined with the bag-of-features (BoF) model and (2) a pre-trained deep neural network. GBDT is applied to weight each component according to its impact on VMR. The descriptors of each component are then aggregated to retrieve the best matching image from the database.
The results show advantages in accuracy (89.2%) and efficiency, demonstrating the method's potential for large-scale vehicle model recognition.
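The two-stage retrieval described in the abstract can be illustrated with a minimal sketch: first screen the database by posture-vector similarity to keep only images shot from a comparable perspective, then rank the survivors by a weighted aggregate of per-component descriptor similarities. The 7-dimensional posture encoding, the dictionary layout (`posture`, `desc`, `label`), the cosine-similarity metric, and the component weights are all illustrative assumptions; the paper learns the weights with GBDT and extracts descriptors via SIFT+BoF or a pre-trained network, neither of which is reproduced here.

```python
import math

def cosine_similarity(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def screen_by_posture(query_posture, database, threshold=0.9):
    """First stage: keep only database entries whose 7-d posture vector
    (an assumed encoding of component spatial relations) resembles the
    query's, i.e. a similar photographic perspective."""
    return [entry for entry in database
            if cosine_similarity(query_posture, entry["posture"]) >= threshold]

def weighted_match_score(query_desc, candidate_desc, weights):
    """Second stage: aggregate per-component descriptor similarities,
    weighted by each component's importance for VMR (learned via GBDT
    in the paper; fixed weights are used here for illustration)."""
    score = 0.0
    for component, w in weights.items():
        if component in query_desc and component in candidate_desc:
            score += w * cosine_similarity(query_desc[component],
                                           candidate_desc[component])
    return score

def retrieve_best(query, database, weights):
    # Screen by perspective, then rank candidates by weighted descriptor match.
    candidates = screen_by_posture(query["posture"], database)
    if not candidates:
        candidates = database  # fall back to the full database if nothing passes
    return max(candidates,
               key=lambda e: weighted_match_score(query["desc"], e["desc"], weights))
```

The screening step is what keeps the method lightweight on edge devices: the expensive descriptor comparison only runs on the small perspective-matched subset.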