Title: Performance Evaluation of Recent Object Detection Models for Traffic Safety Applications on Edge
Authors: Anilcan Bulut, Fatmanur Ozdemir, Y. S. Bostanci, M. Soyturk
DOI: https://doi.org/10.1145/3582177.3582178
Published: 2023-01-13, Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision
Abstract: Real-time object detection is becoming more important and critical in all application areas, including Smart Transport and Smart City. From safety and security to resource efficiency, real-time image processing approaches are used more than ever. On the other hand, low-latency requirements and limited available resources present challenges. Edge computing integrated with cloud computing minimizes communication delays, but its limited resources must be used efficiently. For example, although deep learning-based object detection methods give very accurate and reliable results, they require high computational power. This overhead creates a need for deep learning models with less complex architectures for edge deployment. In this paper, the performance of evolving deep learning models is evaluated through their lightweight versions, such as YOLOv5-Nano, YOLOX-Nano, YOLOX-Tiny, YOLOv6-Nano, YOLOv6-Tiny, and YOLOv7-Tiny, on a commercially available edge device. The results show that YOLOv5-Nano and YOLOv6-Nano, in their TensorRT versions, provide real-time applicability with approximately 35 milliseconds of inference time. It is also observed that YOLOv6-Tiny gives the highest average precision, while YOLOv5-Nano gives the lowest energy consumption among the compared models.
{"title":"Matching Images from Different Viewpoints with Deep Learning Based on LoFTR and MAGSAC++","authors":"Liang Tian","doi":"10.1145/3582177.3582181","DOIUrl":"https://doi.org/10.1145/3582177.3582181","url":null,"abstract":"Matching 2D images from different viewpoints plays a crucial role in the fields of Structure-from-Motion and 3D reconstruction. However, image matching for assorted and unstructured images with a wide variety of viewpoints leads to difficulty for traditional matching methods. In this paper, we propose a Transformer-based feature matching approach to capture the same physical points of a scene from two images with different viewpoints. The local features of images are extracted by the LoFTR, which is a detector-free deep-learning matching model on the basis of Transformer. The subsequent matching process is realized by the MAGSAC++ estimator, where the matching results are summarized in the fundamental matrix as the model output. By removing image feature points with low confidence scores and applying the test time augmentation, our approach can reach a mean Average Accuracy 0.81340 in the Kaggle competition Image Matching Challenge 2022. This score ranks 45/642 in the competition leaderboard, and can get a silver medal in this competition. Our work could help accelerate the research of generalized methods for Structure-from-Motion and 3D reconstruction, and would potentially deepen the understanding of image feature matching and related fields.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128860032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Penetration Point Detection for Autonomous Trench Excavation Based on Binocular Vision
Authors: Jiangying Zhao, Y. Hu, Mingrui Tian, Xiaohua Xia, Peng Tan
DOI: https://doi.org/10.1145/3582177.3582191
Published: 2023-01-13, Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision
Abstract: To autonomously detect the penetration point in the working area of trench excavation, a penetration point detection method based on binocular cameras is proposed. First, a homogeneous coordinate transformation is established to convert the 3D point cloud of the excavation area from the camera coordinate system to the excavator's global base coordinate system. Then, a global gradient consistency function is designed to describe the geometric feature of the trench penetration point, and the position coordinates of the penetration point are detected. Finally, a penetration point detection test of the excavation area is conducted. Within the range of the excavation operation, the maximum position error of the penetration point detection is less than 80 mm, and the average detection error is 46.2 mm, which shows that the method can effectively detect the penetration point.
{"title":"Semi-supervised Defect Segmentation with Uncertainty-aware Pseudo-labels from Multi-branch Network","authors":"Dejene M. Sime, Guotai Wang, Zhi Zeng, Bei Peng","doi":"10.1145/3582177.3582190","DOIUrl":"https://doi.org/10.1145/3582177.3582190","url":null,"abstract":"Semi-supervised learning methods have recently gained considerable attention for training deep learning networks with limited labeled samples and additional large label-free samples. Consistency regularization and pseudo-labeling methods are among the most widely used semi-supervised learning methods. However, unreliable pseudo labels will largely limit the model’s performance when learning from unlabeled images. To alleviate this problem, we propose uncertainty-rectified pseudo labels generated from dynamically mixing predictions of multiple decoders with a shared encoder network for semi-supervised defect segmentation. We estimated the uncertainty as the prediction discrepancy between the average prediction and the output of each decoder head. The estimated uncertainty then guides the consistency training as well as the pseudo-label-based supervision. The proposed method achieved significant performance improvement over the fully supervised baseline and other state-of-the-art semi-supervised segmentation methods on similar labeled data proportions. We also performed an extensive ablation study to demonstrate that the proposed method performs well under various setups.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"420 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122794761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Case study of 3D Interactive Design of Nanjing Tongji Gate Wall","authors":"Yang Cao, Yuan Gao, Ang Geng","doi":"10.1145/3582177.3582189","DOIUrl":"https://doi.org/10.1145/3582177.3582189","url":null,"abstract":"Through the case study of 3D interactive design of Nanjing Tongji Gate Wall, this paper explores the application of 3D interactive design in relic visual reconstruction. This paper collects literature on Nanjing City Wall and conducts field research and interviews about Tongji Gate. On the basis of relevant data and literature, 3D interactive technology is used to design the interactive animation system of Tongji Gate. 3D interactive animation contributes to the visual reconstruction of material cultural heritage, the improvement of traditional cultural relics protection and research, the adaptation to communication and promotion in the new media era, the protection of cultural heritage, and the continuous cultural inheritance.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130488268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Action Recognition with Non-Uniform Key Frame Selector","authors":"Haohe Li, Chong Wang, Shenghao Yu, Chenchen Tao","doi":"10.1145/3582177.3582182","DOIUrl":"https://doi.org/10.1145/3582177.3582182","url":null,"abstract":"Current approaches for spatiotemporal action recognition have achieved impressive progress, especially in temporal information processing. Meanwhile, the power of spatial information may be underestimated. Thus, a non-uniform key frame selector is proposed to pick the most representative frames according to the relationship between frames along the temporal dimension. Specifically, the reweight high-level frame features are used to generate an importance score sequence, while the key frames, in each temporal section, are selected based on the above scores. Such selected frames have richer semantic information, which has positive impact on the network training. The proposed model is evaluated on two action recognition, namely datasets HMDB51 and UCF101, and promising accuracy improvement is achieved.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131511653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","authors":"","doi":"10.1145/3582177","DOIUrl":"https://doi.org/10.1145/3582177","url":null,"abstract":"","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131428210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}