{"title":"Intelligent Robot for Worker Safety Surveillance: Deep Learning Perception and Visual Navigation","authors":"Min-Fan Ricky Lee, Tzu-Wei Chien","doi":"10.1109/ARIS50834.2020.9205772","DOIUrl":null,"url":null,"abstract":"The fatal injury rate for the construction industry is higher than the average for all industries. Recently, researchers have shown an increased interest in occupational safety in the construction industry. However, all the current methods using conventional machine learning with stationary cameras suffer from some severe limitations, perceptual aliasing (e.g., different places/objects can appear identical), occlusion (e.g., place/object appearance changes between visits), seasonal / illumination changes, significant viewpoint changes, etc. This paper proposes a perception module using end-to-end deep-learning and visual SLAM (Simultaneous Localization and Mapping) for an effective and efficient object recognition and navigation using a differential-drive mobile robot. Various deep-learning frameworks and visual navigation strategies with evaluation metrics are implemented and validated for the selection of the best model. The deep-learning model's predictions are evaluated via the metrics (model speed, accuracy, complexity, precision, recall, P-R curve, F1 score). The YOLOv3 shows the best trade-off among all algorithms, 57.9% mean average precision (mAP), in real-world settings, and can process 45 frames per second (FPS) on NVIDIA Jetson TX2 which makes it suitable for real-time detection, as well as a right candidate for deploying the neural network on a mobile robot. The evaluation metrics used for the comparison of laser SLAM are Root Mean Square Error (RMSE). The Google Cartographer SLAM shows the lowest RMSE and acceptable processing time. The experimental results demonstrate that the perception module can meet the requirements of head protection criteria in Occupational Safety and Health Administration (OSHA) standards for construction. To be more precise, this module can effectively detect construction worker's non-hardhat-use in different construction site conditions and can facilitate improved safety inspection and supervision.","PeriodicalId":423389,"journal":{"name":"2020 International Conference on Advanced Robotics and Intelligent Systems (ARIS)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Advanced Robotics and Intelligent Systems (ARIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARIS50834.2020.9205772","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
The fatal injury rate in the construction industry is higher than the average across all industries, and researchers have recently shown increased interest in occupational safety on construction sites. However, current methods that pair conventional machine learning with stationary cameras suffer from severe limitations: perceptual aliasing (e.g., different places or objects can appear identical), occlusion (e.g., place or object appearance changes between visits), seasonal and illumination changes, significant viewpoint changes, and so on. This paper proposes a perception module that combines end-to-end deep learning with visual SLAM (Simultaneous Localization and Mapping) for effective and efficient object recognition and navigation on a differential-drive mobile robot. Various deep-learning frameworks and visual navigation strategies are implemented and validated against evaluation metrics to select the best model. The deep-learning models' predictions are evaluated via model speed, accuracy, complexity, precision, recall, the precision-recall (P-R) curve, and F1 score. YOLOv3 shows the best trade-off among all algorithms, reaching 57.9% mean average precision (mAP) in real-world settings while processing 45 frames per second (FPS) on an NVIDIA Jetson TX2, which makes it suitable for real-time detection and a strong candidate for deploying the neural network on a mobile robot. The evaluation metric used to compare laser SLAM methods is the Root Mean Square Error (RMSE); Google Cartographer SLAM achieves the lowest RMSE with acceptable processing time. The experimental results demonstrate that the perception module meets the head-protection criteria in the Occupational Safety and Health Administration (OSHA) standards for construction. More precisely, the module can effectively detect construction workers' non-hardhat use under different construction-site conditions and can facilitate improved safety inspection and supervision.
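For reference, the standard formulations behind the metrics named above (a sketch of the conventional definitions; the abstract itself does not spell them out, and per-class average precision is written here as the area under the P-R curve):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

\mathrm{AP} = \int_0^1 p(r)\,dr, \qquad
\mathrm{mAP} = \frac{1}{N}\sum_{c=1}^{N} \mathrm{AP}_c, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \lVert \hat{\mathbf{x}}_i - \mathbf{x}_i \rVert^2}
```

Here TP, FP, and FN are true positives, false positives, and false negatives; p(r) is precision as a function of recall; N is the number of object classes; and, for the SLAM comparison, the RMSE is assumed to be taken over estimated poses \(\hat{\mathbf{x}}_i\) against ground-truth poses \(\mathbf{x}_i\), in the style of an absolute trajectory error.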
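As an illustration of the detection side, below is a minimal sketch of running a YOLOv3 detector through OpenCV's DNN module. The file names (hardhat.cfg, hardhat.weights, site_frame.jpg) and the 0.5 confidence threshold are assumptions for illustration, not artifacts released with the paper, and the CUDA backend lines apply only to an OpenCV build with CUDA support (as on a Jetson TX2).

```python
import cv2
import numpy as np

# Hypothetical config/weights from a hardhat-detection YOLOv3 training run;
# the paper's actual model files are not published with the abstract.
net = cv2.dnn.readNetFromDarknet("hardhat.cfg", "hardhat.weights")
# Requires a CUDA-enabled OpenCV build (e.g., on a Jetson TX2); omit for CPU.
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

img = cv2.imread("site_frame.jpg")  # one camera frame from the robot
h, w = img.shape[:2]

# YOLOv3's standard 416x416 input, scaled to [0, 1], BGR -> RGB.
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

for out in outputs:
    for det in out:  # det = [cx, cy, bw, bh, objectness, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:  # assumed detection threshold
            cx, cy = det[0] * w, det[1] * h  # box center in pixels
            bw, bh = det[2] * w, det[3] * h  # box size in pixels
            print(f"class {class_id} conf {confidence:.2f} "
                  f"box ({cx:.0f}, {cy:.0f}, {bw:.0f}, {bh:.0f})")
```

In a deployment like the one described, a "no hardhat" class firing above the threshold on a person detection would be the trigger for a safety alert; non-maximum suppression (e.g., cv2.dnn.NMSBoxes) would normally be applied to the raw boxes before acting on them.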