MonoSOD: Monocular Salient Object Detection based on Predicted Depth
George Dimas, Panagiota Gatoula, D. Iakovidis
2021 IEEE International Conference on Robotics and Automation (ICRA), 2021-05-30
DOI: 10.1109/ICRA48506.2021.9561211
Salient object detection (SOD) can directly improve the performance of tasks such as obstacle detection, semantic segmentation, and object recognition, which are important for robotic and other autonomous navigation systems. State-of-the-art SOD methodologies improve performance by incorporating depth information, usually acquired with additional specialized sensors, e.g., RGB-D cameras, which adds to the overall cost and reduces the flexibility of such systems. Nevertheless, recent advances in machine learning have produced models capable of generating depth-map approximations from a single RGB image. In this work, we propose a novel monocular SOD (MonoSOD) methodology based on a two-branch CNN autoencoder architecture capable of predicting depth maps and estimating saliency through a trainable refinement scheme. Its application to benchmark datasets indicates that its performance is comparable to that of state-of-the-art SOD methods relying on RGB-D data. It could therefore be considered a lower-cost alternative to such methods in future applications.
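The abstract describes a two-branch design: a shared encoding of the RGB input feeds one branch that predicts a depth map and another that estimates saliency, with the depth prediction used to refine the saliency output. The paper's actual model is a trained CNN autoencoder; the toy NumPy sketch below only illustrates the data flow of that idea under simplifying assumptions (average pooling stands in for the learned encoder, nearest-neighbour upsampling for the decoders, and a fixed depth-weighted product for the trainable refinement scheme). All function names here are hypothetical, not from the paper.

```python
import numpy as np

def encode(rgb, levels=2):
    """Toy shared encoder: collapse channels, then repeated 2x2 average pooling."""
    feats = rgb.mean(axis=2)  # H x W x 3 -> H x W
    for _ in range(levels):
        h, w = feats.shape
        feats = feats[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return feats

def decode(feats, shape):
    """Toy decoder: nearest-neighbour upsampling back to the input resolution."""
    fy = shape[0] // feats.shape[0]
    fx = shape[1] // feats.shape[1]
    return np.kron(feats, np.ones((fy, fx)))

def mono_sod_sketch(rgb):
    """Two-branch flow: shared features -> depth branch + saliency branch -> refinement."""
    h, w, _ = rgb.shape
    feats = encode(rgb)
    depth = decode(feats, (h, w))                   # branch 1: predicted depth map
    coarse_saliency = decode(1.0 - feats, (h, w))   # branch 2: coarse saliency estimate
    # "Refinement": emphasise regions the depth branch predicts as near.
    # (In the paper this step is trainable; here it is a fixed weighting.)
    refined = coarse_saliency * (1.0 - depth / (depth.max() + 1e-8))
    return refined / (refined.max() + 1e-8)

rgb = np.random.rand(64, 64, 3)   # stand-in for a single RGB frame
sal = mono_sod_sketch(rgb)
print(sal.shape)  # (64, 64)
```

The point of the sketch is only that no depth sensor appears anywhere: both the depth map and the saliency map are derived from the single RGB input, which is what makes the monocular setting a lower-cost alternative to RGB-D pipelines.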