One-Shot Shape-Based Amodal-to-Modal Instance Segmentation

Andrew Li, Michael Danielczuk, Ken Goldberg
DOI: 10.1109/CASE48305.2020.9216733
Published in: 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE)
Publication date: 2020-08-01
Citations: 2

Abstract

Image instance segmentation plays an important role in mechanical search, a task where robots must search for a target object in a cluttered scene. Perception pipelines for this task often rely on target object color or depth information and require multiple networks to segment and identify the target object. However, creating large training datasets of real images for these networks can be time-intensive, and the networks may require retraining for novel objects. We propose OSSIS, a single-stage One-Shot Shape-based Instance Segmentation algorithm that produces the target object modal segmentation mask in a depth image of a scene based only on a binary shape mask of the target object. We train a fully-convolutional Siamese network with 800,000 pairs of synthetic binary target object masks and scene depth images, then evaluate the network with real target objects never seen during training in densely-cluttered scenes with target object occlusions. OSSIS achieves a one-shot mean intersection-over-union (mIoU) of 0.38 on the real data, improving on filter matching and two-stage CNN baselines by 21% and 6%, respectively, while reducing computation time by a factor of 50 compared to the two-stage CNN, due in part to the fact that OSSIS is one-stage and does not require pairwise segmentation mask comparisons.
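As an aside, the mean intersection-over-union (mIoU) metric reported above can be computed directly from binary segmentation masks. The following is an illustrative sketch (not code from the paper, and the helper names are hypothetical), assuming predicted and ground-truth masks are boolean NumPy arrays of the same shape:

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary masks: |pred AND gt| / |pred OR gt|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks count as a perfect match.
    return inter / union if union > 0 else 1.0

def mean_iou(mask_pairs):
    """Mean IoU over a list of (predicted, ground-truth) mask pairs."""
    return float(np.mean([mask_iou(p, g) for p, g in mask_pairs]))

# Toy example: a prediction covering half of a 2x2 ground-truth region.
pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt = np.array([[1, 1], [1, 1]], dtype=bool)
print(mask_iou(pred, gt))  # 2 / 4 = 0.5
```

A score of 0.38, as reported for OSSIS on real data, therefore means the predicted modal mask overlaps the ground-truth mask by 38% of their union, averaged over the evaluation set.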