Gesture-aware Interactive Machine Teaching with In-situ Object Annotations

Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. Published: 2022-08-02. DOI: 10.1145/3526113.3545648

Zhongyi Zhou, K. Yatani


Citations: 7

Abstract

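Interactive Machine Teaching (IMT) systems allow non-experts to easily create Machine Learning (ML) models. However, existing vision-based IMT systems either ignore annotations on the objects of interest or require users to annotate in a post-hoc manner. Without annotations on the objects, the model may misinterpret them by relying on unrelated features. Post-hoc annotation adds workload, which diminishes the usability of the overall model-building process. In this paper, we develop LookHere, which integrates in-situ object annotations into vision-based IMT. LookHere exploits users' deictic gestures to segment the objects of interest in real time, and this segmentation information can additionally be used for training. To make this object segmentation reliable, we use our custom dataset, HuTics, which includes 2,040 front-facing images of deictic gestures toward various objects by 170 people. The quantitative results of our user study showed that participants created models 16.3 times faster with our system than with a standard IMT system that uses a post-hoc annotation process, while achieving comparable accuracy. Additionally, models created with our system showed a significant accuracy improvement in segmenting the objects of interest (ΔmIoU = 0.466) compared with those trained without annotations.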
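The abstract reports the segmentation gain as ΔmIoU, the difference in mean Intersection over Union (mIoU) between models trained with and without object annotations. The following is a minimal sketch of the standard IoU computation for binary object masks, averaged per image. The function names `iou` and `mean_iou`, the toy masks, the empty-mask convention, and the per-image averaging are all assumptions for illustration; the paper's actual evaluation protocol may differ (for example, by averaging over classes instead of images).

```python
from typing import Sequence

# A 2-D binary mask: 1 marks an object-of-interest pixel, 0 marks background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of two equally sized binary masks."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:  # pixel is object in both masks
                inter += 1
            if p or t:   # pixel is object in at least one mask
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Mean of the per-image IoU values (assumed averaging scheme)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally many, non-zero predictions and targets")
    return sum(iou(p, t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    # Toy 2x3 masks, purely illustrative (not data from the paper).
    target = [[0, 1, 1],
              [0, 1, 1]]
    annotated = [[0, 1, 1],
                 [0, 1, 0]]    # 3 of 4 object pixels, no false positives
    unannotated = [[1, 1, 0],
                   [1, 0, 0]]  # mostly covers the wrong region
    delta = mean_iou([annotated], [target]) - mean_iou([unannotated], [target])
    print(f"delta mIoU = {delta:.3f}")  # prints "delta mIoU = 0.583"
```

Per-image averaging suits single-object masks like these; class-wise averaging over foreground and background is the other common choice, and the two can yield different values for the same predictions.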



