Crowdsourcing Multi-label Audio Annotation Tasks with Citizen Scientists

Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Pub Date: 2019-05-02. DOI: 10.1145/3290605.3300522

M. Cartwright, G. Dove, Ana Elisa Méndez Méndez, J. Bello, O. Nov

Citations: 41

Abstract

Annotating rich audio data is an essential aspect of training and evaluating machine listening systems. We approach this task in the context of temporally-complex urban soundscapes, which require multiple labels to identify overlapping sound sources. Typically this work is crowdsourced, and previous studies have shown that workers can quickly label audio with binary annotation for single classes. However, this approach can be difficult to scale when multiple passes with different focus classes are required to annotate data with multiple labels. In citizen science, where tasks are often image-based, annotation efforts typically label multiple classes simultaneously in a single pass. This paper describes our data collection on the Zooniverse citizen science platform, comparing the efficiencies of different audio annotation strategies. We compared multiple-pass binary annotation, single-pass multi-label annotation, and a hybrid approach: hierarchical multi-pass multi-label annotation. We discuss our findings, which support using multi-label annotation, with reference to volunteer citizen scientists' motivations.
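
The scaling concern raised in the abstract can be illustrated by counting how many annotation passes a single clip needs under each strategy. The following is a minimal sketch, not the authors' procedure: the taxonomy, the function names, and the assumed form of the hierarchical strategy (one coarse multi-label pass, then one fine-grained multi-label pass per coarse class marked present) are all hypothetical. It counts passes only and says nothing about time per pass or annotation quality, which are what the study actually compares.

```python
# Hypothetical two-level taxonomy (coarse class -> fine classes).
# Illustrative only; this is NOT the label set used in the paper.
TAXONOMY = {
    "engine": ["small engine", "large engine"],
    "alert signal": ["car horn", "siren"],
    "human voice": ["talking", "shouting"],
}


def binary_passes(taxonomy):
    """Multiple-pass binary annotation: one yes/no pass per fine class."""
    return sum(len(fine) for fine in taxonomy.values())


def multilabel_passes(taxonomy):
    """Single-pass multi-label annotation: every class is offered at once."""
    return 1


def hierarchical_passes(taxonomy, coarse_present):
    """Hierarchical multi-pass multi-label annotation (assumed form):
    one coarse pass, plus one fine pass per coarse class marked present."""
    present = set(coarse_present)
    unknown = present - set(taxonomy)
    if unknown:
        raise ValueError(f"unknown coarse classes: {sorted(unknown)}")
    return 1 + len(present)


if __name__ == "__main__":
    print("binary:", binary_passes(TAXONOMY))
    print("multi-label:", multilabel_passes(TAXONOMY))
    print("hierarchical:", hierarchical_passes(TAXONOMY, {"engine", "human voice"}))
```

Under these assumptions the binary strategy grows linearly with the number of classes, the multi-label strategy stays at one pass, and the hierarchical strategy depends on how many coarse classes are heard in the clip.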
