Exploring Collection of Sign Language Datasets: Privacy, Participation, and Model Performance

Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility
Pub Date: 2020-10-26
DOI: 10.1145/3373625.3417024

Danielle Bragg, Oscar Koller, Naomi K. Caselli, W. Thies


Citations: 26

Abstract



As machine learning algorithms continue to improve, collecting training data becomes increasingly valuable. At the same time, increased focus on data collection may introduce compounding privacy concerns. Accessibility projects in particular may put vulnerable populations at risk, as disability status is sensitive, and collecting data from small populations limits anonymity. To help address privacy concerns while maintaining algorithmic performance on machine learning tasks, we propose privacy-enhancing distortions of training datasets. We explore this idea through the lens of sign language video collection, which is crucial for advancing sign language recognition and translation. We present a web study exploring signers’ concerns in contributing to video corpora and their attitudes about using filters, and a computer vision experiment exploring sign language recognition performance with filtered data. Our results suggest that privacy concerns may exist in contributing to sign language corpora, that filters (especially expressive avatars and blurred faces) may impact willingness to participate, and that training on more filtered data may boost recognition accuracy in some cases.

