Exploring Collection of Sign Language Datasets: Privacy, Participation, and Model Performance

Danielle Bragg, Oscar Koller, Naomi K. Caselli, W. Thies
Published in: Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility
Publication date: 2020-10-26
DOI: 10.1145/3373625.3417024 (https://doi.org/10.1145/3373625.3417024)
Citations: 26

Abstract

Exploring Collection of Sign Language Datasets: Privacy, Participation, and Model Performance
As machine learning algorithms continue to improve, collecting training data becomes increasingly valuable. At the same time, increased focus on data collection may introduce compounding privacy concerns. Accessibility projects in particular may put vulnerable populations at risk, as disability status is sensitive, and collecting data from small populations limits anonymity. To help address privacy concerns while maintaining algorithmic performance on machine learning tasks, we propose privacy-enhancing distortions of training datasets. We explore this idea through the lens of sign language video collection, which is crucial for advancing sign language recognition and translation. We present a web study exploring signers’ concerns in contributing to video corpora and their attitudes about using filters, and a computer vision experiment exploring sign language recognition performance with filtered data. Our results suggest that privacy concerns may exist in contributing to sign language corpora, that filters (especially expressive avatars and blurred faces) may impact willingness to participate, and that training on more filtered data may boost recognition accuracy in some cases.