Sanket Shah, Pratik M. Joshi, Sebastin Santy, Sunayana Sitaram
CoSSAT: Code-Switched Speech Annotation Tool
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP
Published: 2019-11-20
DOI: https://doi.org/10.18653/v1/D19-5907
Abstract
Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems is challenging due to the lack of annotated speech and text data. We present CoSSAT, a speech annotation interface that helps annotators transcribe code-switched speech faster, more easily, and more accurately than a traditional interface by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.