{"title":"法语广播语料库中的性别代表性及其对ASR绩效的影响","authors":"Mahault Garnerin, Solange Rossato, L. Besacier","doi":"10.1145/3347449.3357480","DOIUrl":null,"url":null,"abstract":"This paper analyzes the gender representation in four major corpora of French broadcast. These corpora being widely used within the speech processing community, they are a primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category corresponding to prominent speakers. The disparity of available data for both gender causes performance to decrease on women. However, this global trend seems to be counterbalanced when sufficient amount of data per speaker is available.","PeriodicalId":276496,"journal":{"name":"Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":"{\"title\":\"Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance\",\"authors\":\"Mahault Garnerin, Solange Rossato, L. Besacier\",\"doi\":\"10.1145/3347449.3357480\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper analyzes the gender representation in four major corpora of French broadcast. These corpora being widely used within the speech processing community, they are a primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category corresponding to prominent speakers. The disparity of available data for both gender causes performance to decrease on women. However, this global trend seems to be counterbalanced when sufficient amount of data per speaker is available.\",\"PeriodicalId\":276496,\"journal\":{\"name\":\"Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"30\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3347449.3357480\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3347449.3357480","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance
This paper analyzes the gender representation in four major corpora of French broadcast. These corpora being widely used within the speech processing community, they are a primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category corresponding to prominent speakers. The disparity of available data for both gender causes performance to decrease on women. However, this global trend seems to be counterbalanced when sufficient amount of data per speaker is available.