Human-Centered Design of Voice Communications: Gender Aspects
J. Holub, Yann Kowalczuk
Human Interaction and Emerging Technologies (IHIET-AI 2023): Artificial Intelligence and Future Applications, Volume: 7 1
DOI: 10.54941/ahfe1002926 (https://doi.org/10.54941/ahfe1002926)
Abstract
Perceiving transmitted speech is a task that puts a certain amount of cognitive
load on the human brain. The degree of this load depends on several factors, e.g., the
loudness of the perceived speech, the type and intensity of background noise, the
quality and accent of the speech, familiarity with the topic of the message, etc. This
load also varies between the listener's native and non-native languages. Over longer
periods of work (e.g., during a work shift), different levels of such load manifest
as different levels of overall fatigue, which in turn affects the
worker's action or decision error rate when performing other concurrent tasks (the
so-called parallel-task paradigm). For technologies used in speech transmission or
synthesis, e.g., in telecommunications, radio communications, and machine-to-human
communications, the above implies a strong need to optimize the coding of human (or
synthetic) voice to minimize listening effort during communication. Listening effort
(LE) can be assessed by subjective tests following, e.g., ITU-T Recommendation P.800,
along with listening quality (LQ) as specified in P.800. A natural (but nowhere
explicitly stated) requirement is that male and female voices are transferred with
similar LQ and LE parameters; in other words, the transmission technology, including
coding algorithms, frequency filters, or sampling rates, should not privilege one gender
over the other, so that working conditions and opportunities remain similar for all. The
subjective test laboratory has performed gender analysis for all subjective test
projects since 2018 to see how (im)balanced the transmission quality between male and
female speakers is. The identified imbalance can affect many professionals who use
distant voice communication in their daily duties. Consider female airport approach
control dispatchers or other professionals (e.g., policewomen), who are fundamentally disadvantaged
by the technological aspects of their job: worse voice transmission quality means that higher
listening effort is needed, which may lead to (subconscious) discomfort for their
communication partners, or even to intelligibility issues. Of course, this fact is not
surprising for narrow-band or even old analog AM transmissions (as still used in
AIRCOM). It merely serves as an argument for upgrading the means of communication to a suitable
digital format. Unfortunately, some contemporary wide-band or even full-band digital
communications also show statistically significant differences between the quality of
transferred male and female voices. Detailed results will be presented, including
systematic language dependencies (English, German, Mandarin). In the
conclusions, suggestions for future codec designs that take human-centric,
gender-balanced requirements into account are proposed. These include the minimum frequency response
of future coders, the granularity of the perceptual frequency scaling, etc.
Suggestions are also included for the gender neutrality of the original (studio-quality) recordings used to
prepare the speech samples for the subjective tests.
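
The abstract does not say how the gender analysis is computed. As a rough illustration only, the sketch below shows one way a per-gender comparison of subjective votes could look: it averages five-point opinion votes (the kind of scale P.800 uses for LQ and LE) separately for male and female talkers and applies Welch's t-test to the difference. The vote lists, the function names `welch_t` and `gender_gap`, and the normal approximation used for the p-value are all assumptions made for this example, not the authors' procedure or data.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    # Squared standard error contribution of each sample mean.
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(sa + sb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


def gender_gap(male_votes, female_votes, alpha=0.05):
    """Compare mean opinion scores (MOS) of male and female talkers."""
    t, df = welch_t(male_votes, female_votes)
    # Assumption: the normal distribution stands in for the t distribution.
    # This is only reasonable for large vote counts; a real analysis would
    # use the t distribution with `df` degrees of freedom.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return {
        "mos_male": mean(male_votes),
        "mos_female": mean(female_votes),
        "gap": mean(male_votes) - mean(female_votes),
        "t": t,
        "df": df,
        "p_approx": p_approx,
        "significant": p_approx < alpha,
    }


# Hypothetical votes on a 1-5 opinion scale for one transmission condition.
male_votes = [4, 4, 5, 3, 4, 4, 5, 4]
female_votes = [3, 3, 4, 3, 2, 3, 4, 3]

result = gender_gap(male_votes, female_votes)
print(result)
```

A positive `gap` here would mean that male talkers were rated higher under the tested condition. The same comparison could be repeated per codec and per language to look for the kind of systematic dependencies the abstract mentions.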