{"title":"PaSCoNT - 用于自动语音识别的泰语中北部平行语音库","authors":"Supawat Taerungruang , Phimphaka Taninpong , Vataya Chunwijitra , Sumonmas Thatphithakkul , Sawit Kasuriya , Viroj Inthanon , Pawat Paksaranuwat , Salinee Thumronglaohapun , Nawapon Nakharutai , Papangkorn Inkeaw , Jakramate Bootkrajang","doi":"10.1016/j.csl.2024.101692","DOIUrl":null,"url":null,"abstract":"<div><p>This paper proposed a Parallel Speech Corpus of Northern-central Thai (PaSCoNT). The purpose of this research is not only to understand the different linguistic characteristics between Northern and Central Thai, but also to utilize this corpus for automatic speech recognition. The corpus is composed of speech data from dialogues of daily life among northern Thai people. We designed 2,000 Northern Thai sentences covering all phonemes, in collaboration with linguists specialized in the Northern Thai dialect. The samples in this study are 200 Northern Thai dialect speakers who had been living in Chiang Mai province for more than 18 years. The speech was recorded in both open and closed environments. In the speech recording, each speaker must read 100 pairs of Northern-Central Thai sentences to ensure that the speech data comes from the same speaker. In total, 100 h of speech were recorded: 50 h of Northern Thai and 50 h of Central Thai. Overall, PaSCoNT consists of 907,832 words and 6,279 vocabulary items. Statistical analysis of the PaSCoNT corpus revealed that 49.64 % of words in the lexicon belongs to the Northern Thai dialect, 50.36 % from the Central Thai dialect, and 1,621 vocabulary items appeared in both Northern and Central Thai. Statistical analysis is used to examine the difference in speech tempo, i.e. time per phoneme (TTP), syllable per minute (SPM), between Northern and Central Thai. The results revealed that there were statistically significant differences speech tempo between Central and Northern Thai. The TTP speaking and articulation rate of Central Thai is lower than Northern Thai whereas SPM speaking and articulation rate of Central Thai is higher than Northern Thai. The results also showed that the ASR model training using Northern Thai speech corpus provides the lower WER% when testing using Northern Thai testing speech data and provides the higher WER% when testing using Central Thai Testing speech data and vice versa. However, the ASR model training using the PaSCoNT speech corpus provides the lower WER% for both Northern Thai and Central Thai testing speech data.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000755/pdfft?md5=f97afe2aa357037c83c6473c50174543&pid=1-s2.0-S0885230824000755-main.pdf","citationCount":"0","resultStr":"{\"title\":\"PaSCoNT - Parallel Speech Corpus of Northern-central Thai for automatic speech recognition\",\"authors\":\"Supawat Taerungruang , Phimphaka Taninpong , Vataya Chunwijitra , Sumonmas Thatphithakkul , Sawit Kasuriya , Viroj Inthanon , Pawat Paksaranuwat , Salinee Thumronglaohapun , Nawapon Nakharutai , Papangkorn Inkeaw , Jakramate Bootkrajang\",\"doi\":\"10.1016/j.csl.2024.101692\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>This paper proposed a Parallel Speech Corpus of Northern-central Thai (PaSCoNT). The purpose of this research is not only to understand the different linguistic characteristics between Northern and Central Thai, but also to utilize this corpus for automatic speech recognition. The corpus is composed of speech data from dialogues of daily life among northern Thai people. We designed 2,000 Northern Thai sentences covering all phonemes, in collaboration with linguists specialized in the Northern Thai dialect. The samples in this study are 200 Northern Thai dialect speakers who had been living in Chiang Mai province for more than 18 years. The speech was recorded in both open and closed environments. In the speech recording, each speaker must read 100 pairs of Northern-Central Thai sentences to ensure that the speech data comes from the same speaker. In total, 100 h of speech were recorded: 50 h of Northern Thai and 50 h of Central Thai. Overall, PaSCoNT consists of 907,832 words and 6,279 vocabulary items. Statistical analysis of the PaSCoNT corpus revealed that 49.64 % of words in the lexicon belongs to the Northern Thai dialect, 50.36 % from the Central Thai dialect, and 1,621 vocabulary items appeared in both Northern and Central Thai. Statistical analysis is used to examine the difference in speech tempo, i.e. time per phoneme (TTP), syllable per minute (SPM), between Northern and Central Thai. The results revealed that there were statistically significant differences speech tempo between Central and Northern Thai. The TTP speaking and articulation rate of Central Thai is lower than Northern Thai whereas SPM speaking and articulation rate of Central Thai is higher than Northern Thai. The results also showed that the ASR model training using Northern Thai speech corpus provides the lower WER% when testing using Northern Thai testing speech data and provides the higher WER% when testing using Central Thai Testing speech data and vice versa. However, the ASR model training using the PaSCoNT speech corpus provides the lower WER% for both Northern Thai and Central Thai testing speech data.</p></div>\",\"PeriodicalId\":50638,\"journal\":{\"name\":\"Computer Speech and Language\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-07-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0885230824000755/pdfft?md5=f97afe2aa357037c83c6473c50174543&pid=1-s2.0-S0885230824000755-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Speech and Language\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0885230824000755\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230824000755","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
PaSCoNT - Parallel Speech Corpus of Northern-central Thai for automatic speech recognition
This paper proposed a Parallel Speech Corpus of Northern-central Thai (PaSCoNT). The purpose of this research is not only to understand the different linguistic characteristics between Northern and Central Thai, but also to utilize this corpus for automatic speech recognition. The corpus is composed of speech data from dialogues of daily life among northern Thai people. We designed 2,000 Northern Thai sentences covering all phonemes, in collaboration with linguists specialized in the Northern Thai dialect. The samples in this study are 200 Northern Thai dialect speakers who had been living in Chiang Mai province for more than 18 years. The speech was recorded in both open and closed environments. In the speech recording, each speaker must read 100 pairs of Northern-Central Thai sentences to ensure that the speech data comes from the same speaker. In total, 100 h of speech were recorded: 50 h of Northern Thai and 50 h of Central Thai. Overall, PaSCoNT consists of 907,832 words and 6,279 vocabulary items. Statistical analysis of the PaSCoNT corpus revealed that 49.64 % of words in the lexicon belongs to the Northern Thai dialect, 50.36 % from the Central Thai dialect, and 1,621 vocabulary items appeared in both Northern and Central Thai. Statistical analysis is used to examine the difference in speech tempo, i.e. time per phoneme (TTP), syllable per minute (SPM), between Northern and Central Thai. The results revealed that there were statistically significant differences speech tempo between Central and Northern Thai. The TTP speaking and articulation rate of Central Thai is lower than Northern Thai whereas SPM speaking and articulation rate of Central Thai is higher than Northern Thai. The results also showed that the ASR model training using Northern Thai speech corpus provides the lower WER% when testing using Northern Thai testing speech data and provides the higher WER% when testing using Central Thai Testing speech data and vice versa. However, the ASR model training using the PaSCoNT speech corpus provides the lower WER% for both Northern Thai and Central Thai testing speech data.
期刊介绍:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.