Performance Comparison of Open Speech-To-Text Engines using Sentence Transformer Similarity Check with the Korean Language by Foreigners
Authors: A. B. Wahyutama, Mintae Hwang
Venue: 2022 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)
Published: 2022-07-28
DOI: https://doi.org/10.1109/IAICT55358.2022.9887500
This paper compares the performance of four Speech-to-Text (STT) engines, namely Google STT, Naver Clova CSR, IBM Watson, and Microsoft Azure STT, when transcribing Korean spoken by foreigners. Respondents recorded themselves speaking a predetermined sentence; the recordings were compiled and fed into each STT engine one by one to generate the transcribed text. Performance is evaluated with the Sentence Transformer Python framework, which computes a similarity percentage between the original sentence and each transcribed text; the percentages are then averaged. Each engine's performance is broken down into four categories: sentence, nationality, age, and gender. The comparison results can help determine the optimal STT engine for Korean spoken by foreigners when developing STT-based or AI-based applications.
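
The scoring and averaging step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the similarity is a cosine similarity scaled to a percentage, and it uses a toy character-bigram embedding (`toy_embed`) as a stand-in for a real sentence-embedding model so that the example runs without downloading anything. The reference sentence, the engine labels `engine_a` and `engine_b`, and the transcripts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def toy_embed(text):
    """Stand-in embedding: character-bigram counts.

    This is NOT a Sentence Transformer model; it only makes the sketch runnable.
    """
    text = text.replace(" ", "")
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_percent(reference, transcript, embed=toy_embed):
    """Similarity between the original sentence and one transcript, in percent."""
    return 100.0 * cosine_similarity(embed(reference), embed(transcript))


def average_by(records, key, reference, embed=toy_embed):
    """Average similarity percentage grouped by a record field.

    `key` could be 'engine', 'nationality', 'age', or 'gender'.
    """
    scores = defaultdict(list)
    for record in records:
        score = similarity_percent(reference, record["transcript"], embed)
        scores[record[key]].append(score)
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


# Hypothetical reference sentence and transcripts, for illustration only.
REFERENCE = "안녕하세요 만나서 반갑습니다"
RECORDS = [
    {"engine": "engine_a", "gender": "F", "transcript": "안녕하세요 만나서 반갑습니다"},
    {"engine": "engine_a", "gender": "M", "transcript": "안녕하세요 만나서 반갑습니"},
    {"engine": "engine_b", "gender": "F", "transcript": "안녕하세요 만나 반가워요"},
]

if __name__ == "__main__":
    print(average_by(RECORDS, "engine", REFERENCE))
    print(average_by(RECORDS, "gender", REFERENCE))
```

With a real embedding model, `toy_embed` and `cosine_similarity` would presumably be replaced by the library's encoder and its cosine utility, while the grouping logic in `average_by` would stay the same for each of the four categories.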