Rhenaldy, Ladysa Stella Karenza, Ivan Halim Parmonangan, F. Kurniadi
Published in: 2022 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS), 2022-11-24. DOI: 10.1109/IoTaIS56727.2022.9975857 (https://doi.org/10.1109/IoTaIS56727.2022.9975857)
Predicting Text-To-Speech Quality using Brain Activity
Perceived audio quality is one of the key factors that may determine a text-to-speech system's success in the market. It is therefore important to evaluate audio quality before releasing such a system. Synthesized audio quality is usually evaluated either subjectively or objectively, and each approach has its own advantages and disadvantages. Subjective methods usually require a large amount of time and resources, while objective methods lack human influence factors, which are crucial for deriving the subjective perception of quality. These human influence factors are reflected in an individual's brain activity, which can be measured with electroencephalography (EEG). Thus, in this study, we performed audio quality prediction using EEG data. Since the dataset used in this study is small, we also compared the prediction results obtained with augmented and non-augmented data. Our results show that certain methods yield significantly better predictions when trained on augmented data.
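The comparison between augmented and non-augmented training data can be illustrated with a minimal sketch. The abstract names neither the augmentation technique nor the prediction models, so everything here is an assumption made for illustration: the data are random synthetic features standing in for EEG-derived features, the augmentation is simple Gaussian jitter, the predictor is a closed-form ridge regression, and the names `augment_with_noise`, `fit_ridge`, `predict`, and `rmse` are hypothetical. It shows only the shape of such an experiment, not the authors' pipeline or results.

```python
import numpy as np

def augment_with_noise(X, y, copies=3, sigma=0.05, seed=0):
    """Return the original trials plus `copies` noisy replicas of each.

    Gaussian jitter is an assumed augmentation; the abstract does not
    say which technique the authors used.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        X_parts.append(X + rng.normal(0.0, sigma, size=X.shape))
        y_parts.append(y)  # labels are unchanged by the jitter
    return np.vstack(X_parts), np.concatenate(y_parts)

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (assumed model, not the paper's).

    A constant column provides the intercept; for simplicity the
    intercept is penalized along with the other weights.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    """Apply the fitted weights, appending the same constant column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted scores."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data: 40 trials, 8 features, a noisy linear score.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 8))
y = X @ rng.normal(size=8) + rng.normal(0.0, 0.1, size=40)
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

# Train once on the raw training set and once on the augmented one,
# then score both models on the same untouched test set.
w_plain = fit_ridge(X_train, y_train)
X_aug, y_aug = augment_with_noise(X_train, y_train)
w_aug = fit_ridge(X_aug, y_aug)

rmse_plain = rmse(y_test, predict(w_plain, X_test))
rmse_aug = rmse(y_test, predict(w_aug, X_test))
print("RMSE without augmentation:", rmse_plain)
print("RMSE with augmentation:   ", rmse_aug)
```

Only the training set is augmented; the held-out test trials stay untouched so that both models are scored on the same real data. Whether augmentation helps depends on the model and the data, which is consistent with the abstract's statement that only certain methods benefit.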