Spoken English fluency scoring using convolutional neural networks
Hoon Chung, Y. Lee, Sung Joo Lee, J. Park
2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), 2017-11-01
DOI: https://doi.org/10.1109/ICSDA.2017.8384444
In this paper, we propose a spoken English fluency scoring method that uses a Convolutional Neural Network (CNN) to learn feature extraction and the scoring model jointly from raw time-domain signal input. In general, automatic spoken English fluency scoring is composed of feature extraction and a scoring model. Feature extraction computes the feature vectors that are assumed to represent spoken English fluency, and the scoring model predicts the fluency score of an input feature vector. Although the conventional approach works well, there are issues regarding feature extraction and model parameter optimization. First, because the fluency features are computed based on human knowledge, some crucial representations contained in a raw data corpus can be missed. Second, each parameter of the model is optimized separately, which can lead to suboptimal performance. To address these issues, we propose a CNN-based approach that extracts fluency features directly from a raw data corpus without hand-crafted feature engineering and optimizes all model parameters jointly. The effectiveness of the proposed approach is evaluated using the Korean-Spoken English Corpus.
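
The idea of learning feature extraction and scoring jointly from the raw waveform can be illustrated with a minimal sketch. The abstract does not specify the network architecture, so everything below is an assumption made for illustration: the layer layout (one convolution, ReLU, mean pooling, a linear output), the sizes in `cfg`, the finite-difference gradient, and the synthetic sine waves standing in for speech are all hypothetical and are not the authors' model or data. The point is only that the convolution kernels (the "features") and the output weights (the "scoring model") live in one parameter vector and are updated by the same loss.

```python
import math
import random

def conv1d(signal, kernel, stride):
    """Valid 1-D cross-correlation of a raw waveform with one kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def predict(params, signal, n_filters, kernel_size, stride):
    """Raw waveform -> scalar score: conv -> ReLU -> mean pool -> linear."""
    pooled = []
    for f in range(n_filters):
        kernel = params[f * kernel_size:(f + 1) * kernel_size]
        acts = [max(0.0, a) for a in conv1d(signal, kernel, stride)]
        pooled.append(sum(acts) / len(acts))
    offset = n_filters * kernel_size
    weights = params[offset:offset + n_filters]
    bias = params[offset + n_filters]
    return sum(w * p for w, p in zip(weights, pooled)) + bias

def loss(params, data, **cfg):
    """Mean squared error between predicted and reference scores."""
    return sum((predict(params, x, **cfg) - y) ** 2 for x, y in data) / len(data)

def train_step(params, data, lr=0.05, eps=1e-4, **cfg):
    """One gradient-descent step over ALL parameters (kernels, weights, bias).

    Gradients are approximated with central finite differences to keep the
    sketch dependency-free; a real system would use backpropagation.
    """
    grads = []
    for i in range(len(params)):
        up = params[:]
        up[i] += eps
        down = params[:]
        down[i] -= eps
        grads.append((loss(up, data, **cfg) - loss(down, data, **cfg)) / (2 * eps))
    return [p - lr * g for p, g in zip(params, grads)]

def make_wave(freq, n=64):
    """Synthetic placeholder waveform (not real speech)."""
    return [math.sin(2 * math.pi * freq * t / n) for t in range(n)]

# Hypothetical configuration and toy (waveform, normalized score) pairs.
cfg = dict(n_filters=2, kernel_size=8, stride=4)
data = [(make_wave(f), y) for f, y in [(1, 0.2), (2, 0.4), (4, 0.6), (8, 0.8)]]

random.seed(0)
n_params = cfg["n_filters"] * cfg["kernel_size"] + cfg["n_filters"] + 1
params = [random.uniform(-0.5, 0.5) for _ in range(n_params)]

before = loss(params, data, **cfg)
for _ in range(20):
    params = train_step(params, data, **cfg)
after = loss(params, data, **cfg)

score = predict(params, make_wave(3), **cfg)
print("loss before:", before, "loss after:", after)
print("predicted score for an unseen waveform:", score)
```

In a conventional pipeline, by contrast, the feature extractor would be fixed by hand and only the scoring weights would be fitted, which corresponds to updating just the last `n_filters + 1` entries of `params` here.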