{"title":"基于语言模型的词级文本生成","authors":"P. Netisopakul, Usanisa Taoto","doi":"10.1109/ICSET53708.2021.9612541","DOIUrl":null,"url":null,"abstract":"This research constructs and evaluates text generation models created from three different language models, n-gram, a Continuous Bag of Words (CBOW) and gated recurrent unit (GRU), using two training corpora, Berkeley Restaurant (Berkeley) and Alice's Adventures in Wonderland (Alice), and evaluated using two evaluation metrics; perplexity measure and count of grammar errors. The mean perplexities of all three models are comparable for each corpus, the N-gram model produces slightly lower values of perplexity. As for the number of grammatical errors in the Alice corpus, all three models show a slightly higher number of errors than the original corpus. In the Berkeley corpus, the n-gram model had the lowest number of errors, even lower than the original corpus, but the CBOW model had the highest number of errors and the GRU model had the highest number of errors.","PeriodicalId":433197,"journal":{"name":"2021 IEEE 11th International Conference on System Engineering and Technology (ICSET)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Word-level Text Generation from Language Models\",\"authors\":\"P. Netisopakul, Usanisa Taoto\",\"doi\":\"10.1109/ICSET53708.2021.9612541\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This research constructs and evaluates text generation models created from three different language models, n-gram, a Continuous Bag of Words (CBOW) and gated recurrent unit (GRU), using two training corpora, Berkeley Restaurant (Berkeley) and Alice's Adventures in Wonderland (Alice), and evaluated using two evaluation metrics; perplexity measure and count of grammar errors. The mean perplexities of all three models are comparable for each corpus, the N-gram model produces slightly lower values of perplexity. As for the number of grammatical errors in the Alice corpus, all three models show a slightly higher number of errors than the original corpus. 
In the Berkeley corpus, the n-gram model had the lowest number of errors, even lower than the original corpus, but the CBOW model had the highest number of errors and the GRU model had the highest number of errors.\",\"PeriodicalId\":433197,\"journal\":{\"name\":\"2021 IEEE 11th International Conference on System Engineering and Technology (ICSET)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 11th International Conference on System Engineering and Technology (ICSET)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSET53708.2021.9612541\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 11th International Conference on System Engineering and Technology (ICSET)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSET53708.2021.9612541","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
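As a rough illustration of the perplexity metric used in the evaluation, the sketch below computes the perplexity of a simple bigram (n-gram) language model with add-one smoothing over a toy corpus. The toy sentences, the smoothing choice, and all function names are illustrative assumptions and do not reproduce the authors' implementation.

# Minimal sketch: perplexity of a bigram language model.
# The toy corpus, add-one smoothing, and names are assumptions for
# illustration only, not the paper's actual setup.
import math
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    # Add-one (Laplace) smoothing avoids zero probabilities for unseen pairs.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentences, unigrams, bigrams):
    vocab_size = len(unigrams)
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log(bigram_prob(unigrams, bigrams, prev, word, vocab_size))
            n_tokens += 1
    # Perplexity = exp(-average log-probability per token); lower is better.
    return math.exp(-log_prob / n_tokens)

train = ["i want chinese food", "i want to eat lunch"]
test = ["i want to eat chinese food"]
uni, bi = train_bigram(train)
print(perplexity(test, uni, bi))

The same perplexity definition applies to the CBOW and GRU generators; only the way the per-token probability is estimated changes.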