Text corpus for natural language story-telling sentence generation: A design and evaluation
W. Limpanadusadee, P. Punyabukkana, A. Suchato, Onintra Poobrasert
2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE), published 2014-05-14
DOI: 10.1109/JCSSE.2014.6841846
Citations: 1
Abstract
Automatic generation of narrative sentences from unordered word sets is desirable in Augmentative and Alternative Communication (AAC) systems for children with certain learning disabilities (LD). However sophisticated the Natural Language Processing used in sentence generation, the quality of the underlying language model always affects the results. This work compared the sentence generation accuracy of a multi-tier N-gram-based procedure for Thai simple sentences when trained on two corpora: BEST2010, a large, publicly available text corpus, and a smaller corpus designed specifically for this task. The latter, a new corpus called TELL-S, was built from an analysis of the content of the grade 1 and grade 2 Thai-language textbooks used under the compulsory curriculum for Thai schools. The original procedure was also modified to incorporate additional constraints derived from a story-telling guideline developed for LD children. In an evaluation on test sets of 195 sentences, each composed of 3-6 words with a specific Part-Of-Speech combination, TELL-S generalized better and yielded higher accuracies than BEST2010 in all cases with unbiased word sets. Sentence generation accuracy was 100% for 3-word sentences and 70% for 4-word sentences; the average accuracy was 58.8% when longer sentences were also included.
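To make the core task concrete, the sketch below orders an unordered word set by scoring every permutation with an N-gram language model and keeping the most probable one. It is not the authors' multi-tier procedure: it assumes a single add-one-smoothed bigram model, a brute-force search, and a tiny whitespace-tokenized English toy corpus standing in for word-segmented Thai text. The story-telling constraints mentioned above are omitted, and the function names `train_bigram`, `log_prob`, and `generate` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log

BOS, EOS = "<s>", "</s>"


def train_bigram(corpus):
    """Count bigram histories and bigrams over pre-segmented sentences (toy stand-in for TELL-S or BEST2010)."""
    histories, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = [BOS] + sentence.split() + [EOS]
        histories.update(tokens[:-1])  # every token except EOS acts as a history
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {tok for s in corpus for tok in s.split()} | {EOS}
    return histories, bigrams, len(vocab)


def log_prob(order, model):
    """Log-probability of one word order under the add-one-smoothed bigram model."""
    histories, bigrams, v = model
    tokens = [BOS] + list(order) + [EOS]
    return sum(
        log((bigrams[(a, b)] + 1) / (histories[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )


def generate(words, model):
    """Return the highest-scoring permutation of the unordered word set (brute force)."""
    return max(permutations(words), key=lambda order: log_prob(order, model))


corpus = ["the cat eats fish", "the dog eats meat", "the cat sleeps"]
model = train_bigram(corpus)
print(generate(["fish", "eats", "cat", "the"], model))
```

Exhaustive search is affordable here because the test word sets contain only 3-6 words (at most 720 orderings); a larger vocabulary or longer inputs would call for beam search or the kind of additional constraints the paper describes.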