Improving Accuracy in Thai Sign and Symptom Classification using Context-Free Grammar Approach
Jarunee Duangsuwan, Pawin Saeku
Proceedings of the 2018 10th International Conference on Computer and Automation Engineering, 2018-02-24
DOI: https://doi.org/10.1145/3192975.3193011
Citations: 0
Abstract
We evaluate our proposed word separator for Thai script, called two-level tokenization (2LT), by applying it to medical Thai text, including chief complaints and ICD-10 descriptions. We verify the tokenization results through machine learning-based classification. The experimental results show that the proposed tokenizer works well with the Classification and Regression Trees (CART) method, reaching 85% precision, 71% recall, and an F1 score of 76%. However, these values are not high enough to make the proposed tokenizer worthwhile. This paper presents how to improve the results of Thai sign and symptom classification. To increase precision, recall, and F1 score, we adapt the context-free grammar (CFG) concept to exclude unnecessary conjunctions, which are common words, from consideration. Consequently, the precision, recall, and F1 score rise from 85%, 71%, and 76% to 93%, 86%, and 89%, respectively. This shows that applying CFG yields higher accuracy than the previous experiments, which did not apply the CFG concept.
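The abstract does not give the grammar itself, so the following is only a minimal sketch of the general idea, assuming a toy grammar of the form Complaint -> Symptom (CONJ Symptom)* and Symptom -> WORD+. The conjunction list, the function names parse_complaint and strip_conjunctions, and the sample tokens are all hypothetical and are not taken from the paper. It shows how already-tokenized text might be parsed so that conjunctions are recognized by the grammar and left out of the tokens passed on to a classifier.

```python
# Hypothetical conjunction list ("and", "or", "with"); the paper's actual
# grammar and word list are not stated in the abstract.
CONJUNCTIONS = {"และ", "หรือ", "กับ"}


def parse_complaint(tokens):
    """Parse tokens with the assumed toy grammar

        Complaint -> Symptom (CONJ Symptom)*
        Symptom   -> WORD+

    Returns a list of symptoms, each a list of words, with the
    conjunctions dropped. An empty token list yields an empty result.
    Raises ValueError if a conjunction is not placed between two symptoms.
    """
    symptoms = []
    pos = 0
    while pos < len(tokens):
        # Symptom -> WORD+ : collect words up to the next conjunction.
        words = []
        while pos < len(tokens) and tokens[pos] not in CONJUNCTIONS:
            words.append(tokens[pos])
            pos += 1
        if not words:
            raise ValueError(f"expected a symptom word at position {pos}")
        symptoms.append(words)
        # A conjunction, if present, must be followed by another symptom.
        if pos < len(tokens):
            pos += 1  # consume CONJ
            if pos == len(tokens):
                raise ValueError("conjunction at end of input")
    return symptoms


def strip_conjunctions(tokens):
    """Flatten the parsed symptoms back into one token list without conjunctions."""
    return [word for symptom in parse_complaint(tokens) for word in symptom]


if __name__ == "__main__":
    # Sample input: "fever and cough" as three pre-tokenized Thai words.
    print(strip_conjunctions(["ไข้", "และ", "ไอ"]))
```

In this sketch the filtered token list would then feed whatever feature extraction precedes the classifier; that pipeline is not described in the abstract and is left out here.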