Improving Accuracy in Thai Sign and Symptom Classification using Context-Free Grammar Approach

Proceedings of the 2018 10th International Conference on Computer and Automation Engineering
Pub Date: 2018-02-24
DOI: 10.1145/3192975.3193011

Jarunee Duangsuwan, Pawin Saeku

Abstract

We examine our proposed word separator for Thai script, called two-level tokenization (2LT), by applying it to medical Thai text, including chief complaints and ICD-10 descriptions. We verify the tokenization results through machine learning-based classification. The experimental results show that the proposed tokenizer works well with the Classification and Regression Trees (CART) method, achieving 85% precision and 71% recall, with an F1 score of 76%. However, these values are not high enough to make the proposed tokenizer worthwhile. This paper presents how to improve the results of Thai sign and symptom classification. To increase precision, recall, and F1 score, and guided by the experimental results, we adapt the context-free grammar (CFG) concept to eliminate unnecessary conjunctions, which are common words, from consideration. Consequently, precision, recall, and F1 score improve from 85%, 71%, and 76% to 93%, 86%, and 89%, respectively. This shows that applying CFG yields higher accuracy than the previous experiments without the CFG concept.
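The abstract does not give the 2LT tokenizer or the actual CFG rules, so the following is only a minimal sketch of the general idea, assuming the text has already been word-segmented. It uses a toy grammar in which a chief complaint is a sequence of symptom phrases joined by conjunctions (`Complaint -> Phrase | Phrase Conj Complaint`, `Phrase -> Word | Word Phrase`). A recursive-descent parser keeps the phrases and consumes the conjunction tokens, so they never reach a classifier such as CART. The conjunction set and the names `CONJUNCTIONS`, `parse_complaint`, and `symptom_features` are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical conjunction set for illustration: "and", "with", "or".
CONJUNCTIONS = {"และ", "กับ", "หรือ"}


def _parse_phrase(tokens: List[str], pos: int) -> Tuple[List[str], int]:
    """Phrase -> Word | Word Phrase, where Word is any non-conjunction token."""
    start = pos
    while pos < len(tokens) and tokens[pos] not in CONJUNCTIONS:
        pos += 1
    if pos == start:
        raise ValueError(f"expected a symptom word at token {start}")
    return tokens[start:pos], pos


def _parse_complaint(tokens: List[str], pos: int) -> List[List[str]]:
    """Complaint -> Phrase | Phrase Conj Complaint."""
    phrase, pos = _parse_phrase(tokens, pos)
    if pos == len(tokens):
        return [phrase]
    # _parse_phrase stopped on a conjunction: consume it (Conj) and recurse.
    return [phrase] + _parse_complaint(tokens, pos + 1)


def parse_complaint(tokens: List[str]) -> List[List[str]]:
    """Parse a tokenized chief complaint into its symptom phrases.

    Conjunctions are consumed by the grammar and never appear in the output.
    Raises ValueError if the tokens do not match the grammar.
    """
    return _parse_complaint(tokens, 0)


def symptom_features(tokens: List[str]) -> List[str]:
    """Flatten the parsed phrases into tokens for a downstream classifier."""
    return [word for phrase in parse_complaint(tokens) for word in phrase]


if __name__ == "__main__":
    # "fever and cough with headache", assumed to be already word-segmented
    tokens = ["ไข้", "และ", "ไอ", "กับ", "ปวด", "หัว"]
    print(parse_complaint(tokens))
    print(symptom_features(tokens))
```

For the example tokens, `parse_complaint` returns `[['ไข้'], ['ไอ'], ['ปวด', 'หัว']]`, and `symptom_features` returns the same words as a flat list with the conjunctions removed. Using a grammar rather than a plain stop-word list means malformed input, such as a trailing or doubled conjunction, raises an error instead of passing through silently.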
