Improving Accuracy in Thai Sign and Symptom Classification using Context-Free Grammar Approach

Jarunee Duangsuwan, Pawin Saeku
Proceedings of the 2018 10th International Conference on Computer and Automation Engineering. Published 2018-02-24.
DOI: 10.1145/3192975.3193011 (https://doi.org/10.1145/3192975.3193011)

Abstract

We examine our proposed word separator for Thai script, called two-level tokenization (2LT), by applying it to Thai medical text, including chief complaints and ICD-10 descriptions. We verify the tokenization results through machine learning-based classification. The experimental results show that the proposed tokenizer works well with the Classification and Regression Trees (CART) method, giving 85% precision, 71% recall, and an F1 score of 76%. However, these values are not high enough to make the proposed tokenizer worthwhile. This paper presents how to improve the results of Thai sign and symptom classification. To increase precision, recall, and F1 score, we adapt the context-free grammar (CFG) concept to remove unnecessary conjunction words, which are common words, from consideration in the experiments. Consequently, precision, recall, and F1 score rise from 85%, 71%, and 76% to 93%, 86%, and 89%, respectively. This shows that applying the CFG concept yields higher accuracy than the previous experiments, which did not apply it.
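The abstract does not give the grammar or the list of conjunctions that are removed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the text has already been split into tokens (the 2LT tokenizer itself is not reproduced here), uses a hypothetical three-word conjunction list, and parses with a toy grammar, Complaint -> Phrase (CONJ Phrase)* and Phrase -> WORD+, keeping only the WORD terminals. The names CONJUNCTIONS, parse_complaint, strip_conjunctions, and f1_score are illustrative assumptions. The last helper computes the F1 score as the harmonic mean of precision and recall.

```python
# Hypothetical conjunction list ("and", "or", "with"); the paper does not
# enumerate the words it actually removes.
CONJUNCTIONS = {"และ", "หรือ", "กับ"}


def parse_complaint(tokens):
    """Parse tokens with the toy grammar

        Complaint -> Phrase (CONJ Phrase)*
        Phrase    -> WORD+

    and return the list of phrases, with CONJ terminals dropped.
    Raises ValueError if the tokens do not match the grammar.
    """

    def parse_phrase(pos):
        # Phrase -> WORD+ : consume words until a conjunction or end of input.
        words = []
        while pos < len(tokens) and tokens[pos] not in CONJUNCTIONS:
            words.append(tokens[pos])
            pos += 1
        if not words:
            raise ValueError(f"expected a word at position {pos}")
        return words, pos

    words, pos = parse_phrase(0)
    phrases = [words]
    while pos < len(tokens):
        # parse_phrase stops only at a conjunction or at end of input,
        # so tokens[pos] is a CONJ here; skip it and parse the next phrase.
        pos += 1
        words, pos = parse_phrase(pos)
        phrases.append(words)
    return phrases


def strip_conjunctions(tokens):
    """Return the tokens that remain after the conjunctions are removed."""
    return [word for phrase in parse_complaint(tokens) for word in phrase]


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "fever and cough": the conjunction is removed before classification.
    print(strip_conjunctions(["ไข้", "และ", "ไอ"]))
    # Harmonic mean of the improved precision and recall from the abstract.
    print(round(f1_score(0.93, 0.86), 2))
```

The filtered token list would then be turned into features for a classifier such as CART. A sequence that starts or ends with a conjunction is rejected by this toy grammar; whether the original system handles such input differently is not stated in the source.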