Multi-combined Features Text Mining of TCM Medical Cases with CRF

Qi-yu Jiang, Hongyi Li, Jiafen Liang, Qing-Xiang Wang, Xiao-Mu Luo, Hui-Ling Liu
2016 8th International Conference on Information Technology in Medicine and Education (ITME), December 2016
DOI: 10.1109/ITME.2016.0146
Citations: 3

Abstract

Traditional Chinese Medicine (TCM) medical case records are free text that contains a great deal of valuable data and clinical terminology, so automatically recognizing and extracting these clinical terms is a worthwhile task. In this work, TCM medical records obtained from Guangdong Provincial Hospital of Chinese Medicine are segmented into individual words, labeled with five labeling features (words in sentence, grammatical property of words, words in clinical dictionary, set phrases acting on neighbor context, and set phrases acting on far distance), and divided into training and testing sets. The training sets are additionally annotated with output labels for symptoms or signs, TCM diagnoses, TCM syndrome types, Chinese medicines (drugs), and names of TCM prescriptions. To evaluate how well the labeling features improve clinical term recognition with conditional random fields (CRF), three indicators are defined (recognition precision (P), recognition recall (R), and F-score (F)), and three comparisons are made: individual labeling features, combined labeling features, and combined features across different diseases. The results show that "grammatical property of words" is the best of the individual labeling features, and that multi-combined features score higher than any individual labeling feature in improving clinical term recognition. The combination of "grammatical property of words", "words in sentence", and "words in clinical dictionary" may be the most suitable set of labeling features. Multi-combined labeling features can therefore improve term recognition with the CRF model for text mining of TCM medical cases.
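The abstract gives neither the feature templates nor the evaluation code, so the following is only a minimal Python sketch under stated assumptions. It shows how the three labeling features in the best-performing combination (the word itself, its grammatical property, taken here as a part-of-speech tag, and clinical-dictionary membership) might be merged into one feature dictionary per token, a format of the kind many CRF toolkits accept, and how P, R, and F can be computed at the entity level. The dictionary entries, POS tag names, the BIO tagging scheme, the entity labels `SYM` and `DRUG`, the one-word context window, exact span matching, and all function names are hypothetical. F is taken as the balanced F-score, 2PR/(P+R). The two set-phrase features are not modeled, and no CRF is trained here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical clinical dictionary: 头痛 (headache), 发热 (fever), 黄芪 (astragalus root).
# The paper's actual dictionary is not described in the abstract.
CLINICAL_DICT: Set[str] = {"头痛", "发热", "黄芪"}

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def token_features(words: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Merge three labeling features for token i into one dict:
    the word, its part-of-speech tag, and clinical-dictionary membership.
    The one-word context window on each side is an assumed choice, not from the paper."""
    feats = {
        "word": words[i],
        "pos": pos_tags[i],
        "in_dict": str(words[i] in CLINICAL_DICT),
    }
    if i > 0:
        feats["prev_word"] = words[i - 1]
    else:
        feats["BOS"] = "True"  # beginning of sentence
    if i < len(words) - 1:
        feats["next_word"] = words[i + 1]
    else:
        feats["EOS"] = "True"  # end of sentence
    return feats


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (BIO is an assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        # An I- tag of the same type extends the currently open entity.
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue
        if start is not None:
            spans.add((start, i, etype))
        if tag == "O":
            start, etype = None, None
        else:  # a B- tag, or a stray I- tag treated as the start of a new entity
            start, etype = i, tag[2:]
    return spans


def precision_recall_f(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and balanced F-score with exact span matching."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    p = correct / len(pred_spans) if pred_spans else 0.0
    r = correct / len(gold_spans) if gold_spans else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    words = ["患者", "头痛", "发热"]  # patient, headache, fever
    pos_tags = ["n", "v", "v"]  # hypothetical POS tags
    print(token_features(words, pos_tags, 1))

    gold = ["B-SYM", "I-SYM", "O", "B-DRUG", "I-DRUG"]
    pred = ["B-SYM", "I-SYM", "O", "B-DRUG", "O"]  # the drug span is cut short
    print(precision_recall_f(gold, pred))  # (0.5, 0.5, 0.5)
```

With exact span matching, a partially recognized term (such as the truncated `DRUG` span in the demo) counts as both a false positive and a false negative; the abstract does not say whether partial matches were scored differently.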