Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case study.

Biomedical informatics insights Pub Date : 2013-08-01 eCollection Date: 2013-01-01 DOI:10.4137/BII.S11770
Pierre Zweigenbaum, Thomas Lavergne, Natalia Grabar, Thierry Hamon, Sophie Rosset, Cyril Grouin
Volume 6 Suppl 1, pages 51-62
Citations: 8

Abstract

Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case study.

Medical entity recognition is currently generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system, are often combined with data-driven systems. We present here a case study where an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that it prevents the combination of the two systems from obtaining improvements in precision, recall, or F-measure, and analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone as attributes input to a CRF classifier does boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. The generalization of this method remains to be further investigated.
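
The idea of wrapping an expert-based system as attributes input to a CRF classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the actual feature set, so the function names, the BIO-style tags such as `B-problem`, the context window of one token, and the optional lexical features are all assumptions made for illustration. Each token receives the expert system's tag (plus the tags of its neighbours) as attributes; a CRF would then be trained on these attributes against the gold labels.

```python
from typing import Dict, List


def token_features(tokens: List[str], expert_tags: List[str], i: int,
                   use_lexical: bool = True) -> Dict[str, str]:
    """Build the attribute dict for token i of one sentence.

    expert_tags holds the (hypothetical) BIO tag that the expert-based
    system assigned to each token.
    """
    feats = {
        # The expert system's decision for this token and its neighbours.
        "expert": expert_tags[i],
        "expert_prev": expert_tags[i - 1] if i > 0 else "BOS",
        "expert_next": expert_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }
    if use_lexical:
        # Stand-ins for the features a data-driven system might add.
        feats["lower"] = tokens[i].lower()
        feats["suffix3"] = tokens[i][-3:].lower()
    return feats


def sentence_features(tokens: List[str], expert_tags: List[str],
                      use_lexical: bool = True) -> List[Dict[str, str]]:
    """Return one attribute dict per token of the sentence."""
    if len(tokens) != len(expert_tags):
        raise ValueError("tokens and expert_tags must have the same length")
    return [token_features(tokens, expert_tags, i, use_lexical)
            for i in range(len(tokens))]


# Illustrative sentence and expert output (invented for this example).
tokens = ["Patient", "denies", "chest", "pain", "."]
expert_tags = ["O", "O", "B-problem", "I-problem", "O"]

# Expert output alone, wrapped as CRF attributes.
features = sentence_features(tokens, expert_tags, use_lexical=False)
```

Setting `use_lexical=False` corresponds to wrapping the expert-based system alone, while `use_lexical=True` loosely mirrors combining it with data-driven features. The per-token dicts would still need to be passed, together with gold labels, to a CRF toolkit of the reader's choice.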
