Identifying relevant medical reports from an assorted report collection using the multinomial naïve Bayes classifier and the UMLS.

Vijayaraghavan Bashyam, Craig Morioka, Suzie El-Saden, Alex A. T. Bui, Ricky K. Taira
Indian journal of medical informatics, 2(1), 2007-01-01. Journal Article. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9592058/pdf/

Abstract

A patient's electronic medical record contains a large number of medical reports and imaging studies. Identifying the information relevant to a diagnosis can be a time-consuming process that easily overwhelms the physician. Summarizing key clinical information for physicians evaluating brain tumor patients is an ongoing research project at our institution, and identifying documents associated with brain tumors is an important step in collecting the data for summarization. Current electronic medical record systems lack meta-information that would be useful in structuring heterogeneous medical information, so reports relevant to a particular task cannot be easily retrieved from a structured database. This necessitates content analysis methods for identifying relevant reports. This paper describes a system designed to identify brain-tumor-related reports in an assorted collection of clinical reports. A large collection of clinical reports was obtained from our university hospital database. A domain expert manually annotated the documents, classifying them into 'related' and 'unrelated' categories. A multinomial naïve Bayes classifier was trained on word-level and UMLS concept-level features from the reports to identify the brain-tumor-related reports. The system was trained on 90% and tested on 10% of the manually annotated corpus, and results of ten-fold cross-validation are reported. Performance was best (F-score 94.7) when the system was trained using both word-level and UMLS concept-level features. Using UMLS concepts improved classifier accuracy.
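
The classification approach summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify tokenization, smoothing, or how UMLS concepts were extracted, so the whitespace tokenizer, the add-one (Laplace) smoothing, the `TOY_CONCEPTS` lookup table with its placeholder concept identifiers, and the four invented training sentences are all assumptions made for illustration. The sketch shows how word-level and concept-level tokens can feed one multinomial naïve Bayes model, and how an F-score is computed for the 'related' class.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word-to-concept table standing in for a real UMLS lookup.
# The identifiers are placeholders, not actual UMLS concept identifiers.
TOY_CONCEPTS = {
    "glioblastoma": "CONCEPT_BRAIN_TUMOR",
    "glioma": "CONCEPT_BRAIN_TUMOR",
    "meningioma": "CONCEPT_BRAIN_TUMOR",
    "fracture": "CONCEPT_FRACTURE",
}

def extract_features(text):
    """Return word-level tokens plus the toy concept tokens they map to."""
    words = text.lower().split()  # assumed tokenizer: lowercase + whitespace
    concepts = [TOY_CONCEPTS[w] for w in words if w in TOY_CONCEPTS]
    return words + concepts

class MultinomialNB:
    """Multinomial naive Bayes with additive smoothing (alpha is an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)           # documents per class
        self.feature_counts = defaultdict(Counter)    # feature counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = extract_features(doc)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)     # log prior
            denom = (sum(self.feature_counts[label].values())
                     + self.alpha * len(self.vocab))
            for f in extract_features(doc):
                if f in self.vocab:                   # skip unseen features
                    count = self.feature_counts[label][f]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def f_score(gold, predicted, positive="related"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy corpus, for illustration only.
train_docs = [
    "mri shows enhancing glioblastoma in left temporal lobe",
    "follow up of resected meningioma no recurrence",
    "chest xray clear no acute disease",
    "wrist fracture with mild displacement",
]
train_labels = ["related", "related", "unrelated", "unrelated"]

model = MultinomialNB().fit(train_docs, train_labels)
# "glioma" never occurs as a word in training; only its concept token does.
prediction = model.predict("biopsy confirms glioma")
```

In this toy setup the query report shares no word with the training reports, so the decision rests entirely on the shared concept token. This is one plausible way concept-level features could help a word-level model generalize. A ten-fold cross-validation like the one reported would repeat `fit` and `predict` over ten 90%/10% splits of the annotated corpus and average the resulting scores.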
