Identifying relevant medical reports from an assorted report collection using the multinomial naïve Bayes classifier and the UMLS.

Indian journal of medical informatics Pub Date : 2007-01-01

Vijayaraghavan Bashyam, Craig Morioka, Suzie El-Saden, Alex At Bui, Ricky K Taira

Volume: 2 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9592058/pdf/


Abstract

A patient's electronic medical record contains a large number of medical reports and imaging studies. Identifying the relevant information in order to make a diagnosis can be a time-consuming process that can easily overwhelm the physician. Summarizing key clinical information for physicians evaluating brain tumor patients is an ongoing research project at our institution. Identifying documents associated with brain tumors is an important step in collecting the data relevant for summarization. Current electronic medical record systems lack meta-information that is useful in structuring heterogeneous medical information. Thus, reports relevant to a particular task cannot be easily retrieved from a structured database. This necessitates content analysis methods for identifying relevant reports. This paper reports a system designed to identify brain-tumor-related reports from an assorted collection of clinical reports. A large collection of clinical reports was obtained from our university hospital database. A domain expert manually annotated the documents, classifying them into 'related' and 'unrelated' categories. A multinomial naïve Bayes classifier was trained to use word-level and UMLS concept-level features from the reports to identify brain-tumor-related reports from the assorted collection. The system was trained on 90% and tested on 10% of the manually annotated corpus. A ten-fold cross-validation is reported. Performance of the system was best (f-score 94.7) when the system was trained using both word-level and UMLS concept-level features. Using UMLS concepts improved classifier accuracy.
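To make the classification step concrete, the following is a minimal sketch of a multinomial naïve Bayes classifier over combined word-level and concept-level tokens. It is not the authors' implementation. The function names (`featurize`, `train_multinomial_nb`, `predict`), the add-one smoothing, the toy reports, and the concept labels such as `BRAIN_NEOPLASM` are all assumptions made for illustration; the concept labels are placeholders rather than real UMLS identifiers, and the concept-mapping step itself is not shown.

```python
import math
from collections import Counter, defaultdict


def featurize(words, concepts):
    """Merge word-level and concept-level features into one token list.

    The "w:" / "c:" prefixes keep the two feature spaces distinct.
    """
    return [f"w:{w.lower()}" for w in words] + [f"c:{c}" for c in concepts]


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods per class.

    alpha is the additive (Laplace) smoothing constant, an assumption here.
    """
    vocab = {tok for doc in docs for tok in doc}
    docs_per_class = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {}
    for label, n_docs in docs_per_class.items():
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_lik": {
                tok: math.log((token_counts[label][tok] + alpha) / denom)
                for tok in vocab
            },
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior score.

    Tokens never seen during training are skipped.
    """
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_lik"][tok] for tok in doc if tok in params["log_lik"]
        )

    return max(model, key=score)


# Invented toy reports; concept labels are hypothetical placeholders.
TRAIN_DOCS = [
    featurize(["MRI", "brain", "shows", "enhancing", "mass"], ["BRAIN_NEOPLASM"]),
    featurize(["glioma", "resection", "follow", "up"], ["BRAIN_NEOPLASM", "SURGERY"]),
    featurize(["chest", "xray", "clear", "lungs"], ["LUNG"]),
    featurize(["knee", "pain", "after", "fall"], ["KNEE"]),
]
TRAIN_LABELS = ["related", "related", "unrelated", "unrelated"]

if __name__ == "__main__":
    model = train_multinomial_nb(TRAIN_DOCS, TRAIN_LABELS)
    new_report = featurize(["brain", "mass", "noted"], ["BRAIN_NEOPLASM"])
    print(predict(model, new_report))
```

Evaluating such a classifier as the abstract describes would mean repeating the train and predict steps over ten 90%/10% splits of the annotated corpus and averaging the scores; that loop is omitted here.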
