Ambiguous and Incomplete: Natural Language Processing Reveals Problematic Reporting Styles in Thyroid Ultrasound Reports.

IF 1.8 · CAS Zone 4, Medicine · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Methods of Information in Medicine · Pub Date: 2022-05-01 · Epub Date: 2022-01-06 · Pages: 11-18 · DOI: 10.1055/s-0041-1740493

Priya H Dedhia, Kallie Chen, Yiqiang Song, Eric LaRose, Joseph R Imbus, Peggy L Peissig, Eneida A Mendonca, David F Schneider

Citations: 2


Abstract

Objective: Natural language processing (NLP) systems convert unstructured text into analyzable data. Here, we describe how well NLP captures granular details about nodules from thyroid ultrasound (US) reports, and we show how it reveals critical issues with reporting language.

Methods: We iteratively developed NLP tools using the clinical Text Analysis and Knowledge Extraction System (cTAKES) and thyroid US reports from 2007 to 2013, incorporating nine nodule features for extraction. We then evaluated the precision, recall, and accuracy of these tools on a separate set of US reports from the same period, drawn from an academic medical center (A) and a regional health care system (B). Two physicians manually annotated each test-set report, and a third physician adjudicated discrepancies. The resulting adjudicated "gold standard" was used to evaluate NLP performance on the test set.
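The abstract names precision, recall, and accuracy against an adjudicated gold standard, plus inter-annotator agreement, but does not spell out how each was counted. The sketch below is a minimal Python illustration under assumed definitions: exact-match comparison of per-element values, simple percent agreement (not a chance-corrected statistic such as Cohen's kappa), and "value accuracy" taken as the share of elements found by both the tool and the gold standard whose values match. The `Key` layout, element names, and function names are hypothetical, not the study's code.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Tuple

# Hypothetical key: (report_id, element_name) -> normalized value, e.g.
# ("r1", "nodule1.calcifications") -> "absent".
Key = Tuple[Hashable, str]


def percent_agreement(a: Dict[Key, str], b: Dict[Key, str]) -> float:
    """Share of elements (union of both annotators' keys) with identical values."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    agree = sum(1 for k in keys if a.get(k) == b.get(k))
    return agree / len(keys)


@dataclass
class Scores:
    precision: float       # correct predictions / all predictions
    recall: float          # correct predictions / all gold elements
    value_accuracy: float  # correct values / elements found by both


def score_against_gold(pred: Dict[Key, str], gold: Dict[Key, str]) -> Scores:
    """Exact-match scoring against an adjudicated gold standard (assumed scheme)."""
    located = set(pred) & set(gold)  # elements the tool found that gold also has
    tp = sum(1 for k in located if pred[k] == gold[k])
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    value_accuracy = tp / len(located) if located else 0.0
    return Scores(precision, recall, value_accuracy)
```

Under these definitions a missed element lowers only recall, a spurious element lowers only precision, and a wrong value (for example "heterogeneous" recorded where the gold standard says "hypoechoic") lowers all three.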

Results: A total of 243 thyroid US reports contained 6,405 data elements. Inter-annotator agreement across all elements was 91.3%. Compared with the gold standard, the overall recall of the NLP tool was 90%. NLP recall for thyroid lobe or isthmus characteristics was 96% for laterality and 95% for size. NLP accuracy for nodule characteristics was 92% for laterality, 92% for size, 76% for calcifications, 65% for vascularity, 62% for echogenicity, 76% for contents, and 40% for borders. NLP recall for the presence or absence of lymphadenopathy was 61%. Reporting style accounted for 18% of errors; for example, the word "heterogeneous" was used interchangeably to describe nodule contents or echogenicity. Although nodule dimensions and laterality were often described, US reports described contents, echogenicity, vascularity, calcifications, borders, and lymphadenopathy only 46%, 41%, 17%, 15%, 9%, and 41% of the time, respectively. Most nodule characteristics were equally likely to be described at hospital A as at hospital B.
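The "heterogeneous" example can be made concrete with a toy dictionary lookup. This is a hypothetical Python sketch, not the study's cTAKES pipeline: the `LEXICON` entries and the `tag_features` function are invented for illustration. It shows that when one surface term maps to two features, a lookup alone cannot pick the right one, so the sketch flags the term for review rather than guessing.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (NOT the study's dictionary): maps a surface
# term to the nodule feature(s) it can describe.
LEXICON: Dict[str, Set[str]] = {
    "hypoechoic": {"echogenicity"},
    "hyperechoic": {"echogenicity"},
    "isoechoic": {"echogenicity"},
    "solid": {"contents"},
    "cystic": {"contents"},
    "microcalcifications": {"calcifications"},
    "irregular": {"borders"},
    "smooth": {"borders"},
    "heterogeneous": {"contents", "echogenicity"},  # ambiguous in practice
}


def tag_features(sentence: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (feature -> matched terms, ambiguous terms) for one sentence."""
    assigned: Dict[str, List[str]] = {}
    ambiguous: List[str] = []
    for token in re.findall(r"[a-z]+", sentence.lower()):
        features = LEXICON.get(token)
        if not features:
            continue  # token is not a known feature term
        if len(features) > 1:
            # The lexicon alone cannot decide which feature is meant.
            ambiguous.append(token)
        else:
            (feature,) = features
            assigned.setdefault(feature, []).append(token)
    return assigned, ambiguous
```

For "Heterogeneous solid nodule with smooth margins." the sketch assigns "solid" to contents and "smooth" to borders, and returns "heterogeneous" as ambiguous. A real system would need sentence context or a reporting convention to resolve it.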

Conclusions: NLP can automate extraction of critical information from thyroid US reports. However, ambiguous and incomplete reporting language hinders performance of NLP systems regardless of institutional setting. Standardized or synoptic thyroid US reports could improve NLP performance.
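One way to read the call for standardized or synoptic reports is as a fixed schema in which every feature is an explicit field, so omissions become visible and countable instead of silently missing. The Python sketch below assumes a hypothetical `NoduleEntry` record (field names and example values are illustrative, not a published template) and computes how often each feature was documented, the same kind of per-feature rate the Results report for free-text reports.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class NoduleEntry:
    """Hypothetical synoptic entry for one thyroid nodule.

    None means "not described in the report".
    """
    laterality: Optional[str] = None      # e.g. "right", "left", "isthmus"
    size_cm: Optional[float] = None
    contents: Optional[str] = None        # e.g. "solid", "cystic", "mixed"
    echogenicity: Optional[str] = None    # e.g. "hypoechoic", "isoechoic"
    vascularity: Optional[str] = None
    calcifications: Optional[str] = None
    borders: Optional[str] = None


def documentation_rate(entries: List[NoduleEntry]) -> Dict[str, float]:
    """Fraction of entries in which each feature was described (not None)."""
    if not entries:
        return {}
    rates: Dict[str, float] = {}
    for f in fields(NoduleEntry):
        described = sum(1 for e in entries if getattr(e, f.name) is not None)
        rates[f.name] = described / len(entries)
    return rates
```

In an actual synoptic template the `None` default would typically be replaced by a required choice (including an explicit "not assessed"), which is what would let a downstream extractor distinguish "absent" from "not mentioned".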


Source journal: Methods of Information in Medicine (Medicine; Computer Science: Information Systems)

CiteScore: 3.70
Self-citation rate: 11.80%
Review time: 6-12 weeks

Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.