Text mining brain imaging reports.

IF 1.6 | CAS Zone 3 (Engineering & Technology) | JCR Q3, MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Biomedical Semantics, 10 (Suppl 1): 23. Pub Date: 2019-11-12. DOI: 10.1186/s13326-019-0211-7

Beatrice Alex, Claire Grover, Richard Tobin, Cathie Sudlow, Grant Mair, William Whiteley

Citations: 30

Abstract




Background: With improvements in text mining technology and the availability of large unstructured Electronic Healthcare Records (EHR) datasets, it is now possible to extract structured information from the raw text contained within EHR at reasonably high accuracy. We describe a text mining system for classifying radiologists' reports of CT and MRI brain scans, assigning labels that indicate the occurrence and type of stroke, as well as other observations. Our system, the Edinburgh Information Extraction for Radiology reports (EdIE-R) system, was developed and tested on a collection of radiology reports.

The work reported in this paper is based on 1168 radiology reports from the Edinburgh Stroke Study (ESS), a hospital-based register of stroke and transient ischaemic attack patients. We manually created annotations for this data in parallel with developing the rule-based EdIE-R system to identify phenotype information related to stroke in radiology reports. This process was iterative: at each iteration, domain-expert feedback was used to adapt and tune EdIE-R, which identifies entities, negation and relations between entities in each report and determines report-level labels (phenotypes).
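To make the described pipeline concrete (entity recognition, then negation detection, then report-level labels), here is a minimal Python sketch. It is not EdIE-R: the abstract does not give the system's rules, so the lexicon, the negation cues, the `but` scope terminator and the label names (`ischaemic_stroke`, `haemorrhagic_stroke`, `tumour`) are all hypothetical, and relation extraction is left out for brevity.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon (not EdIE-R's): surface phrase -> entity type.
LEXICON = {
    "ischaemic stroke": "ischaemic_stroke",
    "infarct": "ischaemic_stroke",
    "haemorrhage": "haemorrhagic_stroke",
    "tumour": "tumour",
}

# Hypothetical NegEx-style cues that negate a finding appearing after them.
NEGATION_CUE = re.compile(r"\b(no|not|without|negative for)\b")
# Hypothetical scope terminator: a cue before "but" does not reach past it.
SCOPE_END = re.compile(r"\bbut\b")


@dataclass
class Entity:
    phrase: str
    etype: str
    start: int
    negated: bool = False


def find_entities(text: str) -> list[Entity]:
    """Dictionary lookup: every whole-word lexicon match becomes an entity."""
    entities = []
    for phrase, etype in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            entities.append(Entity(phrase, etype, m.start()))
    return sorted(entities, key=lambda e: e.start)


def mark_negation(text: str, entities: list[Entity]) -> None:
    """Negate an entity if a cue precedes it in the same sentence and scope."""
    for e in entities:
        sentence_start = text.rfind(".", 0, e.start) + 1  # 0 if no earlier "."
        prefix = text[sentence_start:e.start]
        scope = SCOPE_END.split(prefix)[-1]  # keep only text after the last "but"
        e.negated = bool(NEGATION_CUE.search(scope))


def report_labels(report: str) -> list[str]:
    """Report-level labels: the types of all non-negated entities."""
    text = report.lower()
    entities = find_entities(text)
    mark_negation(text, entities)
    return sorted({e.etype for e in entities if not e.negated})


if __name__ == "__main__":
    print(report_labels("CT head: No acute haemorrhage. Old infarct in the left MCA territory."))
```

For the example report, the haemorrhage is negated by "No" in its sentence while the infarct is not, so the sketch prints `['ischaemic_stroke']`. The sentence-bounded, cue-before-entity check is a deliberately crude heuristic; a real rule-based system needs much richer scope handling and would also distinguish, for instance, old from recent findings.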

Results: Inter-annotator agreement (IAA) is high for all types of annotation: 96.96 for entities, 96.46 for negation, 95.84 for relations and 94.02 for labels. The corresponding system scores on the blind test set are similarly high: against the first annotator, 95.49 for entities, 94.41 for negation, 98.27 for relations and 96.39 for labels; against the second annotator, 96.86, 96.01, 96.53 and 92.61, respectively.
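The abstract reports agreement scores without naming the metric. Assuming they are F1 scores over exactly matching annotations (a common choice for span-level IAA, but an assumption here), the computation can be sketched as follows. Annotations are represented as hypothetical `(report_id, start, end, type)` tuples, and the two sets are made-up toy data, not ESS annotations.

```python
Annotation = tuple[str, int, int, str]  # hypothetical: (report_id, start, end, type)


def f1_agreement(a: set[Annotation], b: set[Annotation]) -> float:
    """Exact-match F1 between two annotation sets, as a percentage.

    F1 = 2|A ∩ B| / (|A| + |B|) is symmetric, so it does not matter
    which annotator is treated as the reference.
    """
    if not a and not b:
        return 100.0  # two empty annotation sets agree trivially
    return 100.0 * 2 * len(a & b) / (len(a) + len(b))


# Toy data: the annotators agree on 2 of their 3 annotations each.
annotator_1 = {("r1", 0, 7, "stroke"), ("r1", 20, 31, "haemorrhage"), ("r2", 5, 12, "tumour")}
annotator_2 = {("r1", 0, 7, "stroke"), ("r2", 5, 12, "tumour"), ("r2", 30, 40, "atrophy")}

print(round(f1_agreement(annotator_1, annotator_2), 2))  # 66.67
```

The same function can score a system: pass its output set in place of one annotator's set, once per annotator, which matches the abstract's reporting of separate system scores against each annotator.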

Conclusion: Automated reading of EHR data at this level of accuracy opens up avenues for population health monitoring and audit, and can provide a resource for epidemiological studies. We are in the process of validating EdIE-R in separate, larger cohorts in NHS England and Scotland. The manually annotated ESS corpus will be available for research purposes on application.


Source journal

Journal of Biomedical Semantics (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 4.20
Self-citation rate: 5.30%
Articles published:
Review time: 30 weeks

Journal description: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas: Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.