Discontinuous named entities in clinical text: A systematic literature review

Impact factor: 4.0 | Zone 2 (Medicine) | JCR Q2: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Biomedical Informatics Pub Date : 2025-02-01 DOI:10.1016/j.jbi.2025.104783

Areej Alhassan, Viktor Schlegel, Monira Aloud, Riza Batista-Navarro, Goran Nenadic

Volume 162, Article 104783. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1532046425000127

Citations: 0

Abstract



Objective

Extracting named entities from clinical free-text presents unique challenges, particularly when dealing with discontinuous entities—mentions that are separated by unrelated words. Traditional NER methods often struggle to accurately identify these entities, prompting the development of specialised computational solutions. This paper systematically reviews and presents the methodologies developed for Discontinuous Named Entity Recognition in clinical texts, highlighting their effectiveness and the challenges they face.
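To make the notion of a discontinuous entity concrete, the following is a minimal sketch of one way to store a mention as several token ranges rather than a single span. The `Mention` class, the `Disorder` label, and the example sentence are illustrative assumptions and are not taken from the reviewed paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical container: one entity made of one or more token ranges."""

    label: str
    spans: tuple[tuple[int, int], ...]  # half-open [start, end) token ranges

    @property
    def is_discontinuous(self) -> bool:
        # True when at least one token lies between two consecutive ranges.
        ordered = sorted(self.spans)
        return any(nxt[0] > cur[1] for cur, nxt in zip(ordered, ordered[1:]))

    def text(self, tokens: list[str]) -> str:
        # Join the covered tokens, skipping the unrelated words in between.
        return " ".join(" ".join(tokens[s:e]) for s, e in sorted(self.spans))


# Illustrative sentence: "is mildly" separates the two parts of the mention.
tokens = "The left atrium is mildly dilated".split()
mention = Mention(label="Disorder", spans=((1, 3), (5, 6)))

print(mention.text(tokens), mention.is_discontinuous)
```

A single-span tagger can only return "left atrium" or the whole stretch "left atrium is mildly dilated"; keeping a list of ranges lets the intervening words be left out.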

Method

We conducted a systematic literature review focused on discontinuous named entities, using structured searches across four Computer Science-related and one medical-related electronic database. A combination of search terms, grouped into three synonym categories—problem, entity/approach, and task—yielded 2,442 articles. Guided by our research objectives, we identified five key dimensions to systematically annotate and normalise the data for comprehensive analysis.
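The grouping of search terms into synonym categories can be sketched as a Boolean query builder. This assumes the common convention of joining synonyms with OR inside a category and joining categories with AND; the abstract does not state the exact operators, and the terms below are hypothetical placeholders rather than the authors' search strings.

```python
def build_query(groups: dict[str, list[str]]) -> str:
    """Combine synonym groups: OR within a group, AND across groups (assumed)."""
    clauses = []
    for terms in groups.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Hypothetical terms for the three categories named in the abstract.
groups = {
    "problem": ["discontinuous", "non-contiguous"],
    "entity_or_approach": ["named entity", "entity mention"],
    "task": ["recognition", "extraction"],
}

query = build_query(groups)
print(query)
```

Each database has its own query syntax, so a string like this would normally need adapting per search engine.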

Result

The review included 44 studies which were coded across several key dimensions: the chronological development of approaches, the corpora used, the downstream tasks affected by discontinuous named entities, the methodological approaches proposed to address the issue, and the reported performance outcomes. The discussion section examines the challenges encountered in this area and suggests potential directions for future research.

Conclusion

Significant progress has been made in discontinuous named entity recognition; however, there remains a need for more adaptable, generalisable solutions that are independent of custom annotation schemes. Exploring various configurations of generative language models presents a promising avenue for advancing this area. Additionally, future research should investigate the impact of precise versus imprecise recognition of discontinuous entities on clinical downstream tasks to better understand its practical implications in healthcare applications.


Source journal

Journal of Biomedical Informatics (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 8.90

Self-citation rate: 6.70%

Articles published: 243

Review time: 32 days

About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.