Concept Graphs: A Novel Approach for Textual Analysis of Medical Documents.

JCR quartile: Q3 (Health Professions)

Studies in Health Technology and Informatics, vol. 307, pp. 172-179. Published: 2023-09-12. DOI: 10.3233/SHTI230710

Franz Matthies, Christoph Beger, Ralph Schäfermeier, Alexandr Uciteli

Citations: 0

Abstract




The task of automatically analyzing the textual content of documents faces a number of challenges in general, but even more so when dealing with the medical domain. Here, we can't normally rely on specifically pre-trained NLP models or even, due to data privacy reasons, on (massive) amounts of training material to generate said models. We, therefore, propose a method that utilizes general-purpose basic text analysis components and state-of-the-art transformer models to represent a corpus of documents as multiple graphs, wherein important conceptually related phrases from documents constitute the nodes and their semantic relations form the edges. This method could serve as a basis for several explorative procedures and is able to draw on a plethora of publicly available resources. We test it by comparing the effectiveness of these so-called Concept Graphs with another recently suggested approach for a common use case in information retrieval: document clustering.
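The abstract does not say how phrases are extracted, embedded, or linked, so the following is only a rough illustration of the general idea, not the authors' implementation. It is a minimal Python sketch under several assumptions: phrases have already been extracted from each document, the toy 3-dimensional vectors stand in for transformer phrase embeddings, two phrases are linked when their cosine similarity reaches a hypothetical threshold of 0.8, each connected component of the resulting phrase graph is treated as one concept graph, and each document is assigned to the concept graph that shares the most phrases with it. The names `build_concept_graph`, `connected_components`, `cluster_documents`, `EMBEDDINGS`, and `DOCS`, as well as all data, are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy data: 3-dimensional vectors standing in for transformer
# phrase embeddings. Real embeddings would have hundreds of dimensions.
EMBEDDINGS = {
    "myocardial infarction": [0.9, 0.1, 0.0],
    "heart attack": [0.85, 0.15, 0.05],
    "type 2 diabetes": [0.05, 0.9, 0.1],
    "insulin resistance": [0.1, 0.85, 0.2],
}

# Hypothetical documents, each reduced to the set of phrases extracted from it.
DOCS = {
    "doc_a": {"myocardial infarction", "heart attack"},
    "doc_b": {"type 2 diabetes"},
    "doc_c": {"heart attack", "insulin resistance", "type 2 diabetes"},
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_concept_graph(embeddings, threshold=0.8):
    """Phrases become nodes; an undirected edge joins two phrases whose
    cosine similarity is at least `threshold` (a hypothetical linking rule)."""
    graph = {phrase: set() for phrase in embeddings}
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Split the phrase graph into components; each one is treated here as
    a separate concept graph. Uses an iterative depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def cluster_documents(doc_phrases, components):
    """Label each document with the index of the concept graph that shares
    the most phrases with it (ties go to the lower index)."""
    labels = {}
    for doc, phrases in doc_phrases.items():
        overlaps = [len(phrases & component) for component in components]
        labels[doc] = max(range(len(components)), key=overlaps.__getitem__)
    return labels


if __name__ == "__main__":
    concept_graphs = connected_components(build_concept_graph(EMBEDDINGS))
    print(concept_graphs)
    print(cluster_documents(DOCS, concept_graphs))
```

With these toy vectors, the two cardiology phrases form one concept graph and the two diabetes phrases form another. As a result, doc_a and doc_b land in different clusters, while doc_c, which shares two phrases with the diabetes graph and one with the cardiology graph, joins doc_b. A similarity threshold followed by connected components is used here only because it is the simplest way to turn pairwise phrase similarities into several separate graphs. The paper's actual linking and clustering steps may differ.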


Source journal

Studies in Health Technology and Informatics (Health Professions: Health Information Management)

CiteScore: 1.20
Self-citation rate: 0.00%
Articles published: 1463

Journal description: This book series was started in 1990 to promote research conducted under the auspices of the EC programmes' Advanced Informatics in Medicine (AIM) and Biomedical and Health Research (BHR) bioengineering branch. A driving aspect of international health informatics is that telecommunication technology, rehabilitative technology, intelligent home technology, and many other components are converging to form one integrated world of information and communication media.