Learning adaptive representations for entity recognition in the biomedical domain.

IF 1.6 · CAS Zone 3 (Engineering & Technology) · JCR Q3 · MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Biomedical Semantics Pub Date : 2021-05-17 DOI:10.1186/s13326-021-00238-0

Ivano Lauriola, Fabio Aiolli, Alberto Lavelli, Fabio Rinaldi


Citations: 2

Abstract


Learning adaptive representations for entity recognition in the biomedical domain.


Background: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step in these applications is the choice of the representation that describes the data. Several representations have been proposed in the literature. Some of them rely on strong domain knowledge and consist of features manually defined by domain experts. These representations usually describe the problem well, but they require a lot of human effort and annotated data. General-purpose representations such as word embeddings, on the other hand, do not require human domain knowledge, but they could be too general for a specific task.

Results: This paper investigates methods to learn the best representation from data directly, by combining several knowledge-based representations and word embeddings. Two mechanisms have been considered to perform the combination, which are neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F₁ score.
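The combination step described above can be illustrated with a small sketch. It is not the authors' implementation: it only shows the general form that Multiple Kernel Learning works with, namely a convex combination of one base kernel per representation. The toy feature vectors, the feature names, and the hand-picked weights are all assumptions made for illustration; an actual MKL algorithm would learn the weights from annotated data.

```python
from math import sqrt


def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def normalize_kernel(K):
    """Cosine-normalize K so every diagonal entry is 1.

    This puts kernels built from different representations on a
    comparable scale before they are combined.
    """
    d = [sqrt(K[i][i]) for i in range(len(K))]
    n = len(K)
    return [[K[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


def combine_kernels(kernels, weights):
    """Return sum_r weights[r] * kernels[r], with weights on the simplex."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three candidate entities, two representations.
word_embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # general-purpose
handcrafted = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]     # e.g. [capitalized, in_dictionary]

base_kernels = [
    normalize_kernel(linear_kernel(word_embedding)),
    normalize_kernel(linear_kernel(handcrafted)),
]

# Fixed by hand here; in MKL these weights are what gets learned.
weights = [0.5, 0.5]
combined = combine_kernels(base_kernels, weights)

for row in combined:
    print([round(v, 4) for v in row])
```

The combined matrix can then be passed to any kernel method that accepts a precomputed Gram matrix. The learned weights also give a direct reading of how much each representation contributes to the final solution.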

Conclusions: Our experiments show that the principled combination of general, domain-specific, word-level, and character-level representations improves the performance of entity recognition. We also discuss the contribution of each representation to the final solution.


Source journal

Journal of Biomedical Semantics (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 4.20

Self-citation rate: 5.30%

Articles published:

Review time: 30 weeks

About the journal: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas. Infrastructure for biomedical semantics: focusing on semantic resources and repositories, metadata management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources, and on tools for investigation, reasoning, prediction, and discoveries in biomedicine.