Biological Named Entity Recognition and Role Labeling via Deep Multi-task Learning

2021 13th International Conference on Machine Learning and Computing. Pub Date: 2021-02-26. DOI: 10.1145/3457682.3457751

Fei Deng, Dongdong Zhang, Jing Peng


Citations: 1

Abstract


Biological Named Entity Recognition and Role Labeling via Deep Multi-task Learning

Bioscience is an experimental science. The qualitative and quantitative findings of biological experiments are often available only as figures in published papers. In this paper, we introduce the SourceData model, which captures a key aspect of biological experimental design by categorizing each biological entity involved in an experiment into one of six roles. Our work aims to determine, using automatic natural language processing algorithms, whether a given entity is subjected to a perturbation or is the object of a measurement (entity role labeling). We use state-of-the-art transformer models (e.g., BERT and its variants) as a strong baseline and find that, after joint training with the biological named entity recognition task via deep multi-task learning (MTL), the F1 score improves by 2% compared to the previous single-task architecture. For the named entity recognition task, the MTL method also achieves comparable performance on five public datasets. Further analysis reveals the importance of fusing entity information at the input layer of the entity role labeling task and of incorporating global context.
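
The joint training described in the abstract can be illustrated with a minimal sketch of a multi-task objective. The abstract does not give the loss function, so everything here is an assumption made for illustration: the weighted sum of two token-level cross-entropy losses, the `ner_weight` parameter, the three hypothetical NER tags, and the seven role labels (six roles plus an assumed "no role" label). The logits stand in for the outputs of two task heads on a shared encoder, which is not modeled here.

```python
import math


def cross_entropy(logits, gold):
    """Mean token-level cross-entropy.

    logits: one list of raw scores per token; gold: one label index per token.
    """
    total = 0.0
    for scores, label in zip(logits, gold):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[label]
    return total / len(gold)


def joint_loss(ner_logits, ner_gold, role_logits, role_gold, ner_weight=1.0):
    """Role-labeling loss plus a weighted NER loss (weighting is an assumption)."""
    role_term = cross_entropy(role_logits, role_gold)
    ner_term = cross_entropy(ner_logits, ner_gold)
    return role_term + ner_weight * ner_term


# Toy two-token example with hypothetical label inventories.
ner_logits = [[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]]   # 3 assumed NER tags
ner_gold = [0, 1]
role_logits = [[0.0] * 7, [0.0] * 7]               # 6 roles + assumed "no role"
role_gold = [3, 6]

loss = joint_loss(ner_logits, ner_gold, role_logits, role_gold, ner_weight=0.5)
print(f"joint loss: {loss:.4f}")
```

Setting `ner_weight` to 0 reduces the objective to the single-task role-labeling loss, which makes it easy to compare a single-task and a multi-task setup under this sketch.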
