BioCreative VI Precision Medicine Track: creating a training corpus for mining protein-protein interactions affected by mutations

Workshop on Biomedical Natural Language Processing. Pub Date: 2017-08-01. DOI: 10.18653/v1/W17-2321

R. Dogan, A. Chatr-aryamontri, Sun Kim, Chih-Hsuan Wei, Yifan Peng, Donald C. Comeau, Zhiyong Lu

Citations: 22

Abstract

The Precision Medicine Track in BioCreative VI aims to bring together the BioNLP community for a novel challenge focused on mining the biomedical literature in search of mutations and protein-protein interactions (PPI). To support this track with an effective training dataset despite limited curator time, the track organizers carefully reviewed PubMed articles from two different sources: curated public PPI databases and the results of state-of-the-art public text mining tools. We detail here the data collection, manual review, and annotation process, and describe the characteristics of this training corpus. We also describe a corpus performance baseline. This analysis will provide useful information to developers and researchers for comparing and developing innovative text mining approaches for the BioCreative VI challenge and other Precision Medicine related applications.

