Bio-molecular event extraction using Support Vector Machine

S. Saha, A. Majumder, M. Hasanuzzaman, Asif Ekbal
2011 Third International Conference on Advanced Computing, published 2011-12-01
DOI: 10.1109/ICOAC.2011.6165192 (https://doi.org/10.1109/ICOAC.2011.6165192)
Citations: 5

Abstract

The main goal of Biomedical Natural Language Processing (BioNLP) is to capture biomedical phenomena from textual data by extracting relevant entities, information, and relations between biomedical entities (i.e., proteins and genes). Most published work has extracted only binary relations. Recently, the focus has shifted towards extracting more complex relations in the form of bio-molecular events, which may include several entities or other relations. In this paper we propose an approach to event extraction (detection and classification) for relatively complex bio-molecular events. We treat the task as a supervised classification problem and use the well-known Support Vector Machine (SVM) algorithm with statistical and linguistic features that represent morphological, syntactic, and contextual information about the candidate bio-molecular trigger words. First, we treat event detection and classification as a two-step process: the first step detects events, and the second step classifies each detected event into one of nine predefined classes. We then treat the problem as a one-step process and perform event detection and classification together. Three-fold cross-validation experiments on the BioNLP 2009 shared task datasets yield overall average recall, precision, and F-measure values of 62.95%, 74.53%, and 68.25%, respectively, for event detection, and an overall classification accuracy of 72.50%. When detection and classification are performed together, the proposed approach achieves overall recall, precision, and F-measure values of 57.66%, 55.87%, and 56.75%, respectively.
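
The reported F-measure values can be cross-checked against the reported recall and precision. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is introduced here for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumes equal weighting of precision and recall; the abstract does
    not say which weighting was used. Inputs and output are percentages.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Two-step setting, event detection: P = 74.53%, R = 62.95%
print(round(f_measure(74.53, 62.95), 2))  # 68.25

# One-step setting, joint detection and classification: P = 55.87%, R = 57.66%
print(round(f_measure(55.87, 57.66), 2))  # 56.75
```

Under that assumption, both computed values match the F-measures stated in the abstract (68.25% and 56.75%).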