Improving semantic role labeling using high-level classification in complex networks

2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). Pub Date: 2017-07-29. DOI: 10.1109/FSKD.2017.8393113

M. Carneiro, J. Rosa, Qiusheng Zheng, Xiaoming Liu, Liang Zhao


Citations: 2

Abstract

While traditional supervised learning methods classify based only on the physical features of the data (e.g., distribution, similarity, or distance), high-level classification is characterized by its ability to capture topological features of the input data using complex network measures. Recent work has shown that combining both kinds of features can detect a variety of patterns that the physical features alone are unable to uncover. In this article we investigate such a hybrid method for the Semantic Role Labeling (SRL) task, which consists of identifying and classifying the arguments in a sentence with roles that indicate semantic relations between an event and its participants. Because SRL has the potential to improve many other natural language processing tasks, such as information extraction and plagiarism detection, we consider the SRL task over a Brazilian Portuguese corpus named PropBank-br, which was built from texts in Brazilian newspapers. This corpus poses a challenging classification problem because, like most non-English corpora, it suffers from scarce annotated data and highly imbalanced distributions. Experiments were performed on the argument classification task over the whole corpus and, specifically, over the most frequent verbs. Results in the verb-specific scenario revealed that the high-level system obtains a considerable gain in predictive performance, even over a state-of-the-art algorithm for SRL.
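The abstract does not state how the physical and topological evidence are combined, so the following is only a toy sketch under stated assumptions, not the authors' implementation. It assumes a convex combination weighted by a hypothetical parameter `lam`, a k-nearest-neighbour graph per class, average degree as the single network measure, and an inverse-centroid-distance score as a stand-in for the low-level classifier. The 2-D feature vectors and the role labels used as class names are invented for illustration.

```python
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph, returned as a set of index pairs."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def avg_degree(n_nodes, edges):
    """Average degree of an undirected graph (the assumed network measure)."""
    return 2 * len(edges) / n_nodes if n_nodes else 0.0


def low_level(x, classes):
    """Physical-feature score (assumption): normalised inverse distance to each class centroid."""
    raw = {}
    for label, pts in classes.items():
        centroid = [sum(c) / len(pts) for c in zip(*pts)]
        raw[label] = 1.0 / (1e-9 + math.dist(x, centroid))
    total = sum(raw.values())
    return {c: s / total for c, s in raw.items()}


def high_level(x, classes, k):
    """Topological score: a class scores higher when inserting x changes its network measure less."""
    variation = {}
    for label, pts in classes.items():
        before = avg_degree(len(pts), knn_graph(pts, k))
        after = avg_degree(len(pts) + 1, knn_graph(pts + [x], k))
        variation[label] = abs(after - before)
    uniform = {c: 1.0 / len(classes) for c in classes}
    total = sum(variation.values())
    if total == 0:
        return uniform
    compliance = {c: 1.0 - v / total for c, v in variation.items()}
    norm = sum(compliance.values())
    if norm == 0:
        return uniform
    return {c: s / norm for c, s in compliance.items()}


def hybrid_scores(x, classes, k=2, lam=0.3):
    """Convex combination of both scores; lam = 0 falls back to the low-level score only."""
    low = low_level(x, classes)
    high = high_level(x, classes, k)
    return {c: (1 - lam) * low[c] + lam * high[c] for c in classes}


# Toy data: two argument classes with made-up 2-D feature vectors.
toy_classes = {
    "Arg0": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    "Arg1": [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)],
}
print(hybrid_scores((0.5, 0.4), toy_classes, k=2, lam=0.3))
```

Setting `lam` to 0 reduces the sketch to the purely physical classifier, which makes it easy to compare predictions with and without the topological term.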


