MCVAE: Margin-based Conditional Variational Autoencoder for Relation Classification and Pattern Generation

The World Wide Web Conference Pub Date : 2019-05-13 DOI:10.1145/3308558.3313436

Fenglong Ma, Yaliang Li, Chenwei Zhang, Jing Gao, Nan Du, Wei Fan


Citations: 5

Abstract

Relation classification is a basic yet important task in natural language processing. Existing relation classification approaches mainly rely on distant supervision, which assumes that a bag of sentences extracted from a given corpus and mentioning the same pair of entities all express the same relation type for that entity pair. Training these models requires a large amount of high-quality bag-level data. However, in some specific domains, such as the medical domain, it is difficult to obtain enough high-quality sentences from a text corpus that mention two entities with a certain medical relation between them. In such cases, it is hard for existing discriminative models to capture the representative features (i.e., common patterns) from diversely expressed entity pairs with a given relation. Thus, classification performance cannot be guaranteed when only limited features are obtained from the corpus. To address this challenge, in this paper, we propose to employ a generative model, the conditional variational autoencoder (CVAE), to handle the pattern sparsity. We define each relation as having an individually learned latent distribution over all possible sentences expressing that relation. Because these distributions are learned for the purpose of input reconstruction, the model's classification ability may not be strong enough and should be improved. By distinguishing the differences among the relation distributions, we design a margin-based regularizer, which leads to a margin-based CVAE (MCVAE) that can significantly enhance the classification ability. In addition, MCVAE can automatically generate semantically meaningful patterns that describe the given relations. Experiments on two real-world datasets validate the effectiveness of the proposed MCVAE on the tasks of relation classification and relation-specific pattern generation.
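The abstract does not give the form of the margin-based regularizer, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes that each relation has a diagonal Gaussian latent distribution, that the encoder outputs a diagonal Gaussian posterior for a sentence, and that the regularizer is a hinge loss that pushes the KL divergence to the true relation's distribution below the KL divergence to every other relation's distribution by at least a margin. The function names, the choice of KL divergence, and the hinge form are all hypothetical.

```python
import math

def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) for diagonal Gaussians."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )

def relation_divergences(mu_q, var_q, relation_dists):
    """KL from the sentence posterior to each relation's latent distribution."""
    return [kl_diag_gauss(mu_q, var_q, mu_p, var_p) for mu_p, var_p in relation_dists]

def margin_regularizer(mu_q, var_q, relation_dists, label, margin=1.0):
    """Hypothetical hinge penalty: the true relation should be closer than
    every other relation by at least `margin` (an assumed hyperparameter)."""
    kls = relation_divergences(mu_q, var_q, relation_dists)
    return sum(
        max(0.0, margin + kls[label] - kl)
        for r, kl in enumerate(kls)
        if r != label
    )

def classify(mu_q, var_q, relation_dists):
    """Assumed decision rule: pick the relation with the smallest divergence."""
    kls = relation_divergences(mu_q, var_q, relation_dists)
    return min(range(len(kls)), key=kls.__getitem__)

# Toy example: two relations in a 2-dimensional latent space.
relation_dists = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 0.0], [1.0, 1.0])]
posterior_mu, posterior_var = [0.1, 0.0], [1.0, 1.0]
predicted = classify(posterior_mu, posterior_var, relation_dists)
penalty = margin_regularizer(posterior_mu, posterior_var, relation_dists, label=predicted)
```

In a full model, a term of this kind would be added to the usual CVAE objective (reconstruction plus KL), so that the learned relation distributions are separated as well as reconstructive; how the paper weights or defines that term is not stated in the abstract.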



