Manually vs. Automatically Labelled Data in Discourse Relation Classification: Effects of Example and Feature Selection

LDV Forum, Pub Date: 2007-07-01, DOI: 10.21248/jlcl.22.2007.86
C. Sporleder
Citations: 8

Abstract

Wordnets are lexical reference systems that follow the design principles of the Princeton WordNet project (Fellbaum, ). Domain ontologies (or domain-specific ontologies such as GOLD, or the GENE Ontology) represent knowledge about a specific domain in a format that supports automated reasoning about the objects in that domain and the relations between them (Erdmann, ). In this paper, we will discuss how the Web Ontology Language OWL can be used to represent and interrelate the entities and relations in both types of resources. Our special focus will be on the question of whether synsets should be modelled as individuals (we use individual and instance as synonyms and will refer to this option as the instance model) or as classes (we will refer to this option as the class model). We will present three OWL models, each of which offers a different solution to this question. These models were developed in the context of the research group “Text-technological Modelling of Information” as a collaboration of the projects SemDok and HyTex. Since these projects are mainly concerned with German documents and with corpora that contain documents of a special technical or scientific domain, we used subsets of the German wordnet GermaNet (Kunze and Lemnitzer, ), henceforth referred to as GN, and the German domain ontology TermNet (Beiswenger et al., ), henceforth referred to as TN, to develop and evaluate the three models. To relate the general vocabulary of GN with the domain-specific terms in TN, we developed an approach that was inspired by the plug-in model proposed by Magnini and Speranza (). In this approach, which has been developed in cooperation with the GermaNet research group (see Kunze et al. () for details), we adapted the OWL model for the English Princeton WordNet suggested by van Assem et al. () to GN, i.e. we modelled German synsets as instances of word-class-specific synset classes. For the reasons explained in section , we wanted to experiment with alternative models that implement the class model. In section  we will present three alternative OWL representations for GN and TN and discuss their benefits and drawbacks.
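
The distinction between the instance model and the class model can be illustrated with a small sketch. It is not taken from the paper: the identifiers `ex:Synset_Hund`, `ex:Synset_Tier`, and `ex:NounSynset` are hypothetical, and expressing hyponymy through `rdfs:subClassOf` in the class model is an assumption made for illustration only, not a description of any of the three OWL models the abstract mentions. Triples are represented as plain Python tuples rather than with an RDF library.

```python
# Hypothetical illustration: all "ex:" names are invented for this sketch and
# do not come from GermaNet, TermNet, or the paper.

RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"


def instance_model(synset, synset_class):
    """Synset as an individual: an instance of a word-class-specific synset class."""
    return [
        (synset_class, RDF_TYPE, OWL_CLASS),
        (synset, RDF_TYPE, synset_class),
    ]


def class_model(synset, hypernym_synset):
    """Synset as a class. Assumption: hyponymy is expressed as rdfs:subClassOf."""
    return [
        (synset, RDF_TYPE, OWL_CLASS),
        (hypernym_synset, RDF_TYPE, OWL_CLASS),
        (synset, RDFS_SUBCLASS_OF, hypernym_synset),
    ]


def to_turtle(triples):
    """Render triples as simple Turtle-style statements, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_turtle(instance_model("ex:Synset_Hund", "ex:NounSynset")))
    print(to_turtle(class_model("ex:Synset_Hund", "ex:Synset_Tier")))
```

In the first function the synset is an individual typed by a synset class; in the second the synset is itself declared as a class, which is the modelling choice the abstract refers to as the class model.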