Title: Manually vs. Automatically Labelled Data in Discourse Relation Classification: Effects of Example and Feature Selection
Author: C. Sporleder
Journal: LDV Forum
Publication type: Journal Article
Published: 2007-07-01
DOI: 10.21248/jlcl.22.2007.86 (https://doi.org/10.21248/jlcl.22.2007.86)
Citations: 8
Abstract
Wordnets are lexical reference systems that follow the design principles of the Princeton WordNet project (Fellbaum, ). Domain ontologies (or domain-specific ontologies, such as GOLD or the GENE Ontology) represent knowledge about a specific domain in a format that supports automated reasoning about the objects in that domain and the relations between them (Erdmann, ). In this paper, we discuss how the Web Ontology Language OWL can be used to represent and interrelate the entities and relations in both types of resources. Our special focus is on the question of whether synsets should be modelled as individuals (we use individual and instance as synonyms and refer to this option as the instance model) or as classes (we refer to this option as the class model). We present three OWL models, each of which offers a different solution to this question. These models were developed in the context of the research group “Text-technological Modelling of Information” as a collaboration between the projects SemDok and HyTex. Since these projects are mainly concerned with German documents and with corpora that contain documents from a particular technical or scientific domain, we used subsets of the German wordnet GermaNet (Kunze and Lemnitzer, ), henceforth referred to as GN, and the German domain ontology TermNet (Beiswenger et al., ), henceforth referred to as TN, to develop and evaluate the three models. To relate the general vocabulary of GN to the domain-specific terms in TN, we developed an approach inspired by the plug-in model proposed by Magnini and Speranza (). In this approach, which was developed in cooperation with the GermaNet research group (see Kunze et al. () for details), we adapted the OWL model for the English Princeton WordNet suggested by van Assem et al. () to GN, i.e. we modelled German synsets as instances of word-class-specific synset classes.
For the reasons explained in section , we wanted to experiment with alternative models that implement the class model. In section we will present three alternative OWL representations for GN and TN and discuss their benefits and drawbacks.
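The difference between the instance model and the class model can be illustrated with a minimal sketch that emits RDF-style triples as plain Python tuples. This is not the authors' OWL encoding. The identifiers (`gn:Hund`, `gn:Tier`, `gn:NounSynset`), the property name `gn:hyponymOf`, and the choice to express hyponymy as `rdfs:subClassOf` in the class model are all assumptions made for illustration only.

```python
# Hypothetical vocabulary; none of these names are taken from the paper.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"
HYPONYM_OF = "gn:hyponymOf"  # assumed object property between individuals


def instance_model(synset, synset_class, hypernym=None):
    """Model a synset as an individual of a word-class-specific synset class."""
    triples = [(synset, RDF_TYPE, synset_class)]
    if hypernym is not None:
        # Hyponymy is an ordinary property linking two individuals.
        triples.append((synset, HYPONYM_OF, hypernym))
    return triples


def class_model(synset, hypernym=None):
    """Model a synset as an OWL class in its own right."""
    triples = [(synset, RDF_TYPE, OWL_CLASS)]
    if hypernym is not None:
        # Assumption: hyponymy is expressed through the class hierarchy.
        triples.append((synset, RDFS_SUBCLASS_OF, hypernym))
    return triples


if __name__ == "__main__":
    for triple in instance_model("gn:Hund", "gn:NounSynset", "gn:Tier"):
        print("instance model:", triple)
    for triple in class_model("gn:Hund", "gn:Tier"):
        print("class model:   ", triple)
```

Under these assumptions, the instance model keeps synsets at the data level, while the class model places them in the class hierarchy, where a reasoner can apply subsumption to them.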