LDV Forum | Pub Date: 2007-07-01 | DOI: 10.21248/jlcl.22.2007.92
H. Lüngen, Angelika Storrer
{"title":"Domain ontologies and wordnets in OWL: Modelling options","authors":"H. Lüngen, Angelika Storrer","doi":"10.21248/jlcl.22.2007.92","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.92","url":null,"abstract":"Wordnets are lexical reference systems that follow the design principles of the Princeton WordNet project (Fellbaum 1998; henceforth referred to as PWN). Domain ontologies (or domain-specific ontologies, e.g. GOLD or the GENE Ontology) represent knowledge about a specific domain in a format that supports automated reasoning about the objects in that domain and the relations between them (cf. Erdmann 2001, 78). Wordnets have been used in various text-processing applications, e.g. discourse parsing, lexical and thematic chaining, cohesion analyses, automatic segmentation and linking, anaphora resolution, and information extraction. When these applications process documents dealing with a specific domain, knowledge about the domain-specific vocabulary represented in domain ontologies must be combined with lexical repositories representing general vocabulary (such as PWN). In this context, it is useful to represent and interrelate the entities and relations in both types of resources using a common representation language. In our research group “Text-technological Information Modelling” we chose OWL as the common format for this purpose. Since our projects are mainly concerned with German documents, we developed an OWL model that relates the German wordnet GermaNet (henceforth referred to as GN) to domain-specific ontologies, in an approach inspired by the Plug-In model proposed in Magnini/Speranza (2002). Our approach is described in Kunze et al. (to appear); it was evaluated using representative subsets of GN and of the domain ontology TermNet (henceforth referred to as TN) as data and Protégé","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116352209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum | Pub Date: 2007-07-01 | DOI: 10.21248/jlcl.22.2007.87
Edda Leopold, J. Kindermann, G. Paass
{"title":"Analysis of E-Discussions Using Classifier Induced Semantic Spaces","authors":"Edda Leopold, J. Kindermann, G. Paass","doi":"10.21248/jlcl.22.2007.87","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.87","url":null,"abstract":"We categorise contributions to an e-discussion platform using Classifier Induced Semantic Spaces and Self-Organising Maps. Analysing the contributions delivers insight into the nature of the communication process, makes it more comprehensible and renders the resulting decisions more transparent. Additionally, it can serve as a basis for monitoring how the structure of the communication evolves over time. We evaluate our approach on a public e-discussion about an urban planning project, the Berlin Alexanderplatz, Germany. The proposed technique not only produces high-level features relevant to structuring and monitoring computer-mediated communication, but also provides insight into how typical a particular document is of a specific category.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121208046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum | Pub Date: 2007-07-01 | DOI: 10.21248/jlcl.22.2007.95
Alexander Mehler, Peter Geibel, O. Pustylnikov
{"title":"Structural Classifiers of Text Types: Towards a Novel Model of Text Representation","authors":"Alexander Mehler, Peter Geibel, O. Pustylnikov","doi":"10.21248/jlcl.22.2007.95","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.95","url":null,"abstract":"Texts can be distinguished in terms of their content, function, structure or layout (Brinker, 1992; Bateman et al., 2001; Joachims, 2002; Power et al., 2003). These reference points do not necessarily open orthogonal perspectives on text classification. As part of exploratory data analysis, text classification aims at automatically dividing sets of textual objects into classes of maximum internal homogeneity and external heterogeneity. This paper deals with classifying texts into text types whose instances serve more or less homogeneous functions. Unlike mainstream approaches, which rely on the vector space model (Sebastiani, 2002) or one of its descendants (Baeza-Yates and Ribeiro-Neto, 1999) and, thus, on content-related lexical features, we refer solely to structural differentiae. That is, we explore patterns of text structure as determinants of class membership. Our starting point is tree-like text representations, which induce feature vectors and tree kernels. These kernels are utilized in supervised learning, with cross-validation as a method of model selection (Hastie et al., 2001), using a corpus of press communication as an example. For a subset of categories we show that classification can be performed very well by structural differentiae alone.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128065404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
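The following minimal Python sketch illustrates how a tree-like text representation can induce a feature vector and, from it, a kernel. It is not the kernel used by Mehler, Geibel and Pustylnikov. It assumes a hypothetical tree encoding `(label, [children])` and hypothetical structural labels such as `heading` and `para`, and it counts parent/child label pairs as purely structural features. The kernel is an inner product of explicit feature vectors, so it is a valid (positive semi-definite) kernel that any kernel-based learner could use.

```python
from collections import Counter

# Hypothetical tree encoding: (label, [children]). Labels stand for structural
# units of a text; this is an illustration, not the paper's representation.


def edge_features(tree):
    """Map a tree to counts of (parent label, child label) pairs."""
    label, children = tree
    counts = Counter()
    for child in children:
        counts[(label, child[0])] += 1
        counts.update(edge_features(child))  # add the child's own edge counts
    return counts


def structural_kernel(t1, t2):
    """Inner product of the two structural feature vectors."""
    f1, f2 = edge_features(t1), edge_features(t2)
    return sum(count * f2[key] for key, count in f1.items())


if __name__ == "__main__":
    report = ("text", [("heading", []), ("para", []), ("para", [])])
    interview = ("text", [("heading", []), ("question", []), ("answer", [])])
    print(structural_kernel(report, interview))  # prints 1 (shared edge: text -> heading)
```

No lexical content enters the features, which mirrors the abstract's restriction to structural differentiae. Richer tree kernels, such as those counting shared subtrees, follow the same pattern with a different feature map.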
LDV Forum | Pub Date: 2007-07-01 | DOI: 10.21248/jlcl.22.2007.96
J. Michaelis, Uwe Mönnich
{"title":"Towards a Logical Description of Trees in Annotation Graphs","authors":"J. Michaelis, Uwe Mönnich","doi":"10.21248/jlcl.22.2007.96","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.96","url":null,"abstract":"Artificial intelligence and computational linguistics have a long history of attempts to develop tools that extract semantic knowledge from syntactic information. In particular, from a text-technological point of view, the general research perspective is to extract (semantic) information from annotated documents. With this aim in view, some of the relevant annotation models are:","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128116242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum | Pub Date: 2007-07-01 | DOI: 10.21248/jlcl.22.2007.86
C. Sporleder
{"title":"Manually vs. Automatically Labelled Data in Discourse Relation Classification: Effects of Example and Feature Selection","authors":"C. Sporleder","doi":"10.21248/jlcl.22.2007.86","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.86","url":null,"abstract":"Wordnets are lexical reference systems that follow the design principles of the Princeton WordNet project (Fellbaum, ). Domain ontologies (or domain-specific ontologies such as GOLD or the GENE Ontology) represent knowledge about a specific domain in a format that supports automated reasoning about the objects in that domain and the relations between them (Erdmann, ). In this paper, we will discuss how the Web Ontology Language OWL can be used to represent and interrelate the entities and relations in both types of resources. Our special focus will be on the question of whether synsets should be modelled as individuals (we use individual and instance as synonyms and will refer to this option as the instance model) or as classes (we will refer to this option as the class model). We will present three OWL models, each of which offers a different solution to this question. These models were developed in the context of the research group “Text-technological Modelling of Information” as a collaboration between the projects SemDok and HyTex. Since these projects are mainly concerned with German documents and with corpora that contain documents from a specialized technical or scientific domain, we used subsets of the German wordnet GermaNet (Kunze and Lemnitzer, ), henceforth referred to as GN, and the German domain ontology TermNet (Beiswenger et al., ), henceforth referred to as TN, to develop and evaluate the three models. To relate the general vocabulary of GN to the domain-specific terms in TN, we developed an approach inspired by the plug-in model proposed by Magnini and Speranza (). In this approach, which was developed in cooperation with the GermaNet research group (see Kunze et al. () for details), we adapted the OWL model for the English Princeton WordNet suggested by van Assem et al. () to GN, i.e. we modelled German synsets as instances of word-class-specific synset classes. For the reasons explained in section , we wanted to experiment with alternative models that implement the class model. In section we will present three alternative OWL representations for GN and TN and discuss their benefits and drawbacks.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129781474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
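The instance model and the class model contrasted in this abstract can be sketched with plain RDF-style triples. The following Python sketch illustrates the distinction under stated assumptions; it is not one of the three OWL models the paper presents. The prefixes `gn:` and `tn:`, the property `gn:hyponymOf`, the class `gn:NounSynset` and all synset names are hypothetical. The sketch shows one practical consequence of the choice. In the class model, hyponymy is `rdfs:subClassOf`, so superclasses follow transitively. In the instance model, hyponymy is an ordinary property between individuals.

```python
# Hypothetical identifiers throughout; triples are (subject, predicate, object).
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"


def instance_model(synset, synset_class, hypernym):
    """Synset as an individual of a word-class-specific synset class;
    hyponymy is an ordinary property (hypothetical gn:hyponymOf)."""
    return {
        (synset, RDF_TYPE, synset_class),
        (synset, "gn:hyponymOf", hypernym),
    }


def class_model(synset, hypernym):
    """Synset as an OWL class; hyponymy becomes rdfs:subClassOf."""
    return {
        (synset, RDF_TYPE, OWL_CLASS),
        (synset, SUBCLASS_OF, hypernym),
    }


def superclasses(triples, cls):
    """Follow rdfs:subClassOf transitively (a simplified version of the
    subclass inference an RDFS/OWL reasoner performs)."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


if __name__ == "__main__":
    # Class model: plug a hypothetical TN term under a hypothetical GN synset.
    kb = class_model("tn:Hypertext", "gn:Text") | class_model("gn:Text", "gn:Artifact")
    print(sorted(superclasses(kb, "tn:Hypertext")))  # ['gn:Artifact', 'gn:Text']

    # Instance model: hyponymy is not rdfs:subClassOf, so nothing is inferred here.
    ib = instance_model("gn:text_1", "gn:NounSynset", "gn:artifact_1")
    print(superclasses(ib, "gn:text_1"))  # set()
```

In an actual OWL setting these triples would be serialized with an RDF library and handed to a reasoner such as the one bundled with Protégé. The plain-set version only makes the structural difference visible.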
LDV Forum | Pub Date: 2007-07-01 | DOI: 10.21248/jlcl.22.2007.89
Franziskus Geeb
{"title":"Chatbots in der praktischen Fachlexikographie / Terminologie","authors":"Franziskus Geeb","doi":"10.21248/jlcl.22.2007.89","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.89","url":null,"abstract":"Chat communication on the Internet, in the sense of an interactive, text-based conversation between Internet users, is attested in various usage contexts and for a wide variety of applications, from marketing to leisure. Besides other Internet users, computers can also act as chat partners, and this form of communication is likewise known both in business and in private use. The success of a chatbot rests essentially on its ability to conduct a dialogue with its chat partner and to make meaningful statements. Besides rule-based methods, specialized lexicographic/terminological data could also serve as the knowledge base for this communication, not least in specialist communication. This article attempts to delimit this problem and outlines the boundary conditions for a possible implementation.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114006880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum | Pub Date: 2006-07-01 | DOI: 10.21248/jlcl.21.2006.79
W. Zenk
{"title":"UniTerm - Formats and Terminology Exchange","authors":"W. Zenk","doi":"10.21248/jlcl.21.2006.79","DOIUrl":"https://doi.org/10.21248/jlcl.21.2006.79","url":null,"abstract":"LDV FORUM, Volume 21(1), 2006. This article presents UniTerm, a typical representative of terminology management systems (TMS). The first part highlights common characteristics of TMS and gives further insight into the UniTerm entry format and database design. Practice has shown that automatic, i.e. blind, exchange of terminologies is difficult to achieve. The second section gives criteria under which exchange between different TMS can fail and points out the relationship between the data formats of UniTerm-like TMS and existing terminology standards. Finally, it discusses which requirements have to be met in order to enable a deeper integration of terminology standards in a TMS and thus also a smoother transition between different TMS. These requirements are evaluated with Acolada's next-generation TMS UniTerm Enterprise.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114605350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum | Pub Date: 2006-07-01 | DOI: 10.21248/jlcl.21.2006.80
Stefanie Geldbach
{"title":"Lexicon Exchange in MT - The Long Way to Standardization","authors":"Stefanie Geldbach","doi":"10.21248/jlcl.21.2006.80","DOIUrl":"https://doi.org/10.21248/jlcl.21.2006.80","url":null,"abstract":"This paper discusses the extent to which lexicon exchange in MT has been standardized in recent years. The introductory section is followed by a brief description of OLIF2, a format specifically designed for the exchange of terminological and lexicographical data (Section 2). Section 3 contains an overview of the import/export functionalities of five MT systems (Promt Expert 7.0, Systran 5.0 Professional Premium, Translate pro 8.0, LexShop 2.2, OpenLogos). This evaluation shows that despite the standardization efforts of recent years, the exchange of lexicographical data between MT systems is still not a straightforward task.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127263962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum | Pub Date: 2006-07-01 | DOI: 10.21248/jlcl.21.2006.78
Uta Seewald-Heeg
{"title":"Terminology Exchange without Loss? Feasibilities and Limitations of Terminology Management Systems (TMS)","authors":"Uta Seewald-Heeg","doi":"10.21248/jlcl.21.2006.78","DOIUrl":"https://doi.org/10.21248/jlcl.21.2006.78","url":null,"abstract":"This article gives an overview of the exchange formats supported by Terminology Management Systems (TMS) available on the market. As translation is one of the oldest application domains for terminology work, most of the terminology tools analyzed here are components of computer-aided translation (CAT) tools. In large corporations as well as in the localization industry, linguistic data, first and foremost terminology, have to be shared by different departments using different systems, a situation best solved by standardized formats. The evaluation of seven widely used TMS shows, however, that formats other than the standards proposed by organizations like LISA currently dominate the picture. In many cases, the only way to share data is via flat-structured data stored as tab-delimited text files.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115088965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
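Python's standard `csv` module can illustrate why sharing terminology through tab-delimited text files loses information. The sketch makes two assumptions, both hypothetical. The first is a concept-oriented entry with a concept ID, a definition, and several terms per language, each carrying a usage status. The second is one possible flattening into one column per language, with synonyms joined by `; `. Real TMS exports differ. The point is only that anything not mapped to a column does not survive the round trip.

```python
import csv
import io

# Hypothetical concept-oriented terminology entry (not a real TMS format):
# one concept, several terms per language, each with its own usage status.
ENTRY = {
    "concept_id": "c42",
    "definition": "Device that converts data between formats.",
    "terms": {
        "en": [{"term": "converter", "status": "preferred"},
               {"term": "transformer", "status": "admitted"}],
        "de": [{"term": "Konverter", "status": "preferred"}],
    },
}


def to_tsv(entries, languages):
    """Flatten entries into one tab-delimited row per concept, one column per language.

    Synonyms are joined with "; "; concept IDs, definitions and term-level
    attributes such as status have no column and are dropped.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(languages)
    for entry in entries:
        row = ["; ".join(t["term"] for t in entry["terms"].get(lang, []))
               for lang in languages]
        writer.writerow(row)
    return buf.getvalue()


def from_tsv(text):
    """Read the flat file back: only language -> term strings can be recovered."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


if __name__ == "__main__":
    flat = to_tsv([ENTRY], ["en", "de"])
    print(flat, end="")
    print(from_tsv(flat))
```

After the round trip, the importing system can no longer tell which English term is preferred. The concept identifier and definition needed to merge the entry into an existing concept-oriented database are also gone.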
LDV Forum | Pub Date: 2006-07-01 | DOI: 10.21248/jlcl.21.2006.83
Georg Heeg
{"title":"Flexible Technologies to Visualize and Transform Terminological Representations: Modelling Representations instead of Programming using Smalltalk","authors":"Georg Heeg","doi":"10.21248/jlcl.21.2006.83","DOIUrl":"https://doi.org/10.21248/jlcl.21.2006.83","url":null,"abstract":"This paper discusses a software design approach that allows the interchange of linguistic data. It focuses on modelling the linguistic concepts represented in the data and describes the transfer between exchange formats as a multi-tier interpretation/generation process. These concepts are implemented in Smalltalk, a programming environment that enables flexible conversion of data between formats supported by Terminology Management Systems (TMS).","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129229804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}