LDV Forum. Pub Date: 2022-12-05. DOI: 10.21248/jlcl.20.2005.74
Emmerich Kelih, Peter Grzybek
{"title":"Satzlänge: Definitionen, Häufigkeiten, Modelle (Am Beispiel slowenischer Prosatexte)","authors":"Emmerich Kelih, Peter Grzybek","doi":"10.21248/jlcl.20.2005.74","DOIUrl":"https://doi.org/10.21248/jlcl.20.2005.74","url":null,"abstract":"This study is intended as a contribution to sentence-length research. After an introductory overview of the possibilities for analysis at the level of sentence length, it focuses mainly on discussing the application of different sentence definitions. On the basis of a corpus of Slovenian texts, it examines how applying different (and entirely customary) sentence definitions affects (a) the descriptive statistics of the frequency distribution and (b) the adequacy and goodness of fit of theoretical distribution models.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115326043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2008-07-01. DOI: 10.21248/jlcl.23.2008.100
A. Kumaran, R. Makin, Vijay Pattisapu, Shaik Sharif, Lucy Vanderwende
{"title":"Evaluating the Quality of Automatically Extracted Synonymy Information","authors":"A. Kumaran, R. Makin, Vijay Pattisapu, Shaik Sharif, Lucy Vanderwende","doi":"10.21248/jlcl.23.2008.100","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.100","url":null,"abstract":"Automatic extraction of semantic information, if successful, offers to languages with little or poor resources, the prospects of creating ontological resources inexpensively, thus providing support for common-sense reasoning applications in those languages. In this paper we explore the automatic extraction of synonymy information from large corpora using two complementary techniques: a generic broad-coverage parser for generation of bits of semantic information, and their synthesis into sets of synonyms using automatic sense-disambiguation. To validate the quality of the synonymy information thus extracted, we experiment with English, where appropriate semantic resources are already available. We cull synonymy information from a large corpus and compare it against synonymy information available in several standard sources. We present the results of our methodology, both quantitatively and qualitatively, that indicate good quality synonymy information may be extracted automatically from large corpora using the proposed methodology.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122872030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2008-07-01. DOI: 10.21248/jlcl.23.2008.99
Maja Bärenfänger, M. Hilbert, Henning Lobin, H. Lüngen
{"title":"OWL ontologies as a resource for discourse parsing","authors":"Maja Bärenfänger, M. Hilbert, Henning Lobin, H. Lüngen","doi":"10.21248/jlcl.23.2008.99","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.99","url":null,"abstract":"In the project SemDok (Generic document structures in linearly organised texts) funded by the German Research Foundation DFG, a discourse parser for a complex type (scientific articles by example), is being developed. Discourse parsing (henceforth DP) according to the Rhetorical Structure Theory (RST) (Mann and Taboada, 2005; Marcu, 2000) deals with automatically assigning a text a tree structure in which discourse segments and rhetorical relations between them are marked, such as Concession. For identifying the combinable segments, declarative rules are employed, which describe linguistic and structural cues and constraints about possible combinations by referring to different XML annotation layers of the input text, and external knowledge bases such as a discourse marker lexicon, a lexico-semantic ontology (later to be combined with a domain ontology), and an ontology of rhetorical relations. In our text-technological environment, the obvious choice of formalism to represent such ontologies is OWL (Smith et al., 2004). In this paper, we describe two OWL ontologies and how they are consulted from the discourse parser to solve certain tasks within DP. The first ontology is a taxonomy of rhetorical relations which was developed in the project. The second one is an OWL version of GermaNet, the model of which we designed together with our project partners.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126240261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2008-07-01. DOI: 10.21248/jlcl.23.2008.102
Pablo Gamallo, J. Lopes, Alexandre Agustini
{"title":"Automatic Acquisition of Formal Concepts from Text","authors":"Pablo Gamallo, J. Lopes, Alexandre Agustini","doi":"10.21248/jlcl.23.2008.102","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.102","url":null,"abstract":"This paper describes an unsupervised method for extracting concepts from Part-Of-Speech annotated corpora. The method consists in building bidimensional clusters of both words and their lexico-syntactic contexts. The method is based on Formal Concept Analysis (FCA). Each generated cluster is defined as a formal concept with a set of words describing the extension of the concept and a set of contexts perceived as the intensional attributes (or properties) valid for all the words in the extension. The clustering process relies on two concept operations: abstraction and specification. The former allows us to build a more generic concept by intersecting the intensions of the merged concepts and making the union of their extensions. By contrast, specification makes the union of the intensions and intersects the extensions. The result is a concept lattice that describes the domain-specific ontology underlying the training corpus.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132536980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2008-07-01. DOI: 10.21248/jlcl.23.2008.101
Daniela Goecke, Maik Stührenberg, Tonio Wandmacher
{"title":"A hybrid approach to resolve nominal anaphora","authors":"Daniela Goecke, Maik Stührenberg, Tonio Wandmacher","doi":"10.21248/jlcl.23.2008.101","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.101","url":null,"abstract":"In order to resolve nominal anaphora, especially definite description anaphora, various sources of information have to be taken into account. These range from morphosyntactic information to domain knowledge encoded in ontologies. As the acquisition of ontological knowledge is a time-consuming task, existing resources often model only a small set of information. This leads to a knowledge gap that has to be closed: We present a hybrid approach that combines several knowledge sources in order to resolve definite descriptions.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121964887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2008-07-01. DOI: 10.21248/jlcl.23.2008.98
C. Chiarcos
{"title":"An ontology of linguistic annotations","authors":"C. Chiarcos","doi":"10.21248/jlcl.23.2008.98","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.98","url":null,"abstract":"This paper describes development and design of an ontology of linguistic annotations, primarily word classes and morphosyntactic features, based on existing standardization approaches (e.g. EAGLES), a set of annotation schemes (e.g. for German, STTS and morphological annotations), and existing terminological resources (e.g. GOLD). The ontology is intended to be a platform for terminological integration, integrated representation and ontology-based search across existing linguistic resources with terminologically heterogeneous annotations. Further, it can be applied to augment the semantic analysis of a given text with an ontological interpretation of its morphosyntactic analysis.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128341041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2007-07-01. DOI: 10.21248/jlcl.22.2007.94
Eduardo Torres Schumann, Uwe Mönnich, K. Schulz
{"title":"Integration Languages for Data-Driven Approaches to Ontology Population and Maintenance","authors":"Eduardo Torres Schumann, Uwe Mönnich, K. Schulz","doi":"10.21248/jlcl.22.2007.94","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.94","url":null,"abstract":"Populating an ontology with a vast amount of data and ensuring the quality of the integration process by means of human supervision seem to be mutually exclusive goals that nevertheless arise as requirements when building practical applications. In our case, we were confronted with the practical problem of populating the EFGT Net, a large-scale ontology that enables thematic reasoning in different NLP applications, out of already existing and partly very large data sources, but on condition of not putting the quality of the resource at risk. We present here our particular solution to this problem, which combines, in a single tool, on one hand an integration language capable of generating new entries for the ontology out of structured data with, on the other hand, a visualization of conflicting generated entries with online ontology editing facilities. This approach appears to enable efficient human supervision of the population process in an interactive way and to be also useful for maintenance tasks.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125098131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2007-07-01. DOI: 10.21248/jlcl.22.2007.90
Xiaofei Lu
{"title":"A Hybrid Model for Chinese Word Segmentation","authors":"Xiaofei Lu","doi":"10.21248/jlcl.22.2007.90","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.90","url":null,"abstract":"This paper describes a hybrid model that combines machine learning with linguistic and statistical heuristics for integrating unknown word identification with Chinese word segmentation. The model consists of two major components: a tagging component that annotates each character in a Chinese sentence with a position-of-character (POC) tag that indicates its position in a word, and a merging component that transforms a POC-tagged character sequence into a word-segmented sentence. The tagging component uses a support vector machine (Vapnik, 1995) based tagger to produce an initial tagging of the text and a transformation-based tagger (Brill, 1995) to improve the initial tagging. In addition to the POC tags assigned to the characters, the merging component incorporates a number of linguistic and statistical heuristics to detect words with regular internal structures, recognize long words, and filter non-words. Experiments show that, without resorting to a separate unknown word identification mechanism, the model achieves an F-score of 95.0% for word segmentation and a competitive recall of 74.8% for unknown word identification.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114649226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2007-07-01. DOI: 10.21248/jlcl.22.2007.88
Bayan Abu Shawar, E. Atwell
{"title":"Chatbots: Are they Really Useful?","authors":"Bayan Abu Shawar, E. Atwell","doi":"10.21248/jlcl.22.2007.88","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.88","url":null,"abstract":"Chatbots are computer programs that interact with users using natural languages. This technology started in the 1960s; the aim was to see if chatbot systems could fool users into believing that they were real humans. However, chatbot systems are not only built to mimic human conversation and entertain users. In this paper, we investigate other applications where chatbots could be useful, such as education, information retrieval, business, and e-commerce. A range of chatbots with useful applications, including several based on the ALICE/AIML architecture, are presented in this paper.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125757889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDV Forum. Pub Date: 2007-07-01. DOI: 10.21248/jlcl.22.2007.93
Ekaterina Ovchinnikova, Kai-Uwe Kühnberger
{"title":"Automatic Ontology Extension: Resolving Inconsistencies","authors":"Ekaterina Ovchinnikova, Kai-Uwe Kühnberger","doi":"10.21248/jlcl.22.2007.93","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.93","url":null,"abstract":"Ontologies are widely used in text technology and artificial intelligence. The need to develop large ontologies for real-life applications prompts researchers to automate ontology extension procedures. Automatic updates without the control of a human expert can generate conflicts between original and new knowledge, resulting in inconsistencies in the ontology. We propose an algorithm that models the process of adapting an ontology to new information. 1. Automatic Ontology Extension. There is an increasing interest in applying ontological knowledge in text technologies and artificial intelligence. Since the manual development of large ontologies has proved to be a time-consuming task, many current investigations are devoted to automatic ontology learning methods (see [6] for an overview). Several formalisms have been proposed to represent ontological knowledge. Probably the most important of the existing markup languages for ontology design is the Web Ontology Language (OWL), based on the logical formalism called Description Logics (DL) [1]. In particular, description logics were designed for the representation of terminological knowledge and reasoning processes. Although most of the tools that extract or extend ontologies automatically output knowledge in the OWL format, they usually use only a small subset of DL. The core ontologies generated in practice usually contain the subsumption relation defined on concepts (taxonomy) and general relations (such as part-of and others). At present, complex ontologies that make use of the full expressive power and advances of the various versions of DLs can be built only manually or semi-automatically. However, several approaches have appeared recently that aim not only to learn taxonomic and general relations but also to state which concepts in the knowledge base are equivalent or disjoint [5]. In the present paper, we concentrate on these approaches. We will consider only terminological knowledge (called TBox in DL), leaving the information about assertions in the knowledge base (called ABox in DL) for further investigations. Note: see the OWL documentation at http://www.w3.org/TR/owl-features/","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126754417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}