{"title":"Maschinelle Übersetzung - ein Überblick","authors":"D. Stein","doi":"10.21248/jlcl.24.2009.119","DOIUrl":"https://doi.org/10.21248/jlcl.24.2009.119","url":null,"abstract":"The idea of formally manipulating languages goes back to the philosophical traditions of secret and universal languages, as founded by Ramon Llull or Gottfried Wilhelm Leibniz. To this day, machine translation (MT) has remained the supreme discipline of language processing: at first glance, the progress made since the first practical attempts is only modest. Over the decades, numerous different approaches to MT have nevertheless emerged. After a phase dominated by linguistic theories, rediscovered mathematical methods have been in the foreground since the early 1990s. This article presents the most important approaches embedded in their historical context. Particular attention is paid to the rule-based and the statistical approach.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121278060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Evolution of Genre in Wikipedia","authors":"Malcolm Clark, I. Ruthven, P. O. Holt","doi":"10.21248/jlcl.24.2009.111","DOIUrl":"https://doi.org/10.21248/jlcl.24.2009.111","url":null,"abstract":"This paper presents an overview of the ways in which genres, or structural forms, develop in a community of practice, in this case, Wikipedia. Firstly, we collected data by performing a small search task in the Wikipedia search engine (powered by Lucene) to locate articles related to global car manufacturers, for example, British Leyland, Ferrari and General Motors. We also searched for typical biographical articles about notable people, such as Spike Milligan, Alex Ferguson, Nelson Mandela and Karl Marx. An examination of the data thus obtained revealed that these articles have particular forms and that some genres connect to each other and evolve, merge and overlap. We then looked at the ways in which the purpose and form of a biographical article have evolved over six years within this community. We concluded the work with a discussion on the usefulness of Wikipedia as a vehicle for such genre investigations. This small analysis has allowed us to start generating a number of detailed research questions as to how forms may act as descriptors of genre and to discuss plans for experimental work aimed at answering these questions.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128793612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model of a Teacher Assisting Feedback Tool for Marking Free Worded Exercise Solutions","authors":"S. Ruda","doi":"10.21248/jlcl.23.2008.109","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.109","url":null,"abstract":"Free worded exercise solutions have the advantage that students must rely entirely on their own knowledge. Their disadvantage, however, is that they cannot be corrected automatically – in contrast to exercises with preset solution possibilities, e.g. multiple choice. This paper outlines these two types of exercise solutions, the methods and some results of a pragmalinguistic analysis of exercise types and their solutions, and teachers’ correction actions for free worded exercise solutions. Afterwards, a prototype of a feedback tool model, based on this study and assisting teachers’ correction actions, is briefly introduced.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116088998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lexical Models to Identify Unmarked Discourse Relations: Does WordNet help?","authors":"C. Sporleder","doi":"10.21248/jlcl.23.2008.105","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.105","url":null,"abstract":"In this paper, we address the task of automatically determining which discourse relation holds between two text spans. We focus on relations that are not explicitly signalled by a discourse marker like but. While lexical models have been found useful for the task, they are also prone to data sparseness problems, which is a big drawback given the scarcity of discourse annotated data. We therefore investigate whether lexical-semantic resources, such as WordNet, can be exploited to back off to a more general representation of lexical information in cases where data are sparse. We compare such a semantic back-off strategy to morphological generalisations over word forms, such as stemming and lemmatising.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133800335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Serengeti - Webbasierte Annotation semantischer Relationen","authors":"Nils Diewald, Maik Stührenberg, Anna Garbar, Daniela Goecke","doi":"10.21248/jlcl.23.2008.108","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.108","url":null,"abstract":"This article presents an annotation scheme for semantic relations that was developed to describe a German-language corpus for training and evaluating an anaphora resolution system. It also describes the web-based annotation tool Serengeti, which is used to annotate anaphoric relations in project A “Sekimo”. Unlike other annotation tools, Serengeti requires no local installation, which makes it easier to deploy on a large number of computers. In addition, Serengeti implements a multi-user concept that supports both groups and individual users and manages the associated files and annotations.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127854494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiments on Lexical Chaining for German Corpora: Annotation, Extraction, and Application","authors":"Irene M. Cramer, Marc Finthammer, Alexander Kurek, L. Sowa, Melina Wachtling, Tobias Claas","doi":"10.21248/jlcl.23.2008.106","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.106","url":null,"abstract":"Converting linear text documents into documents publishable in a hypertext environment is a complex task requiring methods for segmentation, reorganization, and linking. The HyTex project, funded by the German Research Foundation (DFG), aims at the development of conversion strategies based on text-grammatical features. One focus of our work is on topic-based linking strategies using lexical chains, which can be regarded as partial text representations and form the basis of calculating topic views, an example of which is shown in Figure 1. This paper discusses the development of our lexical chainer, called GLexi, as well as several experiments on two aspects: firstly, the manual annotation of lexical chains in German corpora of specialized text; secondly, the construction of topic views. The principle of lexical chaining is based on the concept of lexical cohesion as described by Halliday and Hasan (1976). Morris and Hirst (1991) as well as Hirst and St-Onge (1998) developed a method of automatically calculating lexical chains by drawing on a thesaurus or word net. This method employs information on semantic relations between pairs of words as a connector, i.e. classical lexical semantic relations such as synonymy and hypernymy as well as complex combinations of these. Typically, the relations are calculated using a lexical semantic resource such as Princeton WordNet (e.g. Hirst and St-Onge (1998)), Roget’s thesaurus (e.g. Morris and Hirst (1991)) or GermaNet (e.g. Mehler (2005) as well as Gurevych and Nahnsen (2005)). Hitherto, lexical chains have been successfully employed for various NLP applications, such as text summarization (e.g. Barzilay and Elhadad (1997)), malapropism recognition (e.g. Hirst and St-Onge (1998)), automatic hyperlink generation (e.g. Green (1999)), question answering (e.g. Novischi and Moldovan (2006)), and topic detection/topic tracking (e.g. Carthy (2004)). In order to formally evaluate the performance of a lexical chaining system in terms of precision and recall, a (preferably standardized and freely available) test set would be required. To our knowledge such a resource does not yet exist, neither for English nor for German. Therefore, we conducted several annotation experiments, which we intended to use for the evaluation of GLexi. These experiments are summarized in Section 2. The findings derived from our annotation experiments also led us to develop the highly modularized system architecture, shown in Figure 4, which provides interfaces for integrating different pre-processing steps, semantic relatedness measures, resources and modules for the display of results. A survey of the architecture and the","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125668256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Coherence Analysis in a Multi-Level Approach to Automatic Text Analysis","authors":"Manfred Stede","doi":"10.21248/jlcl.23.2008.104","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.104","url":null,"abstract":"We characterize a text-technological approach to text analysis as a combination of a multi-level representation framework and XML-based document processing techniques. The main advantages of such an approach are the chance to flexibly combine modules for constructing different applications, and the overall robustness resulting from the operational principle of higher-level modules combining the (possibly partial) results of lower-level ones. We illustrate the approach with the specific task of local coherence analysis, i.e. the computation of coherence relations between text spans.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129660614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anaphora as an Indicator of Elaboration: A Corpus Study","authors":"Maja Bärenfänger, Daniela Goecke, M. Hilbert, H. Lüngen, Maik Stührenberg","doi":"10.21248/jlcl.23.2008.107","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.107","url":null,"abstract":"This article describes an investigation of the relationship between anaphora and relational discourse structure, notably the Elaboration relation known from theories like RST. A corpus was annotated on the levels of anaphoric structure and rhetorical structure. The statistical analysis of interrelations between the two annotation layers revealed correlations between specific subtypes of anaphora and Elaboration, indicating that anaphora can function as a cue for Elaboration.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129906027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Automatic Translation of Film Subtitles. A Machine Translation Success Story?","authors":"M. Volk","doi":"10.5167/UZH-8817","DOIUrl":"https://doi.org/10.5167/UZH-8817","url":null,"abstract":"Every so often one hears the complaint that 50 years of research in Machine Translation (MT) has not resulted in much progress, and that current MT systems are still unsatisfactory. A closer look reveals that web-based general-purpose MT systems are used by thousands of users every day. And, on the other hand, special-purpose MT systems have been in long-standing use and work successfully in particular domains or for specific companies. This paper investigates whether the automatic translation of film subtitles can be considered a machine translation success story. We describe various projects on MT of film subtitles and contrast them to our own project in this area. We argue that the text genre \"film subtitles\" is well suited for MT, in particular for Statistical MT. But before we look at the translation of film subtitles let us retrace some other MT success stories. Hutchins (1999) lists a number of successful MT systems. Amongst them is Meteo, a system for translating Canadian weather reports between English and French which is probably the most quoted MT system in practical use. References to Meteo usually remind us that this is a \"highly constrained sublanguage system\". On the other hand there are general purpose but customer-specific MT systems like the English to Spanish MT system at the Pan American Health Organization or the PaTrans system which Hutchins (1999) calls \"... possibly the best known success story for custom-built MT\". PaTrans was developed for LingTech A/S to translate English patents into Danish. Earlier Whitelock and Kilby (1995) (p.198) had called the METAL system \"a success story in the development of MT\". METAL is mentioned as \"successfully used at a number of European companies\" (by that time this meant a few dozen installations in industry, trade or banking). During the same time the European","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"163 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130986895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}