{"title":"Improving Multilingual Frame Identification by Estimating Frame Transferability","authors":"Jennifer Sikos, Michael Roth, Sebastian Padó","doi":"10.33011/lilt.v19i.939","DOIUrl":"https://doi.org/10.33011/lilt.v19i.939","url":null,"abstract":"A recent research direction in computational linguistics involves efforts to make the field, which used to focus primarily on English, more multilingual and inclusive. However, resource creation often remains a bottleneck for many languages, in particular at the semantic level. In this article, we consider the case of frame-semantic annotation. We investigate how to perform frame selection for annotation in a target language by taking advantage of existing annotations in different, supplementary languages, with the goal of reducing the required annotation effort in the target language. We measure success by training and testing frame identification models for the target language. We base our selection methods on measuring frame transferability in the supplementary language, where we estimate which frames will transfer poorly, and therefore should receive more annotation, in the target language. We apply our approach to English, German, and French – three languages which have annotations that are similar in size as well as frames with overlapping lexicographic definitions. We find that transferability is indeed a useful indicator and supports a setup where a limited amount of target language data is sufficient to train frame identification systems.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126866187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parsed Corpus as a Source for Testing Generalizations in Japanese Syntax","authors":"H. Kishimoto, Prashant Pardeshi","doi":"10.33011/lilt.v18i.1431","DOIUrl":"https://doi.org/10.33011/lilt.v18i.1431","url":null,"abstract":"In this paper, we discuss constituent ordering generalizations in Japanese. Japanese has SOV as its basic order, but a significant range of argument order variations brought about by ‘scrambling’ is permitted. Although scrambling does not induce much in the way of semantic effects, it is conceivable that marked orders are derived from the unmarked order under some pragmatic or other motivations. The difference in the effect of basic and derived order is not reflected in native speaker’s grammaticality judgments, but we suggest that the intuition about the ordering of arguments may be attested in corpus data. By using the Keyaki treebank (a proper subset of which is NINJAL Parsed Corpus of Modern Japanese (NPCMJ)), it is shown that the naturallyoccurring corpus data confirm that marked orderings of arguments are less frequent than their unmarked ordering counterparts. We suggest some possible motivations lying behind the argument order variations.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115347637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prashant Pardeshi, Alistair Butler, S. Horn, K. Yoshimoto, Iku Nagasaki
{"title":"Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing","authors":"Prashant Pardeshi, Alistair Butler, S. Horn, K. Yoshimoto, Iku Nagasaki","doi":"10.33011/lilt.v18i.1427","DOIUrl":"https://doi.org/10.33011/lilt.v18i.1427","url":null,"abstract":"The articles in this special issue are based on presentations made at the international symposium entitled Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing held at the National Institute for Japanese Language and Linguistics (NINJAL) on Dec. 9-10, 2017 and organized by the collaborative research project at NINJAL entitled 'Development of and Linguistic Research with a Parsed Corpus of Japanese' with which all the guest editors are associated.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122065063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting parsed corpora in grammar teaching","authors":"S. Wallis, I. Cushing, B. Aarts","doi":"10.33011/lilt.v18i.1437","DOIUrl":"https://doi.org/10.33011/lilt.v18i.1437","url":null,"abstract":"The principal barrier to the uptake of technologies in schools is not technological, but social and political. Teachers must be convinced of the pedagogical benefits of a particular curriculum before they will agree to learn the means to teach it. The teaching of formal grammar to first language students in schools is no exception to this rule. Over the last three decades, most schools in England have been legally required to teach grammatical subject knowledge, i.e. linguistic knowledge of grammar terms and structure, to children age five and upwards as part of the national curriculum in English. A mandatory set of curriculum specifications for England and Wales was published in 2014, and elsewhere similar requirements were imposed. However, few current English school teachers were taught grammar themselves, and the dominant view has long been in favour of ‘real books’ rather than the teaching of a formal grammar. English grammar teaching thus faces multiple challenges: to convince teachers of the value of grammar in their own teaching, to teach the teachers the knowledge they need, and to develop relevant resources to use in the classroom. Alongside subject knowledge, teachers need pedagogical knowledge – how to teach grammar effectively and how to integrate this teaching into other kinds of language learning. The paper introduces the Englicious1 web platform for schools, and summarises its development and impact since publication. Englicious draws data from the fully-parsed British Component of the International Corpus of English, ICE-GB. The corpus offers plentiful examples of genuine natural language, speech and writing, with context and potentially audio playback. However, corpus examples may be ageinappropriate or over-complex, and without grammar training, teachers are insufficiently equipped to use them. In the absence of grammatical knowledge among teachers, it is insufficient simply to give teachers and children access to a corpus. Whereas so-called ‘classroom concordancing’ approaches offer access to tools and encourage bottom-up learning, Englicious approaches the question of grammar teaching in a concept-driven, top-down way. It contains a modular series of professional development resources, lessons and exercises focused on each concept in turn, in which corpus examples are used extensively. Teachers must be able to discuss with a class why, for instance, work is a noun in a particular sentence, rather than merely report that it is. The paper describes the development of Englicious from secondary to primary, and outlines some of the practical challenges facing the design of this type of teaching resource. A key question, the ‘selection problem’, concerns how tools parameterise the selection of relevant examples for teaching purposes. Finally we discuss curricula for teaching teachers and the evaluation of the effectiveness of the intervention.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130025790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a Chinese AMR Bank with Concept and Relation Alignments","authors":"Bin Li, Y. Wen, Li Song, Weiguang Qu, Nianwen Xue","doi":"10.33011/lilt.v18i.1429","DOIUrl":"https://doi.org/10.33011/lilt.v18i.1429","url":null,"abstract":"Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an on-going project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts/relations in the CAMR annotation to make it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-agreement as measured by the Smatch score between the two annotators is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus. 46.71% of the AMRs of the sentences are non-tree graphs. Moreover, the AMR of 88.95% of the sentences has concepts inferred from the context of the sentence but do not correspond to a specific word.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131305519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complex predicates: Structure, potential structure and underspecification","authors":"Stephan Müller","doi":"10.33011/lilt.v17i.1423","DOIUrl":"https://doi.org/10.33011/lilt.v17i.1423","url":null,"abstract":"This paper compares a recent TAG-based analysis of complex predicates in Hindi/Urdu with its HPSG analog. It points out that TAG combines actual structure while HPSG (and Categorial Grammar and other valence-based frameworks) specify valence of lexical items and hence potential structure. This makes it possible to have light verbs decide which arguments of embedded heads get realized, somthing that is not possible in TAG. TAG has to retreat to disjunctions instead. While this allows straight-forward analyses of active/passive alternations based on the light verb in valence-based frameworks, such an option does not exist for TAG and it has to be assumed that preverbs come with different sets of arguments.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133679388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Syntactic composition and selectional preferences in Hindi Light Verb Constructions","authors":"Ashwini Vaidya, Owen Rambow, Martha Palmer","doi":"10.33011/lilt.v17i.1419","DOIUrl":"https://doi.org/10.33011/lilt.v17i.1419","url":null,"abstract":"Previous work on light verb constructions (e.g. chorii kar ‘theft do; steal’) in Hindi describes their syntactic formation via co-predication (Ahmed et al., 2012, Butt, 2014). This implies that both noun and light verb contribute their arguments, and these overlapping argument structures must be composed in the syntax. In this paper, we present a co-predication analysis using Tree-Adjoining Grammar, which models syntactic composition and semantic selectional preferences without transformations (deletion or argument identification). The analysis has two key components (i) an underspecified category for the nominal and (ii) combinatorial constraints on the noun and light verb to specify selectional preferences. The former has the advantage of syntactic composition without argument identification and the latter prevents over-generalization, while recognizing the semantic contribution of both predicates. This work additionally accounts for the agreement facts for the Hindi LVC.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124287956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Argument alternations in complex predicates: an LFG+glue perspective","authors":"J. Lowe","doi":"10.33011/lilt.v17i.1421","DOIUrl":"https://doi.org/10.33011/lilt.v17i.1421","url":null,"abstract":"Vaidya et al. (2019) discuss argument alternations in Hindi complex predicates, and propose an analysis within an LTAG framework, comparing this with an LFG analysis of complex predicates. In this paper I clarify the inadequacies in existing LFG analyses of complex predicates, and show how the LFG+glue approach proposed by Lowe (2015) can both address these inadequacies and provide a relatively simple treatment of the phenomena discussed by Vaidya et al. (2019).","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"2007 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125621471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complex Predicates and Multidimensionality in Grammar","authors":"M. Butt","doi":"10.33011/lilt.v17i.1425","DOIUrl":"https://doi.org/10.33011/lilt.v17i.1425","url":null,"abstract":"This paper contributes to the on-going discussion of how best to analyze and handle complex predicate formations, commenting in particular on the properties of Hindi N-V complex predicates as set out by Vaidya et al. (2019). I highlight features of existing LFG analyses and focus in particular on the modular architecture of LFG, its attendant multidimensional lexicon and the analytic consequences which follow from this. I point out where the previously existing LFG proposals have been misunderstood as viewed from the lens of theories such as LTAG and HPSG, which assume a very different architectural set-up and provide a comparative discussion of the issues.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129257226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can Recurrent Neural Networks Learn Nested Recursion?","authors":"Jean-Philippe Bernardy","doi":"10.33011/lilt.v16i.1417","DOIUrl":"https://doi.org/10.33011/lilt.v16i.1417","url":null,"abstract":"Context-free grammars (CFG) were one of the first formal tools used to model natural languages, and they remain relevant today as the basis of several frameworks. A key ingredient of CFG is the presence of nested recursion. In this paper, we investigate experimentally the capability of several recurrent neural networks (RNNs) to learn nested recursion. More precisely, we measure an upper bound of their capability to do so, by simplifying the task to learning a generalized Dyck language, namely one composed of matching parentheses of various kinds. To do so, we present the RNNs with a set of random strings having a given maximum nesting depth and test its ability to predict the kind of closing parenthesis when facing deeper nested strings. We report mixed results: when generalizing to deeper nesting levels, the accuracy of standard RNNs is significantly higher than random, but still far from perfect. Additionally, we propose some non-standard stack-based models which can approach perfect accuracy, at the cost of robustness.","PeriodicalId":218122,"journal":{"name":"Linguistic Issues in Language Technology","volume":"225 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133353195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}