J. Lang. Model. Pub Date: 2013-07-22 DOI: 10.15398/jlm.v1i1.62
K. Koskenniemi
{"title":"An informal discovery procedure for two-level rules","authors":"K. Koskenniemi","doi":"10.15398/jlm.v1i1.62","DOIUrl":"https://doi.org/10.15398/jlm.v1i1.62","url":null,"abstract":"The paper shows how a certain kind of underlying representations (or deep forms) of words can be constructed in a straightforward manner through aligning the surface forms of the morphs of the word forms. The inventory of morphophonemes follows directly from this alignment. Furthermore, the two-level rules which govern the different realisations of such morphophonemes follow fairly directly from the previous steps. The alignment and rules are based upon an approximate general metric among phonemes, e.g., articulatory features, that determines which alternations are likely or possible. This enables us to summarise contexts for the different realisations.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115074515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 2012-12-18 DOI: 10.15398/jlm.v0i1.64
A. Przepiórkowski
{"title":"Journal of Language Modelling","authors":"A. Przepiórkowski","doi":"10.15398/jlm.v0i1.64","DOIUrl":"https://doi.org/10.15398/jlm.v0i1.64","url":null,"abstract":"Welcome to the inaugural issue of the Journal of Language Modelling (JLM), a free open-access peer-reviewed journal aiming to help bridge the gap between theoretical linguistics and natural language processing (NLP). Setting up a new journal is not a trivial task, and running it possibly for decades requires determination and perseverance, so any such enterprise should not be taken up lightly. The publication of this issue has been preceded by years of growing conviction that there is no appropriate forum for the exchange of ideas between theoretical, formal and computational linguists. Many conversations with our colleagues – both linguists and NLP practitioners – convinced us that such a forum is indeed needed. Ideally, JLM papers should be accessible to many readers of such periodicals as Natural Language and Linguistic Theories, Journal of Linguistics, Language or Lingua on one hand, and Computational Linguistics, Journal of Natural Language Processing, Journal of Logic, Language and Information or Language Resources and Evaluation, on the other. The affinity to another relatively young journal, Linguistic Issues in Language Technology, should also be clear. On the map of the main linguistic and NLP conferences, we see JLM as close to conferences devoted to constraint-based and formal linguistic theories (HPSG, LFG, TAG, Construction Grammar; Dependency Grammar in general and Meaning-Text Theory in particular; etc.), the Formal Grammar conference at ESSLLI, COLING, Treebanks and Linguistic Theories, etc., but also to LREC (Language Resources and Evaluation Conference), TSD (Text, Speech and Dialogue) or the xTAL series of conferences (see Jap-","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125279215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 2012-12-18 DOI: 10.15398/jlm.v0i1.35
R. Garabík, M. Šimková
{"title":"Slovak Morphosyntactic Tagset","authors":"R. Garabík, M. Šimková","doi":"10.15398/jlm.v0i1.35","DOIUrl":"https://doi.org/10.15398/jlm.v0i1.35","url":null,"abstract":"Morphological annotation constitutes essential, very useful and very common linguistic information presented in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus has been designed with several goals in mind – the tags are compact and easily human-readable, without sacrificing their informational contents. The tags consist of ASCII letters, numbers and several other characters. In general, they have a variable number of symbols, but their order is obligatory, and each category or specific feature is assigned a particular character, which can be shared among several parts of speech. The tagset is highly functional and pragmatic, although some allowances had to be made to accommodate the traditional analysis of Slovak morphology and part of speech categories.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129701801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 2012-12-18 DOI: 10.15398/jlm.v0i1.34
Kresimir Sojat, Matea Srebacic, Marko Tadić
{"title":"Derivational and Semantic Relations of Croatian Verbs","authors":"Kresimir Sojat, Matea Srebacic, Marko Tadić","doi":"10.15398/jlm.v0i1.34","DOIUrl":"https://doi.org/10.15398/jlm.v0i1.34","url":null,"abstract":"This paper deals with certain morphosemantic relations between Croatian verbs and discusses their inclusion in Croatian WordNet. The morphosemantic relations in question are the semantic relations between unprefixed infinitives and their prefixed derivatives. We introduce the criteria for the division of aspectual pairs and further discuss verb prefixation which results in combinations of prefixes and base forms that can vary in terms of meaning from compositional to completely idiosyncratic. The focus is on the regularities in semantic modifications of base forms modified by one prefix. The aim of this procedure is to establish a set of morphosemantic relations based on regular or recurring meaning alternations.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128600740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 2012-12-18 DOI: 10.15398/jlm.v0i1.58
S. Shieber
{"title":"The Case for the Journal's Use of a CC-BY License","authors":"S. Shieber","doi":"10.15398/jlm.v0i1.58","DOIUrl":"https://doi.org/10.15398/jlm.v0i1.58","url":null,"abstract":"Journal of Language Modelling provides its articles under a Creative Commons CC-BY license. We discuss why this is the appropriate choice for the journal.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133826446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 2012-12-18 DOI: 10.15398/jlm.v0i1.31
György Szaszák, A. Beke
{"title":"Exploiting Prosody for Syntactic Analysis in Automatic Speech Understanding","authors":"György Szaszák, A. Beke","doi":"10.15398/jlm.v0i1.31","DOIUrl":"https://doi.org/10.15398/jlm.v0i1.31","url":null,"abstract":"The relation between syntax and prosody is evident, even if the prosodic structure cannot be directly mapped to the syntactic one and vice versa. Syntax-to-prosody mapping is widely used in text-to-speech applications, but prosody-to-syntax mapping is mostly missing from automatic speech recognition/understanding systems. This paper presents an experiment towards filling this gap and evaluating whether an HMM-based automatic prosodic segmentation tool can be used to support the reconstruction of the syntactic structure directly from speech. Results show that up to 85% of syntactic clause boundaries and up to about 70% of embedded syntactic phrase boundaries could be identified based on the detection of phonological phrases. Recall rates do not depend further on syntactic layering, in other words, whether the phrase is multiply embedded or not. Clause boundaries can be well assigned to intonational phrase level in read speech and can be well separated from lower level syntactic phrases based on the type of the aligned phonological phrase(s). These findings can be exploited in speech understanding systems, allowing for the recovery of the skeleton of the syntactic structure, based purely on the speech signal.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133292454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 2012-12-18 DOI: 10.15398/jlm.v0i1.33
S. Koeva, I. Stoyanova, S. Leseva, Rositsa Dekova, Tsvetana Dimitrova, Ekaterina Tarpomanova
{"title":"The Bulgarian National Corpus: Theory and Practice in Corpus Design","authors":"S. Koeva, I. Stoyanova, S. Leseva, Rositsa Dekova, Tsvetana Dimitrova, Ekaterina Tarpomanova","doi":"10.15398/jlm.v0i1.33","DOIUrl":"https://doi.org/10.15398/jlm.v0i1.33","url":null,"abstract":"The paper discusses several key concepts related to the development of corpora and reconsiders them in light of recent developments in NLP. On the basis of an overview of present-day corpora, we conclude that the dominant practices of corpus design do not utilise adequately the technologies and, as a result, fail to meet the demands of corpus linguistics, computational lexicology and computational linguistics alike. We proceed to lay out a data-driven approach to corpus design, which integrates the best practices of traditional corpus linguistics with the potential of the latest technologies allowing fast collection, automatic metadata description and annotation of large amounts of data. Thus, the gist of the approach we propose is that corpus design should be centred on amassing large amounts of mono- and multilingual texts and on providing them with a detailed metadata description and high-quality multi-level annotation. We go on to illustrate this concept with a description of the compilation, structuring, documentation, and annotation of the Bulgarian National Corpus (BulNC). At present it consists of a Bulgarian part of 979.6 million words, constituting the corpus kernel, and 33 Bulgarian-X language corpora, totalling 972.3 million words, 1.95 billion words altogether. The BulNC is supplied with a comprehensive metadata description, which allows us to organise the texts according to different principles. The Bulgarian part of the BulNC is automatically processed (tokenised and sentence split) and annotated at several levels: morphosyntactic tagging, lemmatisation, word-sense annotation, annotation of noun phrases and named entities. Some levels of annotation are also applied to the Bulgarian-English parallel corpus with the prospect of expanding multilingual annotation both in terms of linguistic levels and the number of languages for which it is available. We conclude with a brief evaluation of the quality of the corpus and an outline of its applications in NLP and linguistic research.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123122758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 2012-12-18 DOI: 10.15398/jlm.v0i1.52
Stefan Müller
{"title":"A Personal Note on Open Access in Linguistics","authors":"Stefan Müller","doi":"10.15398/jlm.v0i1.52","DOIUrl":"https://doi.org/10.15398/jlm.v0i1.52","url":null,"abstract":"This paper contains only known facts about open access, but they are put into a rather personal perspective that may help others to understand the importance of Open Access in science in general and in linguistics in particular. This paper tries to motivate Open Access publishing, with a particular focus on publishing books. In Section 1, I describe the problems in accessing relevant information in economically weak countries, the problem of underpayment in the humanities, and usage restrictions of traditionally published books. Section 2 explains the factors that contribute to book prices. Section 3 briefly describes Open Access publishing and print on demand services. In Section 4, I address some challenges for Open Access publishing and suggest ways to ensure quality control, proper typesetting, and efficient marketing. Section 5 discusses Open Access approaches of profit-orientated publishers. Section 6 deals with Open Access and getting tenure and promotion, and Section 7 is about radical opinions about copyrights outside of academia. Version of 7th January 2012 (minor editorial changes).","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114216486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 1900-01-01 DOI: 10.15398/jlm.v3i2.113
Michael Hahn, F. Richter
{"title":"Henkin semantics for reasoning with natural language","authors":"Michael Hahn, F. Richter","doi":"10.15398/jlm.v3i2.113","DOIUrl":"https://doi.org/10.15398/jlm.v3i2.113","url":null,"abstract":"The frequency of intensional and non-first-order definable operators in natural languages constitutes a challenge for automated reasoning with the kind of logical translations that are deemed adequate by formal semanticists. Whereas linguists employ expressive higher-order logics in their theories of meaning, the most successful logical reasoning strategies with natural language to date rely on sophisticated first-order theorem provers and model builders. In order to bridge the fundamental mathematical gap between linguistic theory and computational practice, we present a general translation from a higher-order logic frequently employed in the linguistics literature, two-sorted Type Theory, to first-order logic under Henkin semantics. We investigate alternative formulations of the translation, discuss their properties, and evaluate the availability of linguistically relevant inferences with standard theorem provers in a test suite of inference problems stated in English. The results of the experiment indicate that translation from higher-order logic to first-order logic under Henkin semantics is a promising strategy for automated reasoning with natural languages. The paper is accompanied by the source code (cf. SUPP. FILES) of the grammar and reasoning architecture described in the paper.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127777234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Lang. Model. Pub Date: 1900-01-01 DOI: 10.15398/jlm.v9i1.257
N. Chomsky
{"title":"Simplicity and the form of grammars","authors":"N. Chomsky","doi":"10.15398/jlm.v9i1.257","DOIUrl":"https://doi.org/10.15398/jlm.v9i1.257","url":null,"abstract":"The goal of theory construction is explanation: for language, theory for particular languages (grammar) and for the faculty of language FoL (the innate endowment for language acquisition). A primitive notion of simplicity of grammars is number of symbols, but this is too crude. An improved measure distinguishes grammars that capture genuine properties of language from those that do not. The theory of FoL must meet the empirical conditions of learnability (under extreme poverty of stimulus), and evolvability (given the limited but not insignificant evidence available). Recent work provides promising insights into how these twin conditions may be satisfied.","PeriodicalId":403597,"journal":{"name":"J. Lang. Model.","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132401566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}