Pierre Godard, L. Besacier, François Yvon, M. Adda-Decker, G. Adda, Hélène Maynard, Annie Rialland
{"title":"Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages","authors":"Pierre Godard, L. Besacier, François Yvon, M. Adda-Decker, G. Adda, Hélène Maynard, Annie Rialland","doi":"10.18653/v1/W18-5804","DOIUrl":"https://doi.org/10.18653/v1/W18-5804","url":null,"abstract":"Computational Language Documentation attempts to make the most recent research in speech and language technologies available to linguists working on language preservation and documentation. In this paper, we pursue two main goals along these lines. The first is to improve upon a strong baseline for the unsupervised word discovery task on two very low-resource Bantu languages, taking advantage of the expertise of linguists on these particular languages. The second consists in exploring the Adaptor Grammar framework as a decision and prediction tool for linguists studying a new language. We experiment with 162 grammar configurations for each language and show that using Adaptor Grammars for word segmentation enables us to test hypotheses about a language. Specializing a generic grammar with language-specific knowledge leads to great improvements for the word discovery task, ultimately achieving a leap of about 30% token F-score from the results of a strong baseline.","PeriodicalId":415625,"journal":{"name":"Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125057491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Syed-Amad A. Hussain, M. Elsner, Amanda Miller
{"title":"Lexical Networks in !Xung","authors":"Syed-Amad A. Hussain, M. Elsner, Amanda Miller","doi":"10.18653/v1/W18-5802","DOIUrl":"https://doi.org/10.18653/v1/W18-5802","url":null,"abstract":"We investigate the lexical network properties of the large phoneme inventory Southern African language Mangetti Dune !Xung as it compares to English and other commonly-studied languages. Lexical networks are graphs in which nodes (words) are linked to their minimal pairs; global properties of these networks are believed to mediate lexical access in the minds of speakers. We show that the network properties of !Xung are within the range found in previously-studied languages. By simulating data (“pseudolexicons”) with varying levels of phonotactic structure, we find that the lexical network properties of !Xung diverge from previously-studied languages when fewer phonotactic constraints are retained. We conclude that lexical network properties are representative of an underlying cognitive structure which is necessary for efficient word retrieval and that the phonotactics of !Xung may be shaped by a selective pressure which preserves network properties within this cognitively useful range.","PeriodicalId":415625,"journal":{"name":"Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114297842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}