Positional skipgrams for Bambara: a resource for corpus-based studies
K. Maslinsky
Mandenkan: Bulletin Semestriel d'Etudes Linguistiques Mande, published 2020-05-07
DOI: 10.4000/mandenkan.2119 (https://doi.org/10.4000/mandenkan.2119)
Citations: 1
Abstract
This article presents a new online dataset of linguistically rich n‑gram frequency data for Bambara, based on the disambiguated part of the Bambara Reference Corpus. The n‑grams in the dataset are positional skipgrams, which capture the co-occurrence of lexical items with grammatical categories at various relative positions. They were constructed to make the most of the information available in the morphologically annotated corpus of Bambara, given the limited amount of textual data. The article discusses the methodology and data used to construct the n‑grams, then briefly illustrates how the positional skipgram data may be employed in corpus-based linguistic research.
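
To make the notion of a positional skipgram more concrete, the following Python sketch counts, for each lexical item, the grammatical category found at each relative position within a small window. It is only an illustration under stated assumptions: the function name `positional_skipgrams`, the window size, the `(lemma, offset, tag)` triple layout, and the toy sentence with its tags are all hypothetical and are not taken from the article or from the published dataset, whose exact construction may differ.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed token representation: (lemma, grammatical tag).
Token = Tuple[str, str]


def positional_skipgrams(sentences: Iterable[List[Token]], window: int = 2) -> Counter:
    """Count (lemma, relative offset, tag-at-that-offset) triples.

    For every token, look at each neighbour up to `window` positions to
    the left (negative offsets) and right (positive offsets), and record
    the neighbour's grammatical tag together with its relative position.
    """
    counts: Counter = Counter()
    for sent in sentences:
        for i, (lemma, _tag) in enumerate(sent):
            for offset in range(-window, window + 1):
                j = i + offset
                # Skip the token itself and positions outside the sentence.
                if offset == 0 or j < 0 or j >= len(sent):
                    continue
                counts[(lemma, offset, sent[j][1])] += 1
    return counts


# Toy annotated sentence; lemmas and tags are illustrative only and do
# not reproduce the corpus annotation.
toy_corpus = [[("a", "pers"), ("ye", "pm"), ("ji", "n"), ("min", "v")]]
counts = positional_skipgrams(toy_corpus, window=2)

for (lemma, offset, tag), freq in sorted(counts.items()):
    print(f"{lemma}\t{offset:+d}\t{tag}\t{freq}")
```

Pairing a lexical item with a category rather than with another lexical item keeps the counts less sparse, which is in line with the abstract's stated motivation of working with a limited amount of text.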