Special Interest Group on Computational Morphology and Phonology Workshop最新文献

筛选
英文 中文
Unsupervised Learning of Morphology Using a Novel Directed Search Algorithm: Taking the First Step 使用一种新的定向搜索算法的无监督形态学学习:迈出第一步
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2002-07-11 DOI: 10.3115/1118647.1118649
Matthew G. Snover, G. Jarosz, M. Brent
{"title":"Unsupervised Learning of Morphology Using a Novel Directed Search Algorithm: Taking the First Step","authors":"Matthew G. Snover, G. Jarosz, M. Brent","doi":"10.3115/1118647.1118649","DOIUrl":"https://doi.org/10.3115/1118647.1118649","url":null,"abstract":"This paper describes a system for the unsupervised learning of morphological suffixes and stems from word lists. The system is composed of a generative probability model and a novel search algorithm. By examining morphologically rich subsets of an input lexicon, the search identifies highly productive paradigms. Quantitative results are shown by measuring the accuracy of the morphological relations identified. Experiments in English and Polish, as well as comparisons with other recent unsupervised morphology learning algorithms demonstrate the effectiveness of this technique.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"318 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131724556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 49
Probabilistic Context-Free Grammars for Phonology 音系的概率上下文无关语法
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2002-07-11 DOI: 10.3115/1118647.1118655
K. Müller
{"title":"Probabilistic Context-Free Grammars for Phonology","authors":"K. Müller","doi":"10.3115/1118647.1118655","DOIUrl":"https://doi.org/10.3115/1118647.1118655","url":null,"abstract":"We present a phonological probabilistic context-free grammar, which describes the word and syllable structure of German words. The grammar is trained on a large corpus by a simple supervised method, and evaluated on a syllabification task achieving 96.88% word accuracy on word tokens, and 90.33% on word types. We added rules for English phonemes to the grammar, and trained the enriched grammar on an English corpus. Both grammars are evaluated qualitatively showing that probabilistic context-free grammars can contribute linguistic knowledge to phonology. Our formal approach is multilingual, while the training data is language-dependent.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124740464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Modeling English Past Tense Intuitions with Minimal Generalization 用最小泛化建模英语过去时直觉
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2002-07-11 DOI: 10.3115/1118647.1118654
Adam Albright, B. Hayes
{"title":"Modeling English Past Tense Intuitions with Minimal Generalization","authors":"Adam Albright, B. Hayes","doi":"10.3115/1118647.1118654","DOIUrl":"https://doi.org/10.3115/1118647.1118654","url":null,"abstract":"We describe here a supervised learning model that, given paradigms of related words, learns the morphological and phonological rules needed to derive the paradigm. The model can use its rules to make guesses about how novel forms would be inflected, and has been tested experimentally against the intuitions of human speakers.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127332238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 147
Using eigenvectors of the bigram graph to infer morpheme identity 利用双字图的特征向量推断语素同一性
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2002-07-01 DOI: 10.3115/1118647.1118652
M. Belkin, J. Goldsmith
{"title":"Using eigenvectors of the bigram graph to infer morpheme identity","authors":"M. Belkin, J. Goldsmith","doi":"10.3115/1118647.1118652","DOIUrl":"https://doi.org/10.3115/1118647.1118652","url":null,"abstract":"This paper describes the results of some experiments exploring statistical methods to infer syntactic categories from a raw corpus in an unsupervised fashion. It shares certain points in common with Brown et at (1992) and work that has grown out of that: it employs statistical techniques to derive categories based on what words occur adjacent to a given word. However, we use an eigenvector decomposition of a nearest-neighbor graph to produce a two-dimensional rendering of the words of a corpus in which words of the same syntactic category tend to form clusters and neighborhoods. We exploit this technique for extending the value of automatic learning of morphology. In particular, we look at the suffixes derived from a corpus by unsupervised learning of morphology, and we ask which of these suffixes have a consistent syntactic function (e.g., in English, -ed is primarily a mark of verbal past tense, does but -s marks both noun plurals and 3rd person present on verbs).","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122416007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Unsupervised Discovery of Morphemes 语素的无监督发现
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2002-05-21 DOI: 10.3115/1118647.1118650
Mathias Creutz, K. Lagus
{"title":"Unsupervised Discovery of Morphemes","authors":"Mathias Creutz, K. Lagus","doi":"10.3115/1118647.1118650","DOIUrl":"https://doi.org/10.3115/1118647.1118650","url":null,"abstract":"We present two methods for unsupervised segmentation of words into morpheme-like units. The model utilized is especially suited for languages with a rich morphology, such as Finnish. The first method is based on the Minimum Description Length (MDL) principle and works online. In the second method, Maximum Likelihood (ML) optimization is used. The quality of the segmentations is measured using an evaluation method that compares the segmentations produced to an existing morphological analysis. Experiments on both Finnish and English corpora show that the presented methods perform well compared to a current state-of-the-art system.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127729752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 402
Morphological reinflection with weighted finite-state transducers 基于加权有限状态传感器的形态反射
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.sigmorphon-1.15
Alice Kwak, Michael Hammond, Cheyenne Wing
{"title":"Morphological reinflection with weighted finite-state transducers","authors":"Alice Kwak, Michael Hammond, Cheyenne Wing","doi":"10.18653/v1/2023.sigmorphon-1.15","DOIUrl":"https://doi.org/10.18653/v1/2023.sigmorphon-1.15","url":null,"abstract":"This paper describes the submission by the University of Arizona to the SIGMORPHON 2023 Shared Task on typologically diverse morphological (re-)infection. In our submission, we investigate the role of frequency, length, and weighted transducers in addressing the challenge of morphological reinflection. We start with the non-neural baseline provided for the task and show how some improvement can be gained by integrating length and frequency in prefix selection. We also investigate using weighted finite-state transducers, jump-started from edit distance and directly augmented with frequency. Our specific technique is promising and quite simple, but we see only modest improvements for some languages here.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116060157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Improving Automated Prediction of English Lexical Blends Through the Use of Observable Linguistic Features 利用可观察的语言特征改进英语词汇混合的自动预测
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.sigmorphon-1.10
Jarem Saunders
{"title":"Improving Automated Prediction of English Lexical Blends Through the Use of Observable Linguistic Features","authors":"Jarem Saunders","doi":"10.18653/v1/2023.sigmorphon-1.10","DOIUrl":"https://doi.org/10.18653/v1/2023.sigmorphon-1.10","url":null,"abstract":"The process of lexical blending is difficult to reliably predict. This difficulty has been shown by machine learning approaches in blend modeling, including attempts using then state-of-the-art LSTM deep neural networks trained on character embeddings, which were able to predict lexical blends given the ordered constituent words in less than half of cases, at maximum. This project introduces a novel model architecture which dramatically increases the correct prediction rates for lexical blends, using only Polynomial regression and Random Forest models. This is achieved by generating multiple possible blend candidates for each input word pairing and evaluating them based on observable linguistic features. The success of this model architecture illustrates the potential usefulness of observable linguistic features for problems that elude more advanced models which utilize only features discovered in the latent space.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121601881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Fine-tuning mSLAM for the SIGMORPHON 2022 Shared Task on Grapheme-to-Phoneme Conversion SIGMORPHON 2022在字素到音素转换上共享任务的微调mSLAM
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.sigmorphon-1.31
Dan Garrette
{"title":"Fine-tuning mSLAM for the SIGMORPHON 2022 Shared Task on Grapheme-to-Phoneme Conversion","authors":"Dan Garrette","doi":"10.18653/v1/2023.sigmorphon-1.31","DOIUrl":"https://doi.org/10.18653/v1/2023.sigmorphon-1.31","url":null,"abstract":"Grapheme-to-phoneme (G2P) conversion is a task that is inherently related to both written and spoken language. Therefore, our submission to the G2P shared task builds off of mSLAM (Bapna et al., 2022), a 600M parameter encoder model pretrained simultaneously on text from 101 languages and speech from 51 languages. For fine-tuning a G2P model, we combined mSLAM’s text encoder, which uses characters as its input tokens, with an uninitialized single-layer RNN-T decoder (Graves, 2012) whose vocabulary is the set of all 381 phonemes appearing in the shared task data. We took an explicitly multilingual approach to modeling the G2P tasks, fine-tuning and evaluating a single model that covered all the languages in each task, and adding language codes as prefixes to the input strings as a means of specifying the language of each example. Our models perform well in the shared task’s “high” setting (in which they were trained on 1,000 words from each language), though they do poorly in the “low” task setting (training on only 100 words from each language). Our models also perform reasonably in the “mixed” setting (training on 100 words in the target language and 1000 words in a related language), hinting that mSLAM’s multilingual pretraining may be enabling useful cross-lingual sharing.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115940513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
LISN @ SIGMORPHON 2023 Shared Task on Interlinear Glossing 线性间光泽的共享任务
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.sigmorphon-1.21
Shu Okabe, François Yvon
{"title":"LISN @ SIGMORPHON 2023 Shared Task on Interlinear Glossing","authors":"Shu Okabe, François Yvon","doi":"10.18653/v1/2023.sigmorphon-1.21","DOIUrl":"https://doi.org/10.18653/v1/2023.sigmorphon-1.21","url":null,"abstract":"This paper describes LISN”’“s submission to the second track (open track) of the shared task on Interlinear Glossing for SIGMORPHON 2023. Our systems are based on Lost, a variation of linear Conditional Random Fields initially developed as a probabilistic translation model and then adapted to the glossing task. This model allows us to handle one of the main challenges posed by glossing, i.e. the fact that the list of potential labels for lexical morphemes is not fixed in advance and needs to be extended dynamically when labelling units are not seen in training. In such situations, we show how to make use of candidate lexical glosses found in the translation and discuss how such extension affects the training and inference procedures. The resulting automatic glossing systems prove to yield very competitive results, especially in low-resource settings.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132555931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
SIGMORPHON 2022 Shared Task on Grapheme-to-Phoneme Conversion Submission Description: Sequence Labelling for G2P 基于G2P的序列标记
Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.sigmorphon-1.28
Leander Girrbach
{"title":"SIGMORPHON 2022 Shared Task on Grapheme-to-Phoneme Conversion Submission Description: Sequence Labelling for G2P","authors":"Leander Girrbach","doi":"10.18653/v1/2023.sigmorphon-1.28","DOIUrl":"https://doi.org/10.18653/v1/2023.sigmorphon-1.28","url":null,"abstract":"This paper describes our participation in the Third SIGMORPHON Shared Task on Grapheme-to-Phoneme Conversion (Low-Resource and Cross-Lingual) (McCarthy et al.,2022). Our models rely on different sequence labelling methods. The main model predicts multiple phonemes from each grapheme and is trained using CTC loss (Graves et al., 2006). We find that sequence labelling methods yield worse performance than the baseline when enough data is available, but can still be used when very little data is available. Furthermore, we demonstrate that alignments learned by the sequence labelling models can be easily inspected.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126489351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信