Special Interest Group on Computational Morphology and Phonology Workshop: Latest Publications

Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?
Stav Klein, Reut Tsarfaty
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.24
Abstract: This work investigates the most basic units that underlie contextualized word embeddings such as BERT: the so-called word-pieces. In Morphologically-Rich Languages (MRLs), which exhibit morphological fusion and non-concatenative morphology, the different units of meaning within a word may be fused and intertwined, and cannot be separated linearly. Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance. Here we empirically examine the capacity of word-pieces to capture morphology by investigating the task of multi-tagging in Modern Hebrew, as a proxy for evaluating the underlying segmentation. Our results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of word-pieces, we can improve word-piece tag prediction by purposefully constraining the word-pieces to reflect their internal functions. We suggest that linguistically-informed word-piece schemes, which make the morphological structure explicit, might boost performance for MRLs.
Citations: 36
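The "##" in the title refers to BERT-style word-piece notation, where continuation pieces inside a word carry a "##" prefix. A minimal sketch of the greedy longest-match-first segmentation behind it (the toy vocabulary here is invented for illustration; real tokenizers also handle normalization and unknown characters more carefully):

```python
def wordpiece_tokenize(word, vocab):
    """Greedily segment a word into word-pieces, longest match first.

    Continuation pieces (any piece not at the start of the word) are
    looked up with a '##' prefix, following BERT's convention.
    """
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # shrink the candidate and retry
        if piece is None:
            return ["[UNK]"]  # no piece of the vocabulary matched
        pieces.append(piece)
        start = end
    return pieces

vocab = {"liv", "##ing", "##life", "un", "##happi", "##ness"}
print(wordpiece_tokenize("living", vocab))       # ['liv', '##ing']
print(wordpiece_tokenize("unhappiness", vocab))  # ['un', '##happi', '##ness']
```

Note how the segmentation is purely linear; this is exactly the property the paper argues is inadequate for fusional and non-concatenative morphology.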
Data Augmentation for Transformer-based G2P
Zach Ryan, Mans Hulden
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.21
Abstract: The Transformer model has been shown to outperform other neural seq2seq models in several character-level tasks. It is unclear, however, whether the Transformer would benefit as much as other seq2seq models from data augmentation strategies in the low-resource setting. In this paper we explore strategies for data augmentation in the grapheme-to-phoneme (g2p) task together with the Transformer model. Our results show that a relatively simple alignment-based strategy of identifying consistent input-output subsequences in grapheme-phoneme data, coupled with subsequent splicing of such pieces to generate hallucinated data, works well in the low-resource setting, often delivering substantial performance improvements over a standard Transformer model.
Citations: 10
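The splicing idea can be illustrated with a toy sketch: given two words whose grapheme and phoneme chunks have been aligned, replacing one chunk with the corresponding chunk of the other word yields a hallucinated training pair. (The alignments here are supplied by hand; the paper derives them automatically.)

```python
def splice(item_a, item_b, i):
    """Hallucinate a grapheme->phoneme pair by replacing aligned chunk i
    of item_a with the corresponding chunk of item_b.

    Each item is a list of (grapheme_chunk, phoneme_chunk) pairs that
    are assumed to correspond consistently across the data.
    """
    chunks = list(item_a)
    chunks[i] = item_b[i]
    graphemes = "".join(g for g, _ in chunks)
    phonemes = " ".join(p for _, p in chunks)
    return graphemes, phonemes

# "sing" aligned as (s,s)(i,ɪ)(ng,ŋ); "rang" aligned as (r,ɹ)(a,æ)(ng,ŋ)
sing = [("s", "s"), ("i", "ɪ"), ("ng", "ŋ")]
rang = [("r", "ɹ"), ("a", "æ"), ("ng", "ŋ")]
print(splice(sing, rang, 0))  # ('ring', 'ɹ ɪ ŋ')
```

The spliced pair ("ring", "ɹ ɪ ŋ") is a plausible word the model never saw, which is what makes this kind of hallucination useful in low-resource g2p.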
The UniMelb Submission to the SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
Andreas Scherbakov
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.20
Abstract: The paper describes the University of Melbourne's submission to the SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection. Our team submitted three systems in total, two neural and one non-neural. Our analysis of the systems' performance shows positive effects of the newly introduced data hallucination technique that we employed in one of the neural systems, especially in low-resource scenarios. A non-neural system based on observed inflection patterns shows promising results even in its simple implementation (>75% accuracy for 50% of languages). With possible improvements within the same modeling principle, accuracy might grow to values above 90%.
Citations: 5
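A non-neural inflector based on observed patterns can be sketched as suffix-replacement rules extracted from lemma-form pairs: strip the longest common prefix, record the remaining suffix change, and reuse the most frequent change per tag. (A rough illustration of the idea, not the author's implementation.)

```python
from collections import Counter

def extract_rule(lemma, form):
    """Express an observed inflection as a suffix-replacement rule
    by stripping the longest common prefix of lemma and form."""
    i = 0
    while i < min(len(lemma), len(form)) and lemma[i] == form[i]:
        i += 1
    return lemma[i:], form[i:]

def train(pairs_by_tag):
    """Keep the most frequent observed rule for each morphosyntactic tag."""
    return {tag: Counter(extract_rule(l, f) for l, f in pairs).most_common(1)[0][0]
            for tag, pairs in pairs_by_tag.items()}

def inflect(lemma, tag, rules):
    old, new = rules[tag]
    assert lemma.endswith(old), "rule does not apply to this lemma"
    return lemma[: len(lemma) - len(old)] + new

rules = train({"V;PST": [("walk", "walked"), ("jump", "jumped")]})
print(inflect("talk", "V;PST", rules))  # talked
```

Even this naive scheme generalizes to unseen lemmas whenever the language's inflection is predominantly suffixing, which is consistent with the >75% accuracy the paper reports for half the languages.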
SIGMORPHON 2020 Task 0 System Description: ETH Zürich Team
Martina Forster, Clara Meister
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.10
Abstract: This paper presents our system for the SIGMORPHON 2020 Shared Task. We build off of the baseline systems, performing exact inference on models trained on language-family data. Our systems return the globally best solution under these models. Our two systems achieve 80.9% and 75.6% accuracy on the test set. We ultimately find that, in this setting, exact inference does not seem to help or hinder the performance of morphological inflection generators, which stands in contrast to its effect on Neural Machine Translation (NMT) models.
Citations: 2
Leveraging Principal Parts for Morphological Inflection
L. Liu, Mans Hulden
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.17
Abstract: This paper presents the submission by the CU Ling team from the University of Colorado to SIGMORPHON 2020 Shared Task 0 on morphological inflection. The task is to generate the target inflected word form given a lemma and a target morphosyntactic description. Our system uses the Transformer architecture. Our overall approach is to treat the morphological inflection task as a paradigm cell filling problem and to design the system to leverage principal-parts information for better morphological inflection when the training data is limited. We train one model for each language separately, without external data. The overall average performance of our submission ranks first in both average accuracy and Levenshtein distance from the gold inflection among all submissions, including those using external resources.
Citations: 13
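Levenshtein distance from the gold inflection, the secondary shared-task metric mentioned above, is the classic edit-distance dynamic program over insertions, deletions, and substitutions:

```python
def levenshtein(a, b):
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # delete ca
                           cur[j - 1] + 1,         # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# One wrong character in a predicted German participle costs distance 1:
print(levenshtein("gesungen", "gesingen"))  # 1
```

Averaging this over the test set rewards near-misses that exact-match accuracy counts as plain errors, which is why the shared task reports both.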
Multi-Tiered Strictly Local Functions
Phillip Burness, Kevin McMullin
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.29
Abstract: Tier-based Strictly Local functions, as they have so far been defined, are equipped with just a single tier. In light of this fact, they are currently incapable of modelling simultaneous phonological processes that would require different tiers. In this paper we consider whether and how we can allow a single function to operate over more than one tier. We conclude that multiple tiers can and should be permitted, but that the relationships between them must be restricted in some way to avoid overgeneration. The particular restriction that we propose comes in two parts. First, each input element is associated with a set of tiers that on their own can fully determine what the element is mapped to. Second, the set of tiers associated with a given input element must form a strict superset-subset hierarchy. In this way, we can track multiple, related sources of information when deciding how to process a particular input element. We demonstrate that doing so enables simple and intuitive analyses of otherwise challenging phonological phenomena.
Citations: 5
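The single-tier case that the paper generalizes can be sketched concretely: project the string onto the tier (keep only tier symbols), then enforce a strictly local constraint on the projection. A toy vowel-harmony grammar, banning disagreeing adjacent vowels on the vowel tier:

```python
def tsl_wellformed(word, tier, forbidden_bigrams):
    """Check a word against a Tier-based Strictly 2-Local grammar:
    project the string onto the tier, then reject it if any adjacent
    pair on the projection is a forbidden bigram. (Single-tier case;
    the paper's proposal associates each input element with a
    hierarchy of tiers.)"""
    projection = [s for s in word if s in tier]
    return all((x, y) not in forbidden_bigrams
               for x, y in zip(projection, projection[1:]))

# Toy harmony over an {a, e} vowel inventory: vowels must not disagree,
# no matter how many consonants intervene.
vowel_tier = {"a", "e"}
disagree = {("a", "e"), ("e", "a")}
print(tsl_wellformed("katan", vowel_tier, disagree))  # True
print(tsl_wellformed("katen", vowel_tier, disagree))  # False
```

The projection step is what lets a "local" constraint reach across arbitrarily many consonants; the paper's question is what happens when one process needs several such projections at once.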
KU-CST at the SIGMORPHON 2020 Task 2 on Unsupervised Morphological Paradigm Completion
Manex Agirrezabal, Jürgen Wedekind
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.11
Abstract: We present a model for the unsupervised discovery of morphological paradigms. The goal of this model is to induce morphological paradigms from the Bible (raw text) and a list of lemmas. We have created a model that splits each lemma into a stem and a suffix, and then we try to create a plausible suffix list by considering lemma pairs. Our model was not able to outperform the official baseline, and there is still room for improvement, but we believe that the ideas presented here are worth considering.
Citations: 1
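The stem-plus-suffix splitting idea can be sketched by splitting every lemma at every position and keeping the word-final strings that recur across lemmas, since a string that ends many different words is plausibly a suffix. (A rough illustration; the authors' pairwise procedure differs in detail.)

```python
from collections import Counter

def plausible_suffixes(lemmas):
    """Induce a suffix inventory from raw lemmas: count every possible
    word-final substring and keep those shared by more than one lemma."""
    counts = Counter()
    for lemma in lemmas:
        for i in range(1, len(lemma)):  # every stem/suffix split point
            counts[lemma[i:]] += 1
    return {suffix for suffix, c in counts.items() if c > 1}

suffixes = plausible_suffixes(["walking", "talking", "walked"])
print("ing" in suffixes, "ed" in suffixes)  # True False
```

The obvious weakness, visible even in this tiny example, is that shared material like "alking" also survives; an unsupervised system needs further evidence (e.g. from lemma pairs, as in the paper) to separate true suffixes from accidental overlaps.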
Linguist vs. Machine: Rapid Development of Finite-State Morphological Grammars
Sarah Beemer, Zak Boston, April Bukoski, Daniel Chen, P. Dickens, Andrew Gerlach, Torin Hopkins, Parth Anand Jawale, Chris Koski, Akanksha Malhotra, Piyush Mishra, S. Muradoglu, Lan Sang, Tyler Short, Sagarika Shreevastava, Eliza Spaulding, Testumichi Umada, Beilei Xiang, Changbing Yang, Mans Hulden
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.18
Abstract: Sequence-to-sequence models have proven to be highly successful in learning morphological inflection from examples, as the series of SIGMORPHON/CoNLL shared tasks has shown. It is usually assumed, however, that a linguist working with inflectional examples could in principle develop a gold-standard-level morphological analyzer and generator that would surpass a trained neural network model in accuracy of predictions, but that this may require significant amounts of human labor. In this paper, we discuss an experiment in which a group of people with some linguistic training developed 25+ grammars as part of the shared task, and we weigh the cost/benefit ratio of developing grammars by hand. We also present tools that can help linguists triage difficult, complex morphophonological phenomena within a language and hypothesize inflectional class membership. We conclude that a significant development effort by trained linguists to analyze and model morphophonological patterns is required in order to surpass the accuracy of neural models.
Citations: 8
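A hand-written grammar fragment of the kind the linguists produced can be sketched as an ordered rewrite cascade, where earlier (more specific) rules bleed later defaults. This toy English plural fragment is only illustrative; the shared-task grammars were developed in finite-state toolkits such as foma:

```python
import re

# Ordered rewrite rules: the first pattern that matches wins,
# so specific classes must precede the default.
RULES = [
    (r"(s|x|z|ch|sh)$", r"\1es"),  # sibilant-final: bus -> buses, church -> churches
    (r"([^aeiou])y$", r"\1ies"),   # consonant + y: pony -> ponies
    (r"$", "s"),                   # elsewhere: cat -> cats
]

def pluralize(noun):
    """Apply the first matching rewrite rule to the noun."""
    for pattern, repl in RULES:
        new, n = re.subn(pattern, repl, noun)
        if n:
            return new
    return noun

print([pluralize(w) for w in ["bus", "pony", "cat"]])  # ['buses', 'ponies', 'cats']
```

Rule ordering is exactly where the human labor the paper measures goes: each irregular class a linguist identifies becomes another rule that must be slotted in before the default.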
Exploring Neural Architectures And Techniques For Typologically Diverse Morphological Inflection
P. Jayarao, Siddhanth Pillay, P. Thombre, Aditi Chaudhary
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.14
Abstract: Morphological inflection in low-resource languages is critical for augmenting existing corpora in those languages, which can help develop several applications with substantial social impact. We describe our attention-based encoder-decoder approach, which we implement using LSTMs and Transformers as the base units. We also describe the ancillary techniques that we experimented with, such as hallucination, language vector injection, sparsemax loss, and an adversarial language network, alongside our approach to selecting the related language(s) for training. We present the results we generated on the constrained as well as unconstrained SIGMORPHON 2020 dataset (CITATION). One of the primary goals of our paper was to study the contribution of the varied components described above to the performance of our system, and to perform an analysis of the same.
Citations: 1
One-Size-Fits-All Multilingual Models
Ben Peters, André F. T. Martins
Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.4
Abstract: This paper presents DeepSPIN's submissions to Tasks 0 and 1 of the SIGMORPHON 2020 Shared Task. For both tasks, we present multilingual models, training jointly on data in all languages. We perform no language-specific hyperparameter tuning; each of our submissions uses the same model for all languages. Our basic architecture is the sparse sequence-to-sequence model with entmax attention and loss, which allows our models to learn sparse, local alignments while still being trainable with gradient-based techniques. For Task 1, we achieve strong performance with both RNN- and Transformer-based sparse models. For Task 0, we extend our RNN-based model to a multi-encoder set-up in which separate modules encode the lemma and inflection sequences. Despite our models' lack of language-specific tuning, they tie for first in Task 0 and place third in Task 1.
Citations: 15
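The sparsity here comes from replacing softmax with mappings from the entmax family. The simplest member, sparsemax (Martins & Astudillo), is the Euclidean projection of the score vector onto the probability simplex; unlike softmax it can assign exact zeros, which is what makes attention alignments sparse and local. A pure-Python sketch of the closed-form sorting algorithm:

```python
def sparsemax(z):
    """Project scores z onto the probability simplex (sparsemax).

    Sort the scores, find the support size k as the largest k with
    1 + k*z_(k) > sum of the top-k scores, derive the threshold tau,
    and clip everything below tau to exactly zero.
    """
    zs = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, zk in enumerate(zs, 1):
        cumsum += zk
        if 1 + k * zk > cumsum:
            tau = (cumsum - 1) / k  # zk stays in the support
        else:
            break  # sorted order: once the test fails it fails forever
    return [max(zi - tau, 0.0) for zi in z]

print(sparsemax([2.0, 1.0, 0.1]))  # [1.0, 0.0, 0.0] -- exact zeros
print(sparsemax([0.5, 0.5]))       # [0.5, 0.5]      -- ties survive
```

With a large score gap, all mass collapses onto one element; softmax, by contrast, would still spread small positive probability everywhere.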