Special Interest Group on Computational Morphology and Phonology Workshop最新文献

Colexifications for Bootstrapping Cross-lingual Datasets: The Case of Phonology, Concreteness, and Affectiveness 自举跨语言数据集的共化:音系、具体和情感的案例

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2023-06-05 DOI: 10.48550/arXiv.2306.02646

Yiyi Chen, Johannes Bjerva

{"title":"Colexifications for Bootstrapping Cross-lingual Datasets: The Case of Phonology, Concreteness, and Affectiveness","authors":"Yiyi Chen, Johannes Bjerva","doi":"10.48550/arXiv.2306.02646","DOIUrl":"https://doi.org/10.48550/arXiv.2306.02646","url":null,"abstract":"Colexification refers to the linguistic phenomenon where a single lexical form is used to convey multiple meanings. By studying cross-lingual colexifications, researchers have gained valuable insights into fields such as psycholinguistics and cognitive sciences (Jack- son et al., 2019; Xu et al., 2020; Karjus et al., 2021; Schapper and Koptjevskaja-Tamm, 2022; FranÃ§ois, 2022). While several multilingual colexification datasets exist, there is untapped potential in using this information to bootstrap datasets across such semantic features. In this paper, we aim to demonstrate how colexifications can be leveraged to create such cross-lingual datasets. We showcase curation procedures which result in a dataset covering 142 languages across 21 language families across the world. The dataset includes ratings of concreteness and affectiveness, mapped with phonemes and phonological features. We further analyze the dataset along different dimensions to demonstrate potential of the proposed procedures in facilitating further interdisciplinary research in psychology, cognitive science, and multilingual natural language processing (NLP). Based on initial investigations, we observe that i) colexifications that are closer in concreteness/affectiveness are more likely to colexify ; ii) certain initial/last phonemes are significantly correlated with concreteness/affectiveness intra language families, such as /k/ as the initial phoneme in both Turkic and Tai-Kadai correlated with concreteness, and /p/ in Dravidian and Sino-Tibetan correlated with Valence; iii) the type-to-token ratio (TTR) of phonemes are positively correlated with concreteness across several language families, while the length of phoneme segments are negatively correlated with concreteness; iv) certain phonological features are negatively correlated with concreteness across languages. The dataset is made public online for further research.","PeriodicalId":186158,"journal":{"name":"Special Interest Group on Computational Morphology and Phonology Workshop","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123786843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Transliteration for Cross-Lingual Morphological Inflection 跨语言形态变化的音译

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.22

Nikitha Murikinati, Antonios Anastasopoulos, Graham Neubig

引用次数: 15

CLUZH at SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion 多语言字素到音素转换的共享任务

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.19

Peter Makarov, S. Clematide

引用次数: 12

Frustratingly Easy Multilingual Grapheme-to-Phoneme Conversion 令人沮丧的简单多语言字母到音素转换

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.13

Nikhil Prabhu, Katharina Kann

引用次数: 4

Joint learning of constraint weights and gradient inputs in Gradient Symbolic Computation with constrained optimization 基于约束优化的梯度符号计算中约束权值与梯度输入的联合学习

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.27

Max Nelson

引用次数: 2

The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion 多语言字素到音素转换的SIGMORPHON 2020共享任务

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.2

Kyle Gorman, Lucas F. E. Ashby, Aaron Goyzueta, Arya D. McCarthy, Shijie Wu, Daniel You

引用次数: 48

University of Illinois Submission to the SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection 伊利诺伊大学提交给SIGMORPHON 2020共享任务0:类型学上多样的形态变化

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.15

Marc E. Canby, A. Karipbayeva, B. Lunt, Sahand Mozaffari, Charlotte Yoder, J. Hockenmaier

引用次数: 4

The CMU-LTI submission to the SIGMORPHON 2020 Shared Task 0: Language-Specific Cross-Lingual Transfer CMU-LTI提交给SIGMORPHON 2020共享任务0:特定语言的跨语言迁移

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.6

Nikitha Murikinati, Antonios Anastasopoulos

引用次数: 3

Grapheme-to-Phoneme Conversion with a Multilingual Transformer Model 使用多语言转换模型的字素到音素转换

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.7

Omnia S. ElSaadany, Benjamin Suter

引用次数: 6

Low-Resource G2P and P2G Conversion with Synthetic Training Data 基于综合训练数据的低资源G2P和P2G转换

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.12

B. Hauer, Amir Ahmad Habibi, Yixing Luan, Arnob Mallik, Grzegorz Kondrak

引用次数: 6