Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP: Latest Articles

Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase
Luigi Talamo
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.5
Abstract: We describe a methodology to extract with finer accuracy word order patterns from texts automatically annotated with Universal Dependency (UD) trained parsers. We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prosaic texts. Our results suggest that the combinations of different UD annotation layers, such as UD Relations, Universal Parts of Speech and lemma, and the introduction of language-specific lists of closed-category lemmata have the two-fold effect of improving the quality of analysis and unveiling hidden areas of variability in word order patterns.
Citations: 0
Bayesian Phylogenetic Cognate Prediction
Gerhard Jäger
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.8
Abstract: In Jäger (2019), a computational framework was defined to start from parallel word lists of related languages and infer the corresponding vocabulary of the shared proto-language. The SIGTYP 2022 Shared Task is closely related. The main difference is that what is to be reconstructed is not the proto-form but an unknown word from an extant language. The system described here is a re-implementation of the tools used in the mentioned paper, adapted to the current task.
Citations: 3
Mockingbird at the SIGTYP 2022 Shared Task: Two Types of Models for the Prediction of Cognate Reflexes
Christo Kirov, R. Sproat, Alexander Gutkin
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.9
Abstract: The SIGTYP 2022 shared task concerns the problem of word reflex generation in a target language, given cognate words from a subset of related languages. We present two systems to tackle this problem, covering two very different modeling approaches. The first model extends transformer-based encoder-decoder sequence-to-sequence modeling by encoding all available input cognates in parallel and having the decoder attend to the resulting joint representation during inference. The second approach takes inspiration from the field of image restoration, where models are tasked with recovering pixels in an image that have been masked out. For reflex generation, the missing reflexes are treated as “masked pixels” in an “image” which is a representation of an entire cognate set across a language family. As in the image restoration case, cognate restoration is performed with a convolutional network.
Citations: 8
Multilingualism Encourages Recursion: a Transfer Study with mBERT
Andrea de Varda, Roberto Zamparelli
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.1
Abstract: The present work constitutes an attempt to investigate the relational structures learnt by mBERT, a multilingual transformer-based network, with respect to different cross-linguistic regularities proposed in the fields of theoretical and quantitative linguistics. We pursued this objective by relying on a zero-shot transfer experiment, evaluating the model’s ability to generalize its native task to artificial languages that could either respect or violate some proposed language universal, and comparing its performance to the output of BERT, a monolingual model with an identical configuration. We created four artificial corpora through a Probabilistic Context-Free Grammar by manipulating the distribution of tokens and the structure of their dependency relations. We showed that while both models were favoured by a Zipfian distribution of the tokens and by the presence of head-dependency type structures, the multilingual transformer network exhibited a stronger reliance on hierarchical cues compared to its monolingual counterpart.
Citations: 3
PaVeDa - Pavia Verbs Database: Challenges and Perspectives
C. Zanchi, S. Luraghi, Claudia Roberta Combei
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.14
Abstract: This paper describes an ongoing endeavor to construct the Pavia Verbs Database (PaVeDa), an open-access typological resource that builds upon previous work on verb argument structure, in particular the Valency Patterns Leipzig (ValPaL) project (Hartmann et al., 2013). The PaVeDa database features four major innovations as compared to the ValPaL database: (i) it includes data from ancient languages, enabling diachronic research; (ii) it expands the language sample to language families that are not represented in the ValPaL; (iii) it is linked to external corpora that are used as sources of usage-based examples of stored patterns; (iv) it introduces a new cross-linguistic layer of annotation for valency patterns which allows for contrastive data visualization.
Citations: 0
Typological Word Order Correlations with Logistic Brownian Motion
Kai Hartung, Gerhard Jäger, Sören Gröttrup, Munir Georges
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.3
Abstract: In this study we address the question of to what extent syntactic word-order traits of different languages have evolved under correlation, and whether such dependencies can be found universally across all languages or are restricted to specific language families. To do so, we use logistic Brownian Motion under a Bayesian framework to model the trait evolution for 768 languages from 34 language families. We test for trait correlations both in single families and universally over all families. Separate models reveal no universal correlation patterns, and Bayes Factor analysis of models over all covered families also strongly indicates lineage-specific correlation patterns instead of universal dependencies.
Citations: 0
The SIGTYP 2022 Shared Task on the Prediction of Cognate Reflexes
Johann-Mattis List, Ekaterina Vylomova, Robert Forkel, Nathan Hill, Ryan Cotterell
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.7
Abstract: This study describes the structure and the results of the SIGTYP 2022 shared task on the prediction of cognate reflexes from multilingual wordlists. We asked participants to submit systems that would predict words in individual languages with the help of cognate words from related languages. Training and surprise data were based on standardized multilingual wordlists from several language families. Four teams submitted a total of eight systems, including both neural and non-neural systems, as well as systems adjusted to the task and systems using more general settings. While all systems showed a rather promising performance, reflecting the overwhelming regularity of sound change, the best performance throughout was achieved by a system based on convolutional networks originally designed for image restoration.
Citations: 9
How Universal is Metonymy? Results from a Large-Scale Multilingual Analysis
Temuulen Khishigsuren, Gábor Bella, T. Brochhagen, Daariimaa Marav, Fausto Giunchiglia, Khuyagbaatar Batsuren
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.13
Abstract: Metonymy is regarded by most linguists as a universal cognitive phenomenon, especially since the emergence of the theory of conceptual mappings. However, the field data backing up claims of universality has not been large enough so far to provide conclusive evidence. We introduce a large-scale analysis of metonymy based on a lexical corpus of over 20 thousand metonymy instances from 189 languages and 69 genera. No prior study, to our knowledge, is based on linguistic coverage as broad as ours. Drawing on corpus analysis, evidence of universality is found at three levels: systematic metonymy in general, particular metonymy patterns, and specific metonymy concepts.
Citations: 1
Investigating Information-Theoretic Properties of the Typology of Spatial Demonstratives
Sihan Chen, Richard Futrell, Kyle Mahowald
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.12
Abstract: Using data from Nintemann et al. (2020), we explore the variability in complexity and informativity across spatial demonstrative systems using spatial deictic lexicons from 223 languages. We argue from an information-theoretic perspective (Shannon, 1948) that spatial deictic lexicons are efficient in communication, balancing informativity and complexity. Specifically, we find that under an appropriate choice of cost function and need probability over meanings, among all the 21146 theoretically possible spatial deictic lexicons, those adopted by real languages lie near an efficient frontier. Moreover, we find that the conditions that the need probability and the cost function need to satisfy are consistent with the cognitive science literature regarding the source-goal asymmetry. We also show that the data are better explained by introducing a notion of systematicity, which is not currently accounted for in Information Bottleneck approaches to linguistic efficiency.
Citations: 0
Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages
Yulia Otmakhova, Karin M. Verspoor, Jey Han Lau
DOI: https://doi.org/10.18653/v1/2022.sigtyp-1.4
Abstract: Though recently there has been an increased interest in how pre-trained language models encode different linguistic features, there is still a lack of systematic comparison between languages with different morphology and syntax. In this paper, using BERT as an example of a pre-trained model, we compare how three typologically different languages (English, Korean, and Russian) encode morphology and syntax features across different layers. In particular, we contrast languages which differ in a particular aspect, such as flexibility of word order, head directionality, morphological type, presence of grammatical gender, and morphological richness, across four different tasks.
Citations: 3