Moving to continuous classifications of bilingualism through machine learning trained on language production

IF 2.5 | Zone 1 (Literature) | Q1 LINGUISTICS
M. I. Coco, G. Smith, R. Spelorzi, M. Garraffa
Bilingualism: Language and Cognition, published 2024-05-24
DOI: 10.1017/s1366728924000361
Citations: 0

Abstract

Recent conceptualisations of bilingualism are moving away from strict categorisations and towards continuous approaches. This study supports this trend by combining empirical psycholinguistic data with machine-learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict which class each speaker belonged to (“monolingual”, “attriters” and “heritage”). All classes can be predicted above chance (>33%, i.e. one in three), although classifier performance varies substantially: monolinguals are identified much better (F-score >70%) than attriters (F-score <50%), which are the most confusable class. Further analyses of the classification errors in the confusion matrices show that attriters are classified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the features that contribute most to classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than as a set of a priori established classes.
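
The abstract reports per-class F-scores, a three-class chance level and confusion-matrix errors. As a minimal sketch of how those quantities relate to one another (not the authors' pipeline, and using only the Python standard library rather than whatever SVM toolkit the study used), the snippet below builds a confusion matrix from true and predicted labels and derives each class's F-score as the harmonic mean of precision and recall. The function names and the toy labels are hypothetical illustrations, not data from the study, and the chance level assumes balanced classes.

```python
# Class labels follow the abstract; their order here is an arbitrary choice.
CLASSES = ("monolingual", "attriter", "heritage")


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def per_class_f_scores(matrix, classes=CLASSES):
    """F-score per class: harmonic mean of precision and recall."""
    scores = {}
    for i, label in enumerate(classes):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def chance_level(classes=CLASSES):
    """Uniform-guessing baseline, assuming balanced classes: 1/k."""
    return 1 / len(classes)


# Hypothetical toy labels for illustration only (not data from the study).
y_true = ["monolingual"] * 4 + ["attriter"] * 4 + ["heritage"] * 4
y_pred = (["monolingual"] * 4
          + ["attriter", "attriter", "heritage", "heritage"]
          + ["heritage", "heritage", "heritage", "attriter"])

matrix = confusion_matrix(y_true, y_pred)
print(matrix)  # [[4, 0, 0], [0, 2, 2], [0, 1, 3]]
print({c: round(f, 3) for c, f in per_class_f_scores(matrix).items()})
# {'monolingual': 1.0, 'attriter': 0.571, 'heritage': 0.667}
print(round(chance_level(), 3))  # 0.333
```

Reading across a row of the matrix shows where each true class is misassigned: in this toy example the attriter row splits its four cases evenly between "attriter" and "heritage", the kind of off-diagonal pattern the abstract describes.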

Source journal metrics: CiteScore 8.90 | Self-citation rate 16.70% | Articles published: 86