Classifying respondent comments from the 2021 Canadian Census of Population using machine learning methods

Q3 Decision Sciences
Joanne Yoon
Published in Statistical Journal of the IAOS, 2023-11-14. DOI: 10.3233/sji-230063 (https://doi.org/10.3233/sji-230063)

Abstract

To improve the analysis of respondent comments from the Canadian Census of Population, data scientists at Statistics Canada compared and evaluated traditional machine learning, deep learning and transformer-based techniques. The Cross-lingual Language Model-Robustly Optimized Bidirectional Encoder Representations from Transformers (XLM-R), fine-tuned on census respondent comments, yielded the best result, an overall F1 score of 89.91%, despite language and class imbalances. Following the evaluation, the fine-tuned model was successfully implemented to objectively categorize comments from the 2021 Census of Population with high accuracy. As a result, feedback from respondents was directed to the appropriate subject matter analysts for post-collection analysis.
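The abstract reports an overall F1 score under class imbalance but does not say how the per-class scores were averaged. The following is a minimal sketch, not the authors' evaluation code, of how macro-averaged and support-weighted F1 can differ on an imbalanced label set. The class names ("general", "privacy") and the toy predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of true labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical, imbalanced toy data: 8 "general" comments, 2 "privacy" comments.
y_true = ["general"] * 8 + ["privacy"] * 2
y_pred = ["general"] * 8 + ["general", "privacy"]

if __name__ == "__main__":
    print(per_class_f1(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 3))
    print(round(weighted_f1(y_true, y_pred), 3))
```

In this toy case the minority class is misclassified half the time, so the macro average falls well below the weighted average. This is why the averaging choice matters when reading a single overall F1 figure for imbalanced data.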
Source journal: Statistical Journal of the IAOS
Subject area: Economics, Econometrics and Finance (Economics and Econometrics)
CiteScore: 1.30
Self-citation rate: 0.00%
Articles published: 116
About the journal: This is the flagship journal of the International Association for Official Statistics and is expected to be widely circulated and subscribed to by individuals and institutions in all parts of the world. The main aim of the Journal is to support the IAOS mission by publishing articles to promote the understanding and advancement of official statistics and to foster the development of effective and efficient official statistical services on a global basis. Papers are expected to be of wide interest to readers. Such papers may or may not contain strictly original material. All papers are refereed.