Do boys and girls write the same? Analysis of n-grams of morphological categories (¿Niños y niñas escriben igual? Análisis de n-gramas de categorías morfológicas)

IF 1.1 · Zone 4 (Education) · Q3 · EDUCATION & EDUCATIONAL RESEARCH
Sheila Queralt, Jordi Cicres
Culture and Education, volume 11 1 (as listed), pages 33-63. Published 2022-11-08. DOI: 10.1080/11356405.2022.2121130 (https://doi.org/10.1080/11356405.2022.2121130)
Citations: 0

Abstract

This study characterizes writing samples in Catalan produced by boys and girls in primary school (aged 7 to 12) in terms of their syntactic patterns. The corpus contains 169 texts divided by sex (76 by boys and 93 by girls), averaging 200 words each, for a total of 33,763 words. From this corpus, we calculated the 40 most frequent n-grams of morphological categories (bigrams and trigrams). The data were analysed statistically using ANOVA and Linear Discriminant Analysis. In a cross-validation experiment using both bigrams and trigrams, the writer's gender was predicted with 60.4% accuracy. When the children's age was taken into account, accuracy exceeded 70% in both the original classification and the cross-validation. Identifying the most discriminating bigrams and trigrams allowed us to determine that girls show greater expressive capacity, superior syntactic maturity, and greater lexical and syntactic richness.
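The feature-extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each text has already been converted into a sequence of morphological category tags (the tag names below are hypothetical), that bigrams and trigrams are pooled when picking the most frequent n-grams, and that each text is represented by relative frequencies. The function names `tag_ngrams`, `top_ngrams`, and `feature_vector` are invented for this sketch.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

NGram = Tuple[str, ...]


def tag_ngrams(tags: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous n-gram in one text's sequence of category tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def top_ngrams(corpus: Iterable[Sequence[str]],
               sizes: Sequence[int] = (2, 3),
               k: int = 40) -> List[NGram]:
    """Return the k most frequent n-grams across the corpus.

    Assumption: bigrams and trigrams are pooled into one ranking.
    """
    counts: Counter = Counter()
    for tags in corpus:
        for n in sizes:
            counts.update(tag_ngrams(tags, n))
    return [gram for gram, _ in counts.most_common(k)]


def feature_vector(tags: Sequence[str],
                   vocabulary: Sequence[NGram],
                   sizes: Sequence[int] = (2, 3)) -> List[float]:
    """Relative frequency of each vocabulary n-gram within one text.

    Assumption: counts are normalised by the text's total number of n-grams.
    """
    counts: Counter = Counter()
    for n in sizes:
        counts.update(tag_ngrams(tags, n))
    total = sum(counts.values()) or 1  # avoid division by zero on very short texts
    return [counts[gram] / total for gram in vocabulary]


# Toy corpus with a hypothetical tagset; real input would come from a Catalan tagger.
corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB", "ADJ"],
]
vocabulary = top_ngrams(corpus, k=3)
features = [feature_vector(tags, vocabulary) for tags in corpus]
```

The resulting per-text vectors are the kind of input that a discriminant analysis with cross-validation would take, with the writer's gender as the class label.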
Source journal: Culture and Education (EDUCATION & EDUCATIONAL RESEARCH)
CiteScore: 2.00
Self-citation rate: 9.10%
Articles published: 41