Developing and validating a mid-frequency word list for chemistry: a corpus-based approach using big data

IF 1.5; JCR Q2, Education & Educational Research
Ismail Xodabande, Mahmood Reza Atai, Mohammad R. Hashemi, Paul Thompson
Journal: Asian-Pacific Journal of Second and Foreign Language Education
Publication date: 2023-10-01
DOI: 10.1186/s40862-023-00205-5 (https://doi.org/10.1186/s40862-023-00205-5)
Citations: 0

Abstract

Given the importance of specialized vocabulary in scientific communication and academic discourse, there is a growing need to create word lists that address the vocabulary-learning needs of university students and researchers in different subject areas. The current study analyzed a corpus of chemistry research articles (278 million running words) to establish a mid-frequency vocabulary list for this field. Using frequency, range, and dispersion criteria, the study identified 560 lemmas from the fourth to ninth 1,000-word lists of the British National Corpus/Corpus of Contemporary American English (BNC/COCA); together these lemmas provided 6.4% coverage of all words in the corpus. The list was validated using specialized and general corpora, and the results confirmed the value and relevance of the items for chemistry. To support pedagogical use of the list, the vocabulary items were divided into five bands based on their coverage and importance. The 100 words in the first band were the most important mid-frequency vocabulary in chemistry, providing 3.05% coverage. The study highlights the substantial contribution of mid-frequency words to research articles, and the findings have implications for using large corpora as a big-data source for identifying specialized, field-specific vocabulary.
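The abstract names frequency, range, and dispersion as selection criteria and reports coverage as a percentage of all running words, but it does not state the thresholds or the dispersion statistic that were used. The sketch below shows one way such a filter and the coverage figure could be computed. It assumes a corpus that is already lemmatized and split into equal-sized parts, a lookup table of BNC/COCA band numbers, and Juilland's D as the dispersion measure. The function names, thresholds, band numbers, toy data, and the fixed-size five-band split are all hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def juilland_d(counts_per_part):
    """Juilland's D dispersion for one lemma across n equal-sized corpus parts.

    D = 1 - V / sqrt(n - 1), where V is the coefficient of variation
    (population standard deviation divided by the mean) of the per-part counts.
    D is 1.0 for a perfectly even spread and 0.0 when every occurrence
    falls in a single part.
    """
    n = len(counts_per_part)
    m = mean(counts_per_part)
    if n < 2 or m == 0:
        return 0.0
    v = pstdev(counts_per_part) / m
    return 1 - v / sqrt(n - 1)


def select_mid_frequency(parts, band_of, min_freq, min_range, min_d,
                         low_band=4, high_band=9):
    """Keep lemmas from BNC/COCA bands low_band..high_band that pass
    hypothetical frequency, range, and dispersion thresholds.

    parts   -- list of lemma lists, one per (assumed equal-sized) corpus part
    band_of -- mapping from lemma to its BNC/COCA 1,000-word band number
    Returns the selected lemmas (most frequent first) and their percentage
    coverage of all tokens in the corpus.
    """
    per_part = [Counter(tokens) for tokens in parts]
    total = Counter()
    for counts in per_part:
        total.update(counts)
    n_tokens = sum(total.values())

    selected = []
    for lemma, freq in total.items():
        band = band_of.get(lemma)
        if band is None or not low_band <= band <= high_band:
            continue  # outside the mid-frequency bands
        counts = [c[lemma] for c in per_part]
        rng = sum(1 for x in counts if x > 0)  # number of parts containing the lemma
        if freq >= min_freq and rng >= min_range and juilland_d(counts) >= min_d:
            selected.append(lemma)

    selected.sort(key=lambda lemma: total[lemma], reverse=True)
    coverage = 100 * sum(total[lemma] for lemma in selected) / n_tokens
    return selected, coverage


def split_into_bands(ranked, band_size, n_bands=5):
    """Hypothetical banding: the first n_bands - 1 bands hold band_size lemmas
    each and the last band takes whatever remains."""
    bands = [ranked[i * band_size:(i + 1) * band_size] for i in range(n_bands - 1)]
    bands.append(ranked[(n_bands - 1) * band_size:])
    return bands


# Toy data: three corpus parts and invented band numbers, for illustration only.
parts = [
    ["the", "solvent", "yield", "the", "titration"],
    ["the", "solvent", "yield", "the"],
    ["the", "solvent", "the", "zeolite"],
]
band_of = {"the": 1, "solvent": 5, "yield": 4, "titration": 8, "zeolite": 12}

ranked, coverage = select_mid_frequency(parts, band_of, min_freq=2, min_range=2, min_d=0.4)
print(ranked, round(coverage, 2))  # → ['solvent', 'yield'] 38.46
print(split_into_bands(ranked, band_size=1))
```

Juilland's D is used here because it is a common dispersion index in corpus-based word-list work, but another measure (for example Gries's DP) could be substituted without changing the rest of the pipeline. The abstract states only that the first pedagogical band holds 100 words. The remainder rule in `split_into_bands` is an arbitrary illustrative choice.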

Source journal: Asian-Pacific Journal of Second and Foreign Language Education (Arts and Humanities: Language and Linguistics)
CiteScore: 2.90
Self-citation rate: 5.60%
Articles published: 40
Review time: 5 weeks