On the replicability of corpus-derived medical word lists

Impact factor: 2.1

Applied Corpus Linguistics | Published: 2025-04-16 | DOI: 10.1016/j.acorp.2025.100130

Cosmin Mihail Florescu, Ryosuke L. Ohniwa

Applied Corpus Linguistics, Volume 5, Issue 2, Article 100130. https://www.sciencedirect.com/science/article/pii/S2666799125000139

Citations: 0

Abstract




Several English medical vocabulary lists have been developed using corpora compiled from a variety of medical texts including research articles and medical textbooks. List items have been identified for inclusion using criteria mostly adopted from previous studies focused on academic vocabulary. This study aims to employ a systematic approach in compiling a corpus to create a medical word list for learners of English aiming to study or practice medicine in an English-speaking country. A large corpus of medical textbooks (CoMeT; 28,384,681 running words) was created using SketchEngine and analyzed to extract high-frequency lemmas. Keyness and dispersion values for each lemma were plotted in a histogram to visualize clustering patterns. This visual map was used to determine threshold values separating a medical vocabulary subset from a general vocabulary subset. The replicability of the findings was evaluated using two corpora (one medical, one non-medical) different from CoMeT. The newly developed list (Core Medical List; CoMeL) comprising a total of 2881 lemmas was found to include significantly more medicine-specific words and to have higher replicability compared to existing lists. CoMeL may assist learners and educators in English for Medical Purposes programs, including those aiming to undertake challenging medical licensing examinations in English-speaking countries.
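The abstract describes a two-criterion filter (each lemma's keyness against a reference corpus and its dispersion across the corpus) followed by a replicability check on other corpora. The Python sketch below illustrates that general idea on invented toy numbers. It is not the authors' procedure: the keyness formula (a Sketch Engine-style "simple maths" ratio of smoothed per-million frequencies with smoothing constant n = 1), the dispersion measure (Juilland's D over equal-sized corpus parts), the fixed thresholds, the helper names, and the overlap-based replicability proxy are all assumptions chosen for illustration. In the study itself, the thresholds were read off a histogram of keyness and dispersion values rather than set by hand.

```python
# Toy sketch of keyness/dispersion filtering for a domain word list.
# ASSUMPTIONS (not from the paper): "simple maths" keyness with n = 1,
# Juilland's D as the dispersion measure, hand-picked thresholds, and a
# hypothetical overlap score as a stand-in for a replicability check.
from math import sqrt
from statistics import mean, pstdev


def simple_maths_keyness(freq_focus, size_focus, freq_ref, size_ref, n=1.0):
    # Ratio of smoothed per-million frequencies (focus vs. reference corpus).
    fpm_focus = freq_focus / size_focus * 1_000_000
    fpm_ref = freq_ref / size_ref * 1_000_000
    return (fpm_focus + n) / (fpm_ref + n)


def juilland_d(part_freqs):
    # Juilland's D = 1 - V / sqrt(k - 1), where V is the coefficient of
    # variation of the lemma's frequency across k equal-sized corpus parts.
    k = len(part_freqs)
    m = mean(part_freqs)
    if k < 2 or m == 0:
        return 0.0
    v = pstdev(part_freqs) / m
    return 1 - v / sqrt(k - 1)


def select_lemmas(stats, keyness_min, dispersion_min):
    # stats maps lemma -> (keyness, dispersion); keep lemmas passing both cut-offs.
    return {lemma for lemma, (key, disp) in stats.items()
            if key >= keyness_min and disp >= dispersion_min}


def overlap(list_a, list_b):
    # Hypothetical replicability proxy: share of list_a also selected elsewhere.
    return len(list_a & list_b) / len(list_a) if list_a else 0.0


# Toy data: lemma -> (frequency in each of 4 equal parts of the focus corpus,
# frequency in the reference corpus). All numbers are invented.
FOCUS_SIZE = REF_SIZE = 1_000_000
toy = {
    "artery": ([50, 40, 60, 50], 10),      # key and evenly spread
    "patient": ([300, 280, 320, 300], 100),
    "house": ([5, 5, 5, 5], 200),          # even, but not key (general word)
    "stent": ([0, 0, 0, 80], 1),           # key, but confined to one part
}
stats = {
    lemma: (simple_maths_keyness(sum(parts), FOCUS_SIZE, ref, REF_SIZE),
            juilland_d(parts))
    for lemma, (parts, ref) in toy.items()
}
core = select_lemmas(stats, keyness_min=2.0, dispersion_min=0.8)
print(sorted(core))                         # ['artery', 'patient']
print(overlap(core, {"artery", "vein"}))    # 0.5
```

The two criteria filter in different directions: "house" is evenly spread but not key, so it falls into the general-vocabulary subset, while "stent" is highly key but concentrated in a single part, so the dispersion cut-off excludes it.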


Source journal: Applied Corpus Linguistics (subject area: Linguistics and Language)

CiteScore: 1.30

Self-citation rate: 0.00%

Review time: 70 days