On the replicability of corpus-derived medical word lists
Cosmin Mihail Florescu, Ryosuke L. Ohniwa
Applied Corpus Linguistics, 5(2), Article 100130. Published 2025-04-16.
DOI: 10.1016/j.acorp.2025.100130
https://www.sciencedirect.com/science/article/pii/S2666799125000139
Several English medical vocabulary lists have been developed from corpora compiled from a variety of medical texts, including research articles and medical textbooks. Items in these lists have been selected using criteria largely adopted from earlier studies of academic vocabulary. This study takes a systematic approach to compiling a corpus and deriving from it a medical word list for learners of English who intend to study or practice medicine in an English-speaking country. A large corpus of medical textbooks (CoMeT; 28,384,681 running words) was built using SketchEngine and analyzed to extract high-frequency lemmas. The keyness and dispersion values of each lemma were plotted in a histogram to reveal clustering patterns, and this plot was used to set threshold values separating a medical vocabulary subset from a general vocabulary subset. The replicability of the findings was evaluated on two further corpora distinct from CoMeT, one medical and one non-medical. The resulting list (Core Medical List; CoMeL), comprising 2,881 lemmas, was found to contain significantly more medicine-specific words and to be more replicable than existing lists. CoMeL may assist learners and educators in English for Medical Purposes programs, including learners preparing for demanding medical licensing examinations in English-speaking countries.
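The abstract does not say which keyness and dispersion measures were used, so the following is only a minimal sketch of the kind of threshold-based selection it describes. It assumes a "simple maths" keyness score (ratio of per-million frequencies, each smoothed by a constant n = 1) and Juilland's D computed over equal-sized parts of the focus corpus. All function names, the toy frequencies, and the threshold values (`keyness_min`, `dispersion_min`) are hypothetical. In the study the thresholds were read off the histogram rather than fixed in advance. The replicability measure shown here (the share of list items recovered when the procedure is re-run on another corpus) is likewise an illustrative assumption, not the authors' definition.

```python
import math


def simple_maths_keyness(freq_focus, size_focus, freq_ref, size_ref, n=1.0):
    """Keyness as a ratio of smoothed per-million frequencies (assumed measure)."""
    fpm_focus = freq_focus / size_focus * 1_000_000
    fpm_ref = freq_ref / size_ref * 1_000_000
    return (fpm_focus + n) / (fpm_ref + n)


def juilland_d(part_freqs):
    """Juilland's D over k equal-sized corpus parts: 1 = perfectly even, 0 = all in one part."""
    k = len(part_freqs)
    if k < 2:
        raise ValueError("need at least two corpus parts")
    mean = sum(part_freqs) / k
    if mean == 0:
        return 0.0
    # Population standard deviation of the per-part frequencies
    sd = math.sqrt(sum((f - mean) ** 2 for f in part_freqs) / k)
    # Coefficient of variation scaled by its maximum, sqrt(k - 1)
    return 1 - (sd / mean) / math.sqrt(k - 1)


def split_lemmas(lemmas, ref_freqs, size_focus, size_ref, keyness_min, dispersion_min):
    """Split lemmas into a selected 'medical' subset and everything else.

    lemmas: {lemma: [frequency in each equal-sized part of the focus corpus]}
    ref_freqs: {lemma: frequency in the reference (general) corpus}
    """
    medical, rest = [], []
    for lemma, parts in lemmas.items():
        key = simple_maths_keyness(sum(parts), size_focus, ref_freqs.get(lemma, 0), size_ref)
        disp = juilland_d(parts)
        # Selected only if the lemma is both key and evenly spread across parts;
        # 'rest' holds general-vocabulary lemmas and key-but-concentrated ones
        if key >= keyness_min and disp >= dispersion_min:
            medical.append(lemma)
        else:
            rest.append(lemma)
    return sorted(medical), sorted(rest)


def replication_rate(original_list, rederived_list):
    """Share of the original list recovered when the procedure is re-run on another corpus."""
    original = set(original_list)
    return len(original & set(rederived_list)) / len(original)


if __name__ == "__main__":
    # Hypothetical toy counts: four equal parts of a 1M-word focus corpus,
    # plus frequencies in a 2M-word reference corpus
    toy_lemmas = {
        "patient": [50, 48, 52, 50],
        "the": [5000, 5100, 4900, 5000],
        "gout": [0, 0, 40, 0],
    }
    toy_ref = {"patient": 20, "the": 120_000, "gout": 2}
    medical, rest = split_lemmas(toy_lemmas, toy_ref, 1_000_000, 2_000_000,
                                 keyness_min=2.0, dispersion_min=0.5)
    print(medical, rest)  # ['patient'] ['gout', 'the']
    print(round(replication_rate(["patient", "fever", "the"],
                                 ["patient", "fever", "dose"]), 3))  # 0.667
```

With these toy counts, "patient" is both key and evenly spread, so it is selected; "the" is frequent but not key; "gout" is key but confined to one part, so it fails the dispersion threshold. Requiring both conditions reflects the intuition that a core list item should be characteristic of the domain and not tied to a single textbook.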