Modeling Education Studies Indexed in Web of Science Using Natural Language Processing
Tuncer Akbay
Öğretim Teknolojisi ve Hayat Boyu Öğrenme Dergisi - Instructional Technology and Lifelong Learning, published 2022-12-31
DOI: https://doi.org/10.52911/itall.1193460
Citation count: 0
Abstract
Easier access to information and resources has allowed researchers to conduct more studies and to publish most of them electronically. These publications are indexed in scholarly citation databases such as Web of Science and Scopus, which hold huge volumes of research reports. Even though the databases offer search-engine filtering options, it is still hard to locate publications whose contents are closely related. Artificial intelligence technologies such as Natural Language Processing allow documents to be categorized by their content. Top2Vec is an unsupervised topic modeling algorithm that groups documents semantically. The purpose of the current study is twofold: (1) to give users the ability to group documents using Natural Language Processing techniques, and (2) to reveal the topics with the highest number of articles indexed in the ‘education scientific disciplines’ category of the Web of Science Core Collection in 2021. A Colab notebook was used to write the Python code that runs the Top2Vec algorithm. The study yielded 68 distinct topics among the 8125 articles published in 2021 and indexed in the Web of Science database under the Education Scientific Disciplines category. The modeled topics were ranked from the topic with the most documents (N=549) to the topic with the fewest (N=29), and the findings for the first eight topics were presented and discussed. The eight most-studied topics are: Physics (N=549), Online Education and Covid (N=438), Chemistry (N=381), Math and Reasoning (N=377), Psychology and Emotions (N=257), Educational Diversity (N=228), Health and Life (N=223), and Mentoring and Leadership (N=204).
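The abstract does not reproduce the notebook code, so the following is only a minimal sketch of how such a workflow might look, not the author's implementation. It assumes the third-party `top2vec` package and uses its `Top2Vec` constructor and `get_topic_sizes()` method; the helper names `fit_topic_sizes`, `rank_topics`, and `corpus_share`, and the parameter choices `speed="learn"` and `workers=2`, are hypothetical. Top2Vec returns numbered topics with their top words, so the readable labels used here are the ones reported in the abstract. The ranking and share calculations use only the counts given there.

```python
from typing import Dict, List, Tuple

# Document counts for the eight largest topics, as reported in the abstract.
REPORTED_TOP_TOPICS: Dict[str, int] = {
    "Physics": 549,
    "Online Education and Covid": 438,
    "Chemistry": 381,
    "Math and Reasoning": 377,
    "Psychology and Emotions": 257,
    "Educational Diversity": 228,
    "Health and Life": 223,
    "Mentoring and Leadership": 204,
}
TOTAL_ARTICLES = 8125  # size of the 2021 corpus, per the abstract


def fit_topic_sizes(documents: List[str]) -> List[Tuple[int, int]]:
    """Fit Top2Vec and return (topic_number, document_count) pairs.

    Assumption: the `top2vec` package is installed. The speed and
    workers settings are illustrative, not taken from the study.
    """
    from top2vec import Top2Vec  # lazy import: only needed when fitting

    model = Top2Vec(documents, speed="learn", workers=2)
    topic_sizes, topic_nums = model.get_topic_sizes()
    return [(int(num), int(size)) for num, size in zip(topic_nums, topic_sizes)]


def rank_topics(sizes: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sort topics from most to fewest documents."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


def corpus_share(sizes: Dict[str, int], total: int) -> float:
    """Fraction of the corpus covered by the given topics."""
    return sum(sizes.values()) / total


if __name__ == "__main__":
    for label, count in rank_topics(REPORTED_TOP_TOPICS):
        print(f"{label}: N={count} ({count / TOTAL_ARTICLES:.1%})")
    share = corpus_share(REPORTED_TOP_TOPICS, TOTAL_ARTICLES)
    print(f"Top eight topics combined: {share:.1%} of {TOTAL_ARTICLES} articles")
```

With the reported counts, the eight largest topics together contain 2657 of the 8125 articles, roughly a third of the corpus. The remaining 60 topics share the rest.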