Predicting access mode of multidisciplinary and library and information sciences journals using machine learning

IF 1.6 Q2 INFORMATION SCIENCE & LIBRARY SCIENCE
H. Okagbue, C. A. Nzeadibe, J. A. Teixeira da Silva
COLLNET Journal of Scientometrics and Information Management, 16(1), 117-124. Journal Article. Published 2022-01-02.
DOI: 10.1080/09737766.2021.2009745 (https://doi.org/10.1080/09737766.2021.2009745)
Citations: 0

Abstract

Academics and librarians might want to identify whether a journal is open access (OA) or subscription-based. While indexes and digital libraries might provide such information for known collections, the access mode of a journal or body of journals might be unknown a priori. In this short analysis, a machine learning-based method is used to classify a journal’s access mode, OA or subscription, using its CiteScore and Journal Impact Factor (JIF). From an initial pool of 91 multidisciplinary journals with a CiteScore, 38 journals with both a JIF and a CiteScore were selected (24 = OA; 14 = subscription). Using a data mining tool (Orange), ten machine learning models were applied: k nearest neighbor (kNN), Tree, support vector machine (SVM), Random forest, Neural network, Naïve Bayes, Logistic regression, Adaptive boosting (Adaboost), Gradient Boosting (Scikit-learn) (GBS) and Gradient Boosting (catboost) (GBC). Adaboost, GBS and GBC showed the highest (100%) precision, sensitivity, and specificity, classifying the access mode with zero error. These 3 optimum models were then validated by using them to predict the access mode of 54 (7 = OA; 47 = subscription) library and information science (LIS) journals; Adaboost and GBS gave perfect results with no misclassification. With these models, the access mode of multidisciplinary and LIS journals can be accurately predicted using only JIF-CiteScore data. Libraries in low-resource settings will benefit from the implementation of this research by designing a decision support system for the selection of journals.
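
The workflow the abstract describes (classify a journal from two numeric features, then score the predictions) can be sketched in plain Python. This is only an illustrative sketch, not the authors' pipeline: the study used Orange, whereas the code below hand-rolls a k nearest neighbor vote, one of the ten model families named above. The function names `knn_predict` and `binary_metrics`, the choice of k = 3, the treatment of OA as the positive class, and the `toy_train` values are all assumptions introduced here; the toy values are arbitrary and make no claim about real journals.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of ((citescore, jif), label) pairs.
    """
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def binary_metrics(actual, predicted, positive="OA"):
    """Precision, sensitivity and specificity, with `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Synthetic (CiteScore, JIF) pairs, invented for illustration only.
# They are NOT taken from the study's data set.
toy_train = [
    ((2.1, 1.4), "OA"),
    ((2.5, 1.8), "OA"),
    ((3.0, 2.2), "OA"),
    ((6.2, 4.9), "subscription"),
    ((7.0, 5.5), "subscription"),
    ((8.1, 6.3), "subscription"),
]

# Classify a hypothetical journal from its two scores.
print(knn_predict(toy_train, (2.4, 1.6)))

# Class counts from the abstract's validation set: 7 OA, 47 subscription.
# A model with no misclassification predicts every label correctly.
actual = ["OA"] * 7 + ["subscription"] * 47
print(binary_metrics(actual, list(actual)))
```

With the validation counts from the abstract (7 OA, 47 subscription), a model that misclassifies nothing has 7 true positives, 47 true negatives and no false positives or negatives, so all three metrics equal 1.0, which is what "100% precision, sensitivity, and specificity" means here.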
Source journal
COLLNET Journal of Scientometrics and Information Management (Information Science & Library Science)
Self-citation rate: 0.00%
Articles published: 11