Hierarchical Bayesian classification methods to identify topics by journal quartile with an application in biological sciences

S. Restrepo, E. Horst, Juan Diego Zambrano, L. Gunn, German Molina, Carlos Andres Salazar
Environmental education and information, vol. 5 19, pp. 93-112. Published 2021-12-09. DOI: 10.3233/efi-211546 (https://doi.org/10.3233/efi-211546)

Abstract

This manuscript builds on a novel, automatic, freely available Bayesian approach that extracts information from abstracts and titles to classify research topics by journal quartile. The approach is demonstrated on all N = 149,129 ISI-indexed publications in biological sciences journals during 2017. A Bayesian multinomial inverse regression approach is used to extract rankings of topics without the need for a pre-defined dictionary. Bigrams are used to extract research topics across manuscripts, and rankings of research topics are constructed by quartile. Worldwide and local results (e.g., a comparison between two peer/aspirational research institutions in Colombia) are provided, and differences are explored at both the global and local levels. Some topics persist across quartiles, while the relevance of others is quartile-specific. Challenges in sustainable development appear more prevalent in top-quartile journals across institutions, while the two Colombian institutions favour plant and microorganism research. This approach can reduce information inequities by allowing young/incipient researchers in biological sciences, especially those in lower-income countries or universities with limited resources, to freely assess the state of the literature and the relative likelihood of publication in higher-impact journals by research topic. It can also help institutions of higher education identify missing research topics and areas of competitive advantage.
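To make the bigram-and-quartile idea concrete, the following is a minimal sketch and not the authors' method: it replaces the Bayesian multinomial inverse regression with a much simpler smoothed log-ratio of how often a bigram occurs within one quartile versus across all quartiles. The function names (`bigrams`, `rank_bigrams_by_quartile`), the smoothing parameter `alpha`, and the toy texts are all assumptions introduced for illustration; they do not come from the paper or its data.

```python
import math
import re
from collections import Counter, defaultdict


def bigrams(text):
    """Lowercase the text, keep alphabetic tokens, and return adjacent word pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def rank_bigrams_by_quartile(docs, top_k=3, alpha=1.0):
    """Rank bigrams per quartile by a smoothed log-ratio (illustrative stand-in only).

    docs  : iterable of (quartile_label, text) pairs
    alpha : additive smoothing constant (hypothetical default)
    Returns {quartile_label: [(bigram, score), ...]}, highest score first.
    """
    per_quartile = defaultdict(Counter)
    overall = Counter()
    for quartile, text in docs:
        grams = bigrams(text)
        per_quartile[quartile].update(grams)
        overall.update(grams)

    vocab_size = len(overall)
    total_all = sum(overall.values())
    rankings = {}
    for quartile, counts in per_quartile.items():
        total_q = sum(counts.values())
        scores = {}
        for gram, n_all in overall.items():
            # Smoothed share of this bigram within the quartile and overall.
            p_q = (counts[gram] + alpha) / (total_q + alpha * vocab_size)
            p_all = (n_all + alpha) / (total_all + alpha * vocab_size)
            scores[gram] = math.log(p_q / p_all)
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        rankings[quartile] = ranked[:top_k]
    return rankings


if __name__ == "__main__":
    # Invented toy titles, not drawn from the 149,129 publications in the study.
    toy_docs = [
        ("Q1", "Sustainable development and climate change"),
        ("Q1", "Climate change impacts on sustainable development"),
        ("Q4", "Plant growth in soil bacteria"),
        ("Q4", "Soil bacteria and plant growth"),
    ]
    for quartile, ranked in rank_bigrams_by_quartile(toy_docs).items():
        print(quartile, ranked)
```

A positive score means the bigram is over-represented in that quartile relative to the whole corpus, and a negative score means it is under-represented. Unlike the approach described in the abstract, this sketch has no prior, no regression structure, and no uncertainty estimates, so it only conveys the shape of the output: a ranked list of bigram topics per quartile.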