Topic modeling of researchers based on their interests from Google Scholar

S. Shtovba, M. Petrychko
Journal: System research and information technologies
Publication type: Journal Article
Published: 2021-09-14
DOI: 10.20535/srit.2308-8893.2021.2.09 (https://doi.org/10.20535/srit.2308-8893.2021.2.09)
Citations: 0

Abstract

The article proposes an algorithm for topic modeling of researchers based on the interests listed in their Google Scholar profiles. The algorithm uses the set of fields of research from the ANZSRC research classification system. The information resource for topic modeling is a corpus of categorized publications from Dimensions. Interests from researchers' profiles are used as search queries to Dimensions, which returns distributions of documents over categories. To reduce information noise, these distributions are passed through several stages of processing. The article also compares the results of topic modeling based on interests from Google Scholar profiles with those based on a categorized list of publications from Dimensions. The comparison uses a modified Czekanowski metric that takes into account the similarity between categories. The topic modeling outputs obtained from the two information sources show a good match.
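As a rough illustration of the comparison step described in the abstract, the sketch below normalizes per-category document counts into a distribution, drops low-weight categories, and scores two distributions with the classic Czekanowski similarity (equal to one minus the Bray-Curtis dissimilarity). Everything here is an assumption made for illustration: the function names, the single threshold-based noise filter, and the toy category labels are hypothetical, and the article's modification that accounts for similarity between categories is not reproduced because the abstract does not specify it.

```python
from typing import Dict

Distribution = Dict[str, float]


def normalize(counts: Distribution) -> Distribution:
    """Scale non-negative category weights so that they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: value / total for category, value in counts.items()}


def drop_noise(dist: Distribution, threshold: float = 0.05) -> Distribution:
    """Hypothetical noise filter: discard categories below `threshold`
    and renormalize the rest. The article's actual processing stages
    are not described in the abstract."""
    kept = {c: v for c, v in dist.items() if v >= threshold}
    return normalize(kept)


def czekanowski(p: Distribution, q: Distribution) -> float:
    """Classic (unmodified) Czekanowski similarity:
    2 * sum(min(p_i, q_i)) / (sum(p) + sum(q))."""
    categories = set(p) | set(q)
    overlap = sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in categories)
    total = sum(p.values()) + sum(q.values())
    return 2.0 * overlap / total if total else 0.0


if __name__ == "__main__":
    # Toy counts standing in for category distributions from two sources.
    from_interests = drop_noise(normalize({"A": 60, "B": 30, "C": 8, "D": 2}))
    from_publications = drop_noise(normalize({"A": 50, "B": 45, "E": 5}))
    print(round(czekanowski(from_interests, from_publications), 3))
```

A score near 1 indicates that the two sources assign a researcher to nearly the same categories, while 0 means the category sets do not overlap at all.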