The Use of Probabilistic Latent Semantic Analysis To Identify Scientific Subject Spaces and To Evaluate the Completeness of Covering the Results of Dissertation Studies

P. Lizunov, A. Biloshchytskyi, A. Kuchansky, Y. Andrashko, S. Biloshchytska
DOI: https://doi.org/10.15587/1729-4061.2020.209886
Published: 2020-08-31 (Journal Article)
Citations: 16

Abstract

This study examines how latent semantic analysis can be used to identify scientific subject spaces and to evaluate how completely candidates for academic degrees cover the results of their dissertation research in their publications. A probabilistic topic model was built to cluster scholars' publications by scientific area while taking the citation network into account, an important step toward identifying scientific subject spaces. The model resolves the growing instability of citation-graph clustering as the number of clusters decreases, a problem that arises when clusters obtained from the citation graph are merged according to the similarity of publication abstracts. The article describes a representation of text documents based on a probabilistic topic model that uses n-grams. A probabilistic topic model was also built to determine how completely the material of an author's dissertation research is covered in the author's scientific publications. Approximate threshold coefficients were calculated for deciding whether an author's articles contain the research provisions stated in the author's abstract of the dissertation. The probabilistic topic model for an author's publications was trained with the BigARTM tool. Using this model and a special regularizer, a matrix was obtained that measures the relevance of the topics defined by segments of the author's dissertation abstract to the documents formed from the author's publications. Together, these results clarify the main aspects of applying latent semantic analysis to both tasks: identifying scientific subject spaces and assessing the completeness with which dissertation results are covered.
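The coverage check described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not use the BigARTM API. It assumes that each segment of the dissertation abstract defines one fixed topic over unigrams and bigrams, and it substitutes plain PLSA folding-in (EM with the term distributions held fixed) for the paper's special regularizer. The function names, the smoothing constant, the sample texts, and the threshold of 0.2 are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Return the word n-grams of `text` for n = 1..n_max as strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def fixed_topics(segments, vocab, smooth=0.01):
    """Build p(term | topic), one topic per abstract segment (assumption)."""
    phi = []
    for seg in segments:
        counts = Counter(ngrams(seg))
        total = sum(counts[t] for t in vocab) + smooth * len(vocab)
        phi.append({t: (counts[t] + smooth) / total for t in vocab})
    return phi


def fold_in(doc_terms, phi, iters=50):
    """Estimate p(topic | document) by EM while phi stays fixed."""
    k = len(phi)
    theta = [1.0 / k] * k
    counts = Counter(doc_terms)
    n = sum(counts.values())
    for _ in range(iters):
        new = [0.0] * k
        for term, c in counts.items():
            # E-step: share of this term attributed to each topic.
            joint = [theta[t] * phi[t][term] for t in range(k)]
            z = sum(joint)
            for t in range(k):
                new[t] += c * joint[t] / z
        # M-step: renormalise the topic weights of the document.
        theta = [v / n for v in new]
    return theta


def coverage(segments, publications, threshold=0.2):
    """Return the relevance matrix and a covered/not-covered flag per segment."""
    vocab = sorted({t for text in segments + publications for t in ngrams(text)})
    phi = fixed_topics(segments, vocab)
    theta = [fold_in(ngrams(p), phi) for p in publications]
    covered = [any(row[t] >= threshold for row in theta)
               for t in range(len(segments))]
    return theta, covered


# Hypothetical abstract segments and publication texts.
segments = [
    "citation graph clustering of publications",
    "topic model with n-grams for text documents",
    "neural network hardware acceleration",
]
publications = [
    "we study clustering of the citation graph of publications",
    "a topic model for text documents using n-grams",
]

theta, covered = coverage(segments, publications)
for seg, flag in zip(segments, covered):
    print(f"{'covered' if flag else 'NOT covered'}: {seg}")
```

Each row of `theta` plays the role of one publication's row in the relevance matrix, and a segment counts as covered when at least one publication assigns it a weight at or above the threshold. In practice the threshold would have to be calibrated on real data, as the abstract notes that only approximate values were calculated.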