HSTOOL中科技文献的集群管理

2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) Pub Date : 2022-12-01 DOI:10.1109/ICMLA55696.2022.00062

J. Schubert, U. W. Bolin

{"title":"HSTOOL中科技文献的集群管理","authors":"J. Schubert, U. W. Bolin","doi":"10.1109/ICMLA55696.2022.00062","DOIUrl":null,"url":null,"abstract":"In this paper, we expand a methodology for horizon scanning of scientific literature to discover scientific trends. In this methodology, scientific articles are automatically clustered within a broadly defined field of research based on the topic. We develop a new method to allow an analyst to handle the large number of clusters that result from the automatic clustering of articles. The method is based on estimating an information-theoretical distance between all possible pairs of clusters. Each of the scientific articles has a probability distribution of affiliation over all possible clusters arising from the clustering process. Using these, we investigate possible pairwise mergers between all pairs of existing clusters and calculate the entropies of the probability distributions of all articles after each possible merger of two clusters. These entropies are visualized in a dendritic tree and a cluster graph. The merger with minimal total entropy is the proposed cluster pair to be merged.","PeriodicalId":128160,"journal":{"name":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cluster Management of Scientific Literature in HSTOOL\",\"authors\":\"J. Schubert, U. W. Bolin\",\"doi\":\"10.1109/ICMLA55696.2022.00062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we expand a methodology for horizon scanning of scientific literature to discover scientific trends. In this methodology, scientific articles are automatically clustered within a broadly defined field of research based on the topic. We develop a new method to allow an analyst to handle the large number of clusters that result from the automatic clustering of articles. The method is based on estimating an information-theoretical distance between all possible pairs of clusters. Each of the scientific articles has a probability distribution of affiliation over all possible clusters arising from the clustering process. Using these, we investigate possible pairwise mergers between all pairs of existing clusters and calculate the entropies of the probability distributions of all articles after each possible merger of two clusters. These entropies are visualized in a dendritic tree and a cluster graph. The merger with minimal total entropy is the proposed cluster pair to be merged.\",\"PeriodicalId\":128160,\"journal\":{\"name\":\"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA55696.2022.00062\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA55696.2022.00062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在本文中，我们扩展了一种科学文献水平扫描的方法，以发现科学趋势。在这种方法中，科学文章根据主题自动聚集在一个广泛定义的研究领域内。我们开发了一种新方法，允许分析人员处理由文章自动聚类产生的大量聚类。该方法基于估计所有可能的簇对之间的信息理论距离。每一篇科学文章在聚类过程中产生的所有可能的聚类中都有一个隶属关系的概率分布。利用这些，我们研究了所有现有簇对之间可能的成对合并，并计算了两个簇每次可能合并后所有文章的概率分布的熵。这些熵用树突树和聚类图来表示。总熵最小的合并是我们提出的待合并簇对。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Cluster Management of Scientific Literature in HSTOOL

In this paper, we expand a methodology for horizon scanning of scientific literature to discover scientific trends. In this methodology, scientific articles are automatically clustered within a broadly defined field of research based on the topic. We develop a new method to allow an analyst to handle the large number of clusters that result from the automatic clustering of articles. The method is based on estimating an information-theoretical distance between all possible pairs of clusters. Each of the scientific articles has a probability distribution of affiliation over all possible clusters arising from the clustering process. Using these, we investigate possible pairwise mergers between all pairs of existing clusters and calculate the entropies of the probability distributions of all articles after each possible merger of two clusters. These entropies are visualized in a dendritic tree and a cluster graph. The merger with minimal total entropy is the proposed cluster pair to be merged.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)

自引率

0.00%

发文量