Cluster Management of Scientific Literature in HSTOOL
J. Schubert, U. W. Bolin
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), published 2022-12-01
DOI: https://doi.org/10.1109/ICMLA55696.2022.00062
In this paper, we expand a methodology for horizon scanning of scientific literature to discover scientific trends. In this methodology, scientific articles are automatically clustered within a broadly defined field of research based on the topic. We develop a new method to allow an analyst to handle the large number of clusters that result from the automatic clustering of articles. The method is based on estimating an information-theoretical distance between all possible pairs of clusters. Each of the scientific articles has a probability distribution of affiliation over all possible clusters arising from the clustering process. Using these, we investigate possible pairwise mergers between all pairs of existing clusters and calculate the entropies of the probability distributions of all articles after each possible merger of two clusters. These entropies are visualized in a dendritic tree and a cluster graph. The merger with minimal total entropy is the proposed cluster pair to be merged.
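
The merger-selection step described above can be illustrated with a minimal sketch. It is not the HSTOOL implementation. It assumes that merging two clusters means summing each article's membership probabilities for those two clusters, that entropy is Shannon entropy in bits, and that "total entropy" is the sum over all articles. The function names and the toy `memberships` data are hypothetical.

```python
import math
from itertools import combinations


def entropy(dist):
    """Shannon entropy (bits) of one article's cluster-membership distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def merge_clusters(dist, i, j):
    """Return the distribution after merging clusters i and j.

    Assumption: the merged cluster's probability is the sum of the two
    original probabilities; all other clusters are left unchanged.
    """
    merged = [p for k, p in enumerate(dist) if k not in (i, j)]
    merged.append(dist[i] + dist[j])
    return merged


def total_entropy_after_merge(memberships, i, j):
    """Sum of per-article entropies if clusters i and j were merged."""
    return sum(entropy(merge_clusters(d, i, j)) for d in memberships)


def propose_merger(memberships):
    """Score every cluster pair and return the pair with minimal total entropy."""
    n_clusters = len(memberships[0])
    scores = {
        (i, j): total_entropy_after_merge(memberships, i, j)
        for i, j in combinations(range(n_clusters), 2)
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): three articles, three clusters.
# Articles 0 and 1 are split between clusters 0 and 1; article 2 sits in cluster 2.
memberships = [
    [0.5, 0.5, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

best, scores = propose_merger(memberships)
print(best)  # (0, 1)
```

In this toy case, merging clusters 0 and 1 removes all uncertainty about where each article belongs, so that pair has the lowest total entropy and is the one proposed. The full `scores` dictionary is the kind of pairwise quantity that a tree or graph visualization could be built from.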