Labelling Hierarchical Clusters of Scientific Articles
Irina Peganova, A. Rebrova, Y. Nedumov
2019 Ivannikov Memorial Workshop (IVMEM), published 2019-09-01
DOI: 10.1109/IVMEM.2019.00010 (https://doi.org/10.1109/IVMEM.2019.00010)
Exploring a document collection is a complex task. One way to approach it is to cluster the collection hierarchically and then label each cluster with a set of extracted terms; good labels should make exploration easier. We focus on the scientific domain, and in particular on collections of article abstracts. An abstract is typically a brief summary of a paper that outlines the research area, the challenge, the proposed solution and the results, so it can stand in for the full article despite the difficulties that its brevity causes. In this paper, we propose a new method, HCBasic, for labelling hierarchical clusters. It is tuned specifically for article abstracts, and we compare it with three other methods: MTWL, hierMTWL and ComboBasic. To evaluate the quality of the labelling algorithms, we ran A/B tests in which eight volunteers searched the labelled cluster tree for articles they were already familiar with. We show that there is no single winner in terms of quality, and that different methods are preferable in different cases.
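The cluster-then-label pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not HCBasic, MTWL, hierMTWL or ComboBasic, whose details the abstract does not give. As stated assumptions, it uses whitespace tokenisation, plain TF-IDF weights, greedy centroid-linkage agglomerative clustering under cosine similarity, and labels made of the highest-weighted centroid terms. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (assumed: whitespace tokens, smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [
        {t: c * math.log((1 + n) / (1 + df[t])) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def build_tree(vectors):
    """Greedy agglomerative clustering: repeatedly merge the two most similar clusters."""
    nodes = [{"docs": [i], "children": []} for i in range(len(vectors))]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            ci = centroid([vectors[d] for d in nodes[i]["docs"]])
            for j in range(i + 1, len(nodes)):
                cj = centroid([vectors[d] for d in nodes[j]["docs"]])
                sim = cosine(ci, cj)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = {
            "docs": nodes[i]["docs"] + nodes[j]["docs"],
            "children": [nodes[i], nodes[j]],
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def label(node, vectors, k=3):
    """Label a cluster with its k highest-weighted centroid terms (ties broken alphabetically)."""
    c = centroid([vectors[d] for d in node["docs"]])
    ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def print_tree(node, vectors, depth=0):
    """Print the labelled cluster tree, skipping single-document leaves."""
    terms = ", ".join(label(node, vectors))
    print("  " * depth + f"{terms}  ({len(node['docs'])} docs)")
    for child in node["children"]:
        if len(child["docs"]) > 1:
            print_tree(child, vectors, depth + 1)


# Toy stand-ins for abstracts (hypothetical data, not from the paper).
docs = [
    "graph clustering algorithm",
    "graph clustering method",
    "neural network training",
    "neural network pruning",
]
vectors = tfidf_vectors(docs)
root = build_tree(vectors)
print_tree(root, vectors)
```

Labelling from the centroid alone ignores the hierarchy, so a parent and its child may end up sharing terms. Whether and how the compared methods avoid this is not stated in the abstract.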