A multiple k-means cluster ensemble framework for clustering citation trajectories
Joyita Chakraborty, Dinesh K. Pradhan, Subrata Nandi
Journal of Informetrics (impact factor 3.4; JCR Q2, Computer Science, Interdisciplinary Applications). Published 2024-02-13.
DOI: 10.1016/j.joi.2024.101507
https://www.sciencedirect.com/science/article/pii/S1751157724000208
Citations: 0
Abstract
Citation maturity time varies across articles, yet the impact of every article is measured in a fixed window (2-5 years). Clustering citation trajectories helps explain the knowledge diffusion process and reveals that not all articles gain immediate success after publication. Clustering trajectories is also necessary for paper impact recommendation algorithms. The problem is challenging because citation time series vary widely owing to their non-linear and non-stationary characteristics. Prior works rely on arbitrary thresholds and fixed rule-based approaches, and all such methods are primarily parameter-dependent. This leads to inconsistencies in defining similar trajectories and to ambiguity about how many distinct trajectories exist. Most studies capture only extreme trajectories, so a generalized clustering framework is required. This paper proposes a feature-based multiple k-means cluster ensemble framework. Unlike single clustering algorithms, it trains multiple learners to evaluate the credibility of class labels. A total of 195,783 and 41,732 well-cited articles from the Microsoft Academic Graph data are considered for clustering short-term (10-year) and long-term (30-year) trajectories, respectively. The framework has linear run-time. Four distinct trajectories are obtained: Early Rise-Rapid Decline (ER-RD) (2.2%), Early Rise-Slow Decline (ER-SD) (45%), Delayed Rise-Not yet Declined (DR-ND) (53%), and Delayed Rise-Slow Decline (DR-SD) (0.8%). Trajectory differences between the two spans are studied. Most papers exhibit ER-SD and DR-ND patterns. The growth and decay times, cumulative citation distribution, and peak characteristics of individual trajectories are re-defined empirically. A detailed comparative study shows that the proposed methodology can detect all distinct trajectory classes.
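The abstract does not say which features are extracted or how the individual k-means runs are combined, so the following is only a minimal sketch of the general idea of a feature-based multiple k-means ensemble. Everything in it is an assumption made for illustration: the two features (relative time to peak, share of citations in the first half of the window), the co-association (evidence accumulation) consensus with a 0.5 threshold, the toy trajectories, and all function names. It should not be read as the authors' method.

```python
import random

def features(traj):
    """Hypothetical features of a yearly citation-count series (not from the paper)."""
    total = sum(traj) or 1
    peak_year = max(range(len(traj)), key=lambda t: traj[t])
    half = len(traj) // 2
    return (
        peak_year / (len(traj) - 1),   # relative time to peak, in [0, 1]
        sum(traj[:half]) / total,      # share of citations in the first half
    )

def dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means; each seed gives one base learner of the ensemble."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def co_association(runs):
    """Fraction of runs in which each pair of items shares a cluster."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]

def consensus(co, threshold=0.5):
    """Group items linked by co-association above the (assumed) threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy 10-year trajectories (invented data): three early risers, three delayed risers.
trajectories = [
    [30, 20, 10, 5, 3, 2, 1, 1, 0, 0],
    [10, 25, 12, 6, 4, 2, 2, 1, 1, 0],
    [15, 18, 9, 7, 5, 3, 2, 1, 1, 1],
    [0, 1, 1, 2, 3, 5, 8, 12, 18, 25],
    [1, 0, 2, 2, 4, 6, 9, 15, 14, 20],
    [0, 0, 1, 3, 4, 7, 10, 16, 22, 21],
]
points = [features(t) for t in trajectories]
runs = [kmeans(points, k=2, seed=s) for s in range(10)]  # ten base clusterings
labels = consensus(co_association(runs))
print(labels)
```

The pairwise co-association matrix used here is quadratic in the number of articles, so this sketch does not reproduce the linear run-time reported in the abstract; it only shows how agreement across several k-means runs can serve as a check on the credibility of a cluster label.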
About the journal:
Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.