Predicting citation counts of papers
Junpeng Chen, Chunxia Zhang
2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), July 2015
DOI: 10.1109/ICCI-CC.2015.7259421
Citation count: 18
Abstract
The task of citation count prediction is to predict how many citations a paper will have received after a given time period. A paper's future citation count is an important metric for estimating its potential influence, and it can help researchers choose representative literature. The task can be treated as a regression problem. This paper proposes two types of predictive features that represent fundamental characteristics of papers and authors: six content features and ten author features. We use IBM Model 1 to calculate association probabilities between paper topics, which are then used to extract content features, and we use bipartite network projection to obtain an author collaboration network, from which author features are extracted. We then apply Gradient Boosted Regression Trees (GBRT) to predict citation counts. Our approach combines the content and topics of papers with multi-dimensional measures of author collaboration in a single learning process. Experimental results on the KDD CUP dataset demonstrate that the proposed features and models are effective for predicting the citation counts of papers.
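The abstract says IBM Model 1 is used to compute association probabilities between paper topics. The following is a minimal, self-contained sketch of the standard IBM Model 1 expectation-maximization (EM) procedure applied to topic labels. It rests on one assumption that is not in the abstract: the training data is a list of hypothetical (source topics, target topics) pairs, for example a paper's topics paired with the topics of papers it cites. How the authors actually formed these pairs is not stated. The function name `ibm_model1` is illustrative, and probabilities are stored as `t[(target, source)]`.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate association probabilities t[(target, source)] with IBM Model 1 EM.

    pairs: list of (source_topics, target_topics) sequences. How these pairs
    are built is a hypothetical choice for this sketch. A NULL source token
    (None) lets a target topic be explained by no source topic.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform t(target | source); unseen pairs default to it.
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count c(target, source)
        total = defaultdict(float)  # expected count c(source)
        for src, tgt in pairs:
            sources = [None] + list(src)
            for w in tgt:
                # E-step: posterior probability that each source topic generated w.
                norm = sum(t[(w, s)] for s in sources)
                for s in sources:
                    delta = t[(w, s)] / norm
                    count[(w, s)] += delta
                    total[s] += delta
        # M-step: renormalise so the probabilities for each source sum to 1.
        t = defaultdict(float, {(w, s): c / total[s] for (w, s), c in count.items()})
    return t
```

For instance, with the hypothetical pairs `[(["a", "b"], ["x", "y"]), (["a"], ["x"])]`, EM pushes `t[("x", "a")]` above `t[("y", "a")]` and `t[("y", "b")]` above `t[("x", "b")]`, because "a" co-occurs with "x" in both pairs. Probabilities like these could then feed content features, although the six content features the paper uses are not listed in the abstract.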
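The abstract builds the author collaboration network by projecting the author-paper bipartite network onto its author nodes. Below is a minimal sketch of the simplest one-mode projection, in which two authors are linked if they co-wrote a paper and the edge weight is the number of shared papers. This weighting is an assumption, since other projection weightings exist and the abstract does not specify one. The two helpers and the three measures in `author_features` are hypothetical illustrations and are not the paper's ten author features.

```python
from itertools import combinations


def project_authors(paper_authors):
    """One-mode projection of an author-paper bipartite graph onto authors.

    paper_authors: dict mapping a paper id to its list of author ids.
    Returns {author: {coauthor: number_of_shared_papers}}.
    """
    graph = {}
    for authors in paper_authors.values():
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, {})  # single-author papers still add a node
        for a, b in combinations(unique, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def author_features(graph, author):
    """Illustrative collaboration measures (hypothetical, not the paper's list)."""
    neighbours = graph.get(author, {})
    degree = len(neighbours)              # number of distinct co-authors
    strength = sum(neighbours.values())   # total co-authorship ties
    return {
        "degree": degree,
        "strength": strength,
        "mean_tie_strength": strength / degree if degree else 0.0,
    }
```

With papers `{"p1": ["alice", "bob"], "p2": ["alice", "bob", "carol"], "p3": ["dave"]}`, the projection links alice to bob with weight 2 and to carol with weight 1, and dave becomes an isolated node.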
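The abstract treats citation count prediction as regression solved with Gradient Boosted Regression Trees. To show how boosting works under squared loss, here is a deliberately small sketch. Each round fits a depth-1 regression tree (a stump) to the current residuals, which are the negative gradient of squared error, and then adds a shrunken copy of the stump to the ensemble. The stump depth, learning rate, and number of rounds are illustrative assumptions, not the paper's settings. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used.

```python
def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (one feature, one threshold) by least squares."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all rows identical): predict the mean residual everywhere.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbrt(X, y, n_estimators=50, learning_rate=0.1):
    """Least-squares gradient boosting with stumps as base learners."""
    base = sum(y) / len(y)  # initial constant model F0 = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For squared loss the negative gradient is the residual y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each new tree's contribution by the learning rate.
        pred = [pi + learning_rate * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbrt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

In this setting, each row of `X` would be a paper's feature vector (for example, content and author features like those sketched above) and `y` its observed citation count. Shrinkage through `learning_rate` trades more rounds for smoother fits. Deeper trees would let the model capture interactions between content and author features, which stumps cannot.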