ScholarSeq
Authors: Yu Liu, Zhenzhao Sun, Yizhou Yan, Jing Li
Published in: Proceedings of the 3rd International Conference on Computer Science and Application Engineering, 2019-10-22
DOI: 10.1145/3331453.3362039 (https://doi.org/10.1145/3331453.3362039)
Citations: 0
Abstract
The h-sequence, the time evolution of the h-index, is a promising approach to evaluating a scholar's performance over an entire career. However, the lack of a benchmark dataset for comparing and evaluating new and existing h-sequence methods has limited the development of the h-sequence and other time-series indicators. To address this problem, we crawled 7,276,970 papers in the field of computer science. We then used the most cited papers to identify 200 top scientists and 50 ordinary scientists. Finally, we constructed a benchmark dataset called ScholarSeq, which contains information on 150 selected scholars who specialize in computer science. The dataset includes 37,900 papers published by these authors and 3,263,813 citing papers. ScholarSeq provides per-year citation counts for each paper, which can be used in time-sequence-based assessments of academic career impact such as the h-sequence. Furthermore, we package the dataset as paper-time matrices so that informetricians can easily access it and study new sequences of impact measures. To illustrate how to use ScholarSeq, we apply the dataset to analyze 4 state-of-the-art h-sequence methods. We have also shared the source code, the entire dataset, and many other files on our website at http://scholarseq.beyondcloud.cn/.
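As a rough illustration of how a paper-time matrix can yield an h-sequence, the Python sketch below computes a scholar's h-index at the end of each year from per-year citation counts. The matrix layout (one row per paper, one column per year) and the function names `h_index` and `h_sequence` are assumptions made for this example; they are not the actual ScholarSeq file format or API. The sketch implements only a simple cumulative variant, which is not necessarily one of the four methods analyzed in the paper.

```python
from itertools import accumulate


def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In descending order, count >= rank holds for exactly the first h papers.
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


def h_sequence(paper_year_matrix):
    """Compute the h-index at the end of each year.

    paper_year_matrix is a hypothetical layout: one row per paper, one column
    per year, each cell holding the citations the paper received in that year.
    """
    if not paper_year_matrix:
        return []
    # Running citation total per paper, year by year.
    cumulative = [list(accumulate(row)) for row in paper_year_matrix]
    n_years = len(cumulative[0])
    return [h_index([row[year] for row in cumulative]) for year in range(n_years)]


# Made-up example data: three papers observed over four years.
example = [
    [1, 2, 3, 0],
    [0, 1, 1, 2],
    [0, 0, 1, 2],
]
print(h_sequence(example))  # [1, 1, 2, 3]
```

Other h-sequence definitions could be explored with the same input by changing which citations are counted in each column, for example by restricting the sum to a sliding window of years.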