{"title":"Regression discontinuity design and its applications to Science of Science: A survey","authors":"Mei Li, Yang Zhang, Yang Wang","doi":"10.2478/jdis-2023-0008","DOIUrl":"https://doi.org/10.2478/jdis-2023-0008","url":null,"abstract":"Abstract Purpose With the availability of large-scale scholarly datasets, scientists from various domains hope to understand the underlying mechanisms behind science, forming a vibrant area of inquiry in the emerging “science of science” field. As the results from the science of science often have strong policy implications, understanding the causal relationships between variables becomes prominent. However, the most credible quasi-experimental method among all causal inference methods, and a highly valuable tool in the empirical toolkit, Regression Discontinuity Design (RDD) has not been fully exploited in the field of science of science. In this paper, we provide a systematic survey of the RDD method, and its practical applications in the science of science. Design/methodology/approach First, we introduce the basic assumptions, mathematical notations, and two types of RDD, i.e., sharp and fuzzy RDD. Second, we use the Web of Science and the Microsoft Academic Graph datasets to study the evolution and citation patterns of RDD papers. Moreover, we provide a systematic survey of the applications of RDD methodologies in various scientific domains, as well as in the science of science. Finally, we demonstrate a case study to estimate the effect of Head Start Funding Proposals on child mortality. Findings RDD was almost neglected for 30 years after it was first introduced in 1960. Afterward, scientists used mathematical and economic tools to develop the RDD methodology. After 2010, RDD methods showed strong applications in various domains, including medicine, psychology, political science and environmental science. However, we also notice that the RDD method has not been well developed in science of science research. 
Research Limitations This work uses a keyword search to obtain RDD papers, which may neglect some related work. Additionally, our work does not aim to develop rigorous mathematical and technical details of RDD but rather focuses on its intuitions and applications. Practical implications This work proposes how to use the RDD method in science of science research. Originality/value This work systematically introduces the RDD, and calls for the awareness of using such a method in the field of science of science.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"43 - 65"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47869216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating grant proposals: lessons from using metrics as screening device","authors":"K. Guba, Alexey Zheleznov, Elena Chechik","doi":"10.2478/jdis-2023-0010","DOIUrl":"https://doi.org/10.2478/jdis-2023-0010","url":null,"abstract":"Abstract Purpose This study examines the effects of using publication-based metrics for the initial screening in the application process for a project leader. The key questions are whether formal policy affects the allocation of funds to researchers with a better publication record and how the previous academic performance of principal investigators is related to future project results. Design/methodology/approach We compared two competitions, before and after the policy raised the publication threshold for the principal investigators. We analyzed 9,167 papers published by 332 winners in physics and the social sciences and humanities (SSH), and 11,253 publications resulting from each funded project. Findings We found that among physicists, even in the first period, grants tended to be allocated to prolific authors publishing in high-quality journals. In contrast, the SSH project grantees had been less prolific in publishing internationally in both periods; however, in the second period, the selection of grant recipients yielded better results regarding awarding grants to more productive authors in terms of the quantity and quality of publications. There was no evidence that this better selection of grant recipients resulted in better publication records during grant realization. Originality This study contributes to the discussion of formal policies that rely on metrics for the evaluation of grant proposals. The Russian case shows that such policy may have a profound effect on changing the supply side of applicants, especially in disciplines that are less suitable for metric-based evaluations. 
In spite of the criticism given to metrics, they might be a useful additional instrument in academic systems where professional expertise is corrupted and prevents allocation of funds to prolific researchers.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"66 - 92"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46442307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How a systems perspective can help us with the interdisciplinarity puzzle","authors":"J. Eykens","doi":"10.2478/jdis-2023-0005","DOIUrl":"https://doi.org/10.2478/jdis-2023-0005","url":null,"abstract":"","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"2 - 8"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41636586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical operation and theoretical basis of difference-in-difference regression in science of science: The comparative trial on the scientific performance of Nobel laureates versus their coauthors","authors":"Yurui Huang, Chaolin Tian, Yifang Ma","doi":"10.2478/jdis-2023-0003","DOIUrl":"https://doi.org/10.2478/jdis-2023-0003","url":null,"abstract":"Abstract Purpose In recent decades, with the availability of large-scale scientific corpus datasets, difference-in-difference (DID) is increasingly used in the science of science and bibliometrics studies. DID method outputs the unbiased estimation on condition that several hypotheses hold, especially the common trend assumption. In this paper, we gave a systematic demonstration of DID in the science of science, and the potential ways to improve the accuracy of DID method. Design/methodology/approach At first, we reviewed the statistical assumptions, the model specification, and the application procedures of DID method. Second, to improve the necessary assumptions before conducting DID regression and the accuracy of estimation, we introduced some matching techniques serving as the pre-selecting step for DID design by matching control individuals who are equivalent to those treated ones on observational variables before the intervention. Lastly, we performed a case study to estimate the effects of prizewinning on the scientific performance of Nobel laureates, by comparing the yearly citation impact after the prizewinning year between Nobel laureates and their prizewinning-work coauthors. Findings We introduced the procedures to conduct a DID estimation and demonstrated the effectiveness to use matching method to improve the results. As a case study, we found that there are no significant increases in citations for Nobel laureates compared to their prizewinning coauthors. Research limitations This study ignored the rigorous mathematical deduction parts of DID, while focused on the practical parts. 
Practical implications This work gives experimental practice and potential guidelines to use DID method in science of science and bibliometrics studies. Originality/value This study gains insights into the usage of econometric tools in science of science.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"29 - 46"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41942700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal inference using regression-based statistical control: Confusion in Econometrics","authors":"Fan Chao, Guang Yu","doi":"10.2478/jdis-2023-0006","DOIUrl":"https://doi.org/10.2478/jdis-2023-0006","url":null,"abstract":"Abstract Regression is a widely used econometric tool in research. In observational studies, based on a number of assumptions, regression-based statistical control methods attempt to analyze the causation between treatment and outcome by adding control variables. However, this approach may not produce reliable estimates of causal effects. In addition to the shortcomings of the method, this lack of confidence is mainly related to ambiguous formulations in econometrics, such as the definition of selection bias, selection of core control variables, and method of testing for robustness. Within the framework of the causal models, we clarify the assumption of causal inference using regression-based statistical controls, as described in econometrics, and discuss how to select core control variables to satisfy this assumption and conduct robustness tests for regression estimates.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"21 - 28"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46078264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large language models and scientific publishing","authors":"R. Rousseau, Liying Yang, J. Bollen, Zhesi Shen","doi":"10.2478/jdis-2023-0007","DOIUrl":"https://doi.org/10.2478/jdis-2023-0007","url":null,"abstract":"","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"1 - 1"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47427043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying grey-rhino in eminent technologies via patent analysis","authors":"Shelia X. Wei, Helena H. Zhang, Howell Y. Wang, F. Y. Ye","doi":"10.2478/jdis-2023-0002","DOIUrl":"https://doi.org/10.2478/jdis-2023-0002","url":null,"abstract":"Abstract Purpose Following the typical features of the grey-rhino event as predictability and profound influence, we attempt to find a special pattern called the grey-rhino in eminent technologies via patent analysis. Design/methodology/approach We propose to combine triadic patent families and technology life cycle to define the grey-rhino model. Firstly, we design the indicator rhino-index Rh = ST/SP and descriptor sequence {Rh}, where ST and SP are the accumulative number of triadic patent families and all patent families respectively for a specific technology. Secondly, according to the two typical features of the grey-rhino event, a grey-rhino is defined as a technology that meets both qualitative and quantitative conditions. Qualitatively, this technology has a profound influence. Quantitatively, in the emerging stage, Rh ≥ Rae, where Rae is the average level of the proportion of triadic patent families. Finally, this model is verified in three datasets, namely Encyclopedia Britannica's list for the greatest inventions (EB technologies for short), MIT breakthrough technologies (MIT technologies) and Derwent Manual Code technologies (MAN technologies). Findings The result shows that there are 64.71% EB technologies and 50.00% MIT technologies meeting the quantitative standard of the grey-rhino model, but only 14.71% MAN technologies fit the quantitative standard. This falling trend indicates the quantitative standard of the grey-rhino model is reasonable. EB technologies and MIT technologies have profound influence on society, which means they satisfy the qualitative standard of the grey-rhino model. Hence, 64.71% EB technologies and 50.00% MIT technologies are grey-rhinos. 
In 14.71% MAN technologies meeting the quantitative standard, we make some qualitative judgments and deem U11-A01A, U12-A01A1A, and W01-A01A as grey-rhino technologies. In addition, grey-rhinos and non-grey-rhinos have some differences. Rh values of grey-rhinos have a downward trend, while Rh values of non-grey-rhinos have a contrary trend. Rh values of grey-rhinos are scattered relatively in the early stage and centralize gradually, but non-grey-rhinos do not have this feature. Research limitations There are four main limitations. First, if a technology satisfies the quantitative standard of the model, it is likely to be a grey-rhino but expert judgments are necessary. Second, we don’t know why it will be eminent, which involves technical contents. Thirdly, we did not consider the China National Intellectual Property Administration (CNIPA) and the German Patent and Trademark Office (DPMA) which also play important roles in worldwide patents, so we hope to expand our study to the CNIPA and the DPMA. Furthermore, we did not compare the rhino-index with other patent indicators. Practical implications If a technology meets the quantitative standard, this can be seen as early warning signals and the technology may become a grey-rhino in the future, which can catch people's at","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"47 - 71"},"PeriodicalIF":0.0,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48514635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is big team research fair in national research assessments? The case of the UK Research Excellence Framework 2021","authors":"M. Thelwall, K. Kousha, Meiko Makita, Mahshid Abdoli, E. Stuart, Paul Wilson, Jonathan M. Levitt","doi":"10.2478/jdis-2023-0004","DOIUrl":"https://doi.org/10.2478/jdis-2023-0004","url":null,"abstract":"Abstract Collaborative research causes problems for research assessments because of the difficulty in fairly crediting its authors. Whilst splitting the rewards for an article amongst its authors has the greatest surface-level fairness, many important evaluations assign full credit to each author, irrespective of team size. The underlying rationales for this are labour reduction and the need to incentivise collaborative work because it is necessary to solve many important societal problems. This article assesses whether full counting changes results compared to fractional counting in the case of the UK's Research Excellence Framework (REF) 2021. For this assessment, fractional counting reduces the number of journal articles to as little as 10% of the full counting value, depending on the Unit of Assessment (UoA). Despite this large difference, allocating an overall grade point average (GPA) based on full counting or fractional counting gives results with a median Pearson correlation within UoAs of 0.98. The largest changes are for Archaeology (r=0.84) and Physics (r=0.88). There is a weak tendency for higher scoring institutions to lose from fractional counting, with the loss being statistically significant in 5 of the 34 UoAs. 
Thus, whilst the apparent over-weighting of contributions to collaboratively authored outputs does not seem too problematic from a fairness perspective overall, it may be worth examining in the few UoAs in which it makes the most difference.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"9 - 20"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69216606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Peculiarities of gender disambiguation and ordering of non-English authors’ names for Economic papers beyond core databases","authors":"O. Mryglod, Serhii Nazarovets, S. Kozmenko","doi":"10.48550/arXiv.2211.16124","DOIUrl":"https://doi.org/10.48550/arXiv.2211.16124","url":null,"abstract":"Abstract Purpose To supplement the quantitative portrait of Ukrainian Economics discipline with the results of gender and author ordering analysis at the level of individual authors, special methods of working with bibliographic data with a predominant share of non-English authors are used. The properties of gender mixing, the likelihood of male and female authors occupying the first position in the authorship list, as well as the arrangements of names are studied. Design/methodology/approach A data set containing bibliographic records related to Ukrainian journal publications in the field of Economics is constructed using Crossref metadata. Partial semi-automatic disambiguation of authors’ names is performed. First names, along with gender-specific ethnic surnames, are used for gender disambiguation required for further comparative gender analysis. Random reshuffling of data is used to determine the impact of gender correlations. To assess the level of alphabetization for our data set, both Latin and Cyrillic versions of names are taken into account. Findings The lack of well-structured metadata and the poor use of digital identifiers lead to numerous problems with automatization of bibliographic data pre-processing, especially in the case of publications by non-Western authors. The described stages for working with such specific data help to work at the level of authors and analyse, in particular, gender issues. Despite the larger number of female authors, gender equality is more likely to be reported at the individual level for the discipline of Ukrainian Economics. 
The tendencies towards collaborative or solo-publications and gender mixing patterns are found to be dependent on the journal: the differences for publications indexed in Scopus and/or Web of Science databases are found. It has also been found that Ukrainian Economics research is characterized by rather a non-alphabetical order of authors. Research limitations Only partial authors’ name disambiguation is performed in a semi-automatic way. Gender labels can be derived only for authors declared by full First names or gender-specific Last names. Practical implications The typical features of Ukrainian Economic discipline can be used to perform a comparison with other countries and disciplines, to develop an informed-based assessment procedure at the national level. The proposed way of processing publication data can be borrowed to enrich metadata about other research disciplines, especially for non-English speaking countries. Originality/value To our knowledge, this is the first large-scale quantitative study of Ukrainian Economic discipline. The results obtained are valuable not only at the national level, but also contribute to general knowledge about Economic research, gender issues, and authors’ names ordering. An example of the use of Crossref data is provided, while this data source is still less used due to a number of drawbacks. 
Here, for the first time, attention is drawn to ","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"72 - 89"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42279156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subject Area Risk Assessment of Four Hungarian Universities with a View to the QS University Rankings by Subject","authors":"P. Sasvári, Anna Urbanovics","doi":"10.2478/jdis-2022-0023","DOIUrl":"https://doi.org/10.2478/jdis-2022-0023","url":null,"abstract":"Abstract Purpose The aim of our paper is to investigate the role of a mentor leading a research team in the overall scientific performance of an academic institution and the possible risks of their departure, with special attention to their publication output. Design/methodology/approach By using SciVal subject area data, we composed a formula describing the level of vulnerability of any given university in the case of losing any of its leading mentors, identifying other risk factors by dividing their careers into separate stages. Findings It turns out that the higher the field-weighted citation impact is, the better the position universities reach in the rankings by subject, and the vulnerability of institutions highly depends on the mentors, especially in view of their contribution to the topic clusters. Research limitations The analysis covers the publication output of leading researchers working at four Hungarian universities; the scope of the analysis is worth extending. Practical implications Our analysis has the potential to give an applicable systemic approach as well as a data collection scheme to university managements so as to formulate an inclusive and comprehensive research strategy involving the introduction of a reward system aimed at publications and further encouraging national and international research cooperation. 
Originality/value The methodology and the principles of risk assessment laid down in our paper are not restricted to measuring the vulnerability level of a limited group of academic institutions, they can be appropriately used for investigating the role of mentors or leading researchers at every university across the globe.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"61 - 80"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47795519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}