{"title":"A Tailor-made Data Quality Approach for Higher Educational Data","authors":"C. Daraio, R. Bruni, G. Catalano, Alessandro Daraio, G. Matteucci, M. Scannapieco, Daniel Wagner-Schuster, B. Lepori","doi":"10.2478/jdis-2020-0029","DOIUrl":"https://doi.org/10.2478/jdis-2020-0029","url":null,"abstract":"Abstract Purpose This paper relates the definition of data quality procedures for knowledge organizations such as Higher Education Institutions. The main purpose is to present the flexible approach developed for monitoring the data quality of the European Tertiary Education Register (ETER) database, illustrating its functioning and highlighting the main challenges that still have to be faced in this domain. Design/methodology/approach The proposed data quality methodology is based on two kinds of checks, one to assess the consistency of cross-sectional data and the other to evaluate the stability of multiannual data. This methodology has an operational and empirical orientation. This means that the proposed checks do not assume any theoretical distribution for the determination of the threshold parameters that identify potential outliers, inconsistencies, and errors in the data. Findings We show that the proposed cross-sectional checks and multiannual checks are helpful to identify outliers, extreme observations and to detect ontological inconsistencies not described in the available meta-data. For this reason, they may be a useful complement to integrate the processing of the available information. Research limitations The coverage of the study is limited to European Higher Education Institutions. The cross-sectional and multiannual checks are not yet completely integrated. 
Practical implications The consideration of the quality of the available data and information is important to enhance data quality-aware empirical investigations, highlighting problems, and areas where to invest for improving the coverage and interoperability of data in future data collection initiatives. Originality/value The data-driven quality checks proposed in this paper may be useful as a reference for building and monitoring the data quality of new databases or of existing databases available for other countries or systems characterized by high heterogeneity and complexity of the units of analysis without relying on pre-specified theoretical distributions.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"129 - 160"},"PeriodicalIF":0.0,"publicationDate":"2020-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43552469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scientometric Implosion that Leads to Explosion: Case Study of Armenian Journals","authors":"Shushanik A. Sargsyan, E. Gzoyan, A. Mirzoyan, V. Blaginin","doi":"10.2478/jdis-2020-0028","DOIUrl":"https://doi.org/10.2478/jdis-2020-0028","url":null,"abstract":"Abstract Purpose The purpose of this study is to introduce a new concept and term, scientometric implosion, into scientometric discourse and research, and to test the idea on the example of Armenian journals. The article argues that a compressed national scientific area puts pressure on journals, and that after some time this pressure makes one or several journals explode, that is, break out of the limited national scientific area and move to the international arena. As soon as one local journal breaks through this compressed space and appears at an international level, a further explosion happens, which makes other journals follow the same path. Design/methodology/approach Our research is based on three international scientific databases (WoS, Scopus, and RISC CC), from which we retrieved information about the Armenian journals indexed there and the citations received by those journals, and on one national database, the Armenian Science Citation Index. The Armenian Journal Impact Factor (ArmJIF) was calculated for the local Armenian journals based on the general impact factor formula. Journals were classified according to Glänzel and Schubert (2003). Findings Our results show that the science policy developed by the scientific authorities of Armenia and the introduction of ArmJIF have made Armenian journals comply with international standards and have led some local journals to break out of the national scientific territory and be indexed in the international scientific databases RISC, Scopus, and WoS. Apart from complying with technical requirements, the journals have also started publishing articles in foreign languages. 
Although nearly half of the local journals are in the fields of social sciences and humanities, only one journal from that field is indexed in international scientific databases. Research limitation One limitation of the study is that it covers only one state; another is that more time is needed to evaluate the results firmly. However, the introduction of the concept can inspire other similar case studies. Practical implications The new term and relevant model offered in the article can be used in practice for the development of national journals. Originality/value The article proposes a new term and a concept in scientometrics.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"187 - 196"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47388523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Gender Patenting Gap: A Study on the Iberoamerican Countries","authors":"Danilo S. Carvalho, Lydia Bares, Kelyane Silva","doi":"10.2478/jdis-2020-0025","DOIUrl":"https://doi.org/10.2478/jdis-2020-0025","url":null,"abstract":"Abstract Purpose This work presents a study on the female involvement in patent applications in all 23 Ibero-American countries that are WIPO members, in order to measure gender inequalities in institutional collaborations and technological fields, across time. Design/methodology/approach The data used in this paper come from EPO Worldwide Patent Statistical Database (PATSTAT). PATSTAT contains bibliographical data relating to more than 100 million patent documents from leading industrialized and developing countries, as well as legal event data from more than 40 patent authorities contained in the EPO worldwide legal event data (INPADOC). The extracted subset is composed of 150,863 patent applications with priority years between 2007 and 2016. Findings Our observations indicate that even in more dynamic economies such as Portugal and Spain, the participation of women per patent applications does not exceed 30%. Additionally, the distribution of female participation among institutional sectors and technological fields is consistent with previous studies in other regions and indicate a socio-cultural divide. Research limitations Unisex names were not considered and were counted as gender unknown, and patent applications for which no inventor information was available were discarded, but further effort of data analysis may provide more information about gender inequalities. 
Practical implications While patents are imperfect variables of inventive step and therefore should be considered as a variable proxy of innovation, our findings may help to guide the implementation of policies for balancing gender participation in innovative activities, as well as instigating research into the issues causing divisive participation along gender lines. Originality/value While there is a widespread effort into evaluating and improving the participation of groups recognized as minorities within state-of-the-art activities, research about women participation in the innovation sector is fragmented due to differing regional characteristics: industrial and academic segmentation, socio-economic disparities, and cultural factors. Thus, localized studies present an opportunity of filling the gaps of knowledge on societal participation in innovation activities.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"116 - 128"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49099148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acknowledgment of Libraries in the Journal Literature: An Exploratory Study","authors":"David E. Hubbard, Sierra Laddusaw","doi":"10.2478/jdis-2020-0023","DOIUrl":"https://doi.org/10.2478/jdis-2020-0023","url":null,"abstract":"Abstract Purpose This study examines acknowledgments to libraries in the journal literature, as well as the efficacy of using Web of Science (WoS) to locate general acknowledgment text. Design/methodology/approach This mixed-methods approach quantifies and characterizes acknowledgments to libraries in the journal literature. Using WoS's Funding Text field, the acknowledgments for six peer universities were identified and then characterized. The efficacy of using WoS to locate library acknowledgments was assessed by comparing the WoS Funding Text search results to the actual acknowledgment text found in the articles. Findings Acknowledgments to libraries were found in articles at all six peer universities, though the absolute and relative numbers were quite low (< 0.5%). Most of the library acknowledgments were for resources (collections, funding, etc.), and many were concentrated in natural history (e.g. zoology). Examination of Texas A&M University zoology articles found that 91.7% of the funding information came from “acknowledgments” and not specifically a funding acknowledgment section. The WoS Funding Text search found 56% of the library acknowledgments compared to a search of the actual acknowledgment text in the articles. Research limitations Limiting publications to journals, using a single truncated search term, and including only six research universities in the United States. Practical implications This study examined library acknowledgments, but the same approach could be applied to searches of other keywords, institutions/organizations, individuals, etc. While not specifically designed to search general acknowledgments, WoS's Funding Text field can be used as an exploratory tool to search acknowledgments beyond funding. 
Originality/value There are a few studies that have examined library acknowledgments in the scholarly literature, though to date none of those studies have examined the efficacy of using the WoS Funding Text field to locate those library acknowledgments within the journal literature.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"178 - 186"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42638146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Library and Information Science Papers Discussed on Twitter: A new Network-based Approach for Measuring Public Attention","authors":"R. Haunschild, L. Leydesdorff, L. Bornmann","doi":"10.2478/jdis-2020-0017","DOIUrl":"https://doi.org/10.2478/jdis-2020-0017","url":null,"abstract":"Abstract Purpose In recent years, one can witness a trend in research evaluation to measure the impact on society or attention to research by society (beyond science). We address the following question: can Twitter be meaningfully used for the mapping of public and scientific discourses? Design/methodology/approach Recently, Haunschild et al. (2019) introduced a new network-oriented approach for using Twitter data in research evaluation. Such a procedure can be used to measure the public discussion around a specific field or topic. In this study, we used all papers published in the Web of Science (WoS, Clarivate Analytics) subject category Information Science & Library Science to explore the publicly discussed topics from the area of library and information science (LIS) in comparison to the topics used by scholars in their publications in this area. Findings The results show that LIS papers are represented rather well on Twitter. Similar topics appear in the networks of author keywords of all LIS papers, not tweeted LIS papers, and tweeted LIS papers. The networks of the author keywords of all LIS papers and not tweeted LIS papers are most similar to each other. Research limitations Only papers published since 2011 with DOI were analyzed. Practical implications Although Twitter data do not seem to be useful for quantitative research evaluation, it seems that Twitter data can be used in a more qualitative way for mapping of public and scientific discourses. 
Originality/value This study explores a rather new methodology for comparing public and scientific discourses.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"17 - 5"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42321207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-occurrence of Cell Lines, Basal Media and Supplementation in the Biomedical Research Literature","authors":"Jessica Cox, Darin McBeath, Corey A. Harper, Ron Daniel","doi":"10.2478/jdis-2020-0016","DOIUrl":"https://doi.org/10.2478/jdis-2020-0016","url":null,"abstract":"Abstract Purpose The use of in vitro cell culture and experimentation is a cornerstone of biomedical research; however, more attention has recently been given to the potential consequences of using artificial basal media and undefined supplements. As a first step towards better understanding and measuring the impact these systems have on experimental results, we use text mining to capture typical research practices and trends around cell culture. Design/methodology/approach To measure the scale of in vitro cell culture use, we have analyzed a corpus of 94,695 research articles that appear in biomedical research journals published in ScienceDirect from 2000–2018. Central to our investigation is the observation that studies using cell culture describe conditions using the typical sentence structure of cell line, basal media, and supplemented compounds. Here we tag our corpus with a curated list of basal media and the Cellosaurus ontology using the Aho-Corasick algorithm. We also processed the corpus with Stanford CoreNLP to find nouns that follow the basal media, in an attempt to identify supplements used. Findings Interestingly, we find that researchers frequently use DMEM even if a cell line's vendor recommends less concentrated media. We see long-tailed distributions for the usage of media and cell lines, with DMEM and RPMI dominating the media, and HEK293, HEK293T, and HeLa dominating cell lines used. Research limitations Our analysis was restricted to documents in ScienceDirect, and our text mining method achieved high recall but low precision and mandated manual inspection of many tokens. 
Practical implications Our findings document current cell culture practices in the biomedical research community, which can be used as a resource for future experimental design. Originality/value No other work has taken a text mining approach to surveying cell culture practices in biomedical research.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"161 - 177"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44045294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Compound F2-index and the Compound H-index as Extension of the f2 and h-indexes from a Dynamic Perspective","authors":"Y. Fassin","doi":"10.2478/jdis-2020-0019","DOIUrl":"https://doi.org/10.2478/jdis-2020-0019","url":null,"abstract":"Abstract Purpose Elaboration of an indicator to include the dynamic aspect of citations in bibliometric indexes. Design/methodology/approach A new bibliometric methodology, the f2-index, is applied at the career level and at the level of the recent 5 years to analyze the dynamic aspect of bibliometrics. The method is applied, as an illustration, to the field of corporate governance. Findings The compound F2-index as an extension of the f2-index recognizes past achievements but also values new research work with potential. The method is extended to the h-index and the h2-index. An activity index is defined as the ratio of the recent h’-index to the career h-index. Research limitations The compound F2 and H-indexes are PAC, probably approximately correct, and depend on the selection and database. Practical implications The F2- and H compound indexes allow identifying the rising stars of a field from a dynamic perspective. The activity ratio highlights the contribution of younger researchers. Originality/value The new methodology demonstrates the underestimated dynamic capacity of bibliometric research.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"71 - 83"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49661066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evidence-based Nomenclature and Taxonomy of Research Impact Indicators","authors":"M. Arsalan, Omar Mubin, A. Mahmud","doi":"10.2478/jdis-2020-0018","DOIUrl":"https://doi.org/10.2478/jdis-2020-0018","url":null,"abstract":"Abstract Purpose This study aims to classify research impact indicators based on their characteristics and scope. A concept of evidence-based nomenclature of research impact (RI) indicators is introduced for generalization and transformation of scope. Design/methodology/approach Literature related to research impact assessment was collected and categorized into conceptual and applied case studies. One hundred and nineteen indicators were selected to prepare the classification and nomenclature. The nomenclature was developed based on the principle that “every indicator is a contextual-function to explain the impact”. Every indicator was broken down into three parts, i.e. Function, Domain, and Target Areas. Findings The main functions of research impact indicators express improvement (63%), recognition (23%), and creation/development (14%). The focus of research impact indicators in the literature is more towards the academic domain (59%), whereas the environment/sustainability domain is least considered (4%). As a result, research impact related to the research aspects is felt the most (29%). Other target areas include system and services, methods and procedures, networking, planning, policy development, economic aspects and commercialisation, etc. Research limitations This research was applied to 119 research impact indicators. However, the inclusion of additional indicators may change the result. Practical implications The plausible effect of nomenclature is a better organization of indicators with appropriate tags of functions, domains, and target areas. This approach also provides a framework of indicator generalization and transformation. Therefore, similar indicators can be applied in other fields and target areas with modifications. 
Originality/value The development of nomenclature for research impact indicators is a novel approach in scientometrics. It follows the same lines as in other scientific disciplines, such as biology and chemistry, where fundamental objects need to be classified according to common standards.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"33 - 56"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48269493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discipline Impact Factor: Some of Its History, Some of the Author's Experience of Its Application, the Continuing Reasons for Its Use and… Next Beyond","authors":"V. Lazarev","doi":"10.2478/jdis-2020-0015","DOIUrl":"https://doi.org/10.2478/jdis-2020-0015","url":null,"abstract":"Abstract Purpose This work aims to consider the role and some of the 42-year history of the discipline impact factor (DIF) in evaluation of serial publications. Also, the original “symmetric” indicator called the “discipline susceptibility factor” is to be presented. Design/methodology/approach In accordance with the purpose of the work, the methods are analytical interpretation of the scientific literature related to this problem as well as speculative explanations. The information base of the research is bibliometric publications dealing with impact, impact factor, discipline impact factor, and discipline susceptibility factor. Findings Examples of the DIF application and modification of the indicator are given. It is shown why research and university libraries need to use the DIF to evaluate serials in conditions of scarce funding for subscription to serial publications, even if open access is available. The role of the DIF for evaluating journals by authors of scientific papers when choosing a good and right journal for submitting a paper is also briefly discussed. An original indicator “symmetrical” to the DIF (the “discipline susceptibility factor”) and its differences from the DIF in terms of content and purpose of evaluation are also briefly presented. Research limitations The selection of publications for the information base of the research did not include those in which the DIF was only mentioned, used partially or not for its original purpose. Restrictions on the length of articles submitted to this special issue of the JDIS also caused the exclusion of even a number of completely relevant publications. 
Consideration of the DIF is not placed in the context of describing other derivatives from the Garfield impact factor. Practical implications An underrated bibliometric indicator, viz. the discipline impact factor, is promoted for practical application. An original indicator “symmetrical” to the DIF has been proposed for finding serial publications that represent external research fields potentially suited to applying the results of scientific activities obtained within the specific research field represented by the cited specialized journals. Both can be useful in research and university libraries in their endeavors to improve scientific information services. Also, both can be used for evaluating journals by authors of scientific papers when choosing a journal to submit a paper. Originality/value The article substantiates the need to evaluate scientific serial publications in library activities, even in conditions of access to huge and convenient databases (subscription packages) and open access to a large number of serial publications. It gives a mini-survey of the history of one of the methods of such evaluation, and offers an original method for evaluating scientific serial publications.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"197 - 209"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46163700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Historical Bibliometrics Using Google Scholar: The Case of Roman Law, 1727–2016","authors":"Janne Pölönen, Björn Hammarfelt","doi":"10.2478/jdis-2020-0024","DOIUrl":"https://doi.org/10.2478/jdis-2020-0024","url":null,"abstract":"Abstract Purpose The purpose of this study is to investigate the historical and linguistic coverage of Google Scholar, using publications in the field of Roman law as an example. Design/methodology/approach To create a dataset of Roman law publications, we retrieved a total of 21,300 records of publications, published between years 1500 and 2016, with title including words denoting “Roman law” in English, French, German, Italian, and Spanish. Findings We were able to find publications dating back to 1727. The largest number of publications and authors date to the late 19th century, and this peak might be explained by the role of Roman law in French legal education at the time. Furthermore, we found exceptionally skewed concentration of publications to authors, as well as of citations to publications. We speculate that this could be explained by the long time-frame of the study, and the importance of classic works. Research limitation Major limitations, and potential future work, relate to data quality, and cleaning, disambiguation of publications and authors, as well as comparing coverage with other data sources. Practical implications We find Google Scholar to be a promising data source for historical bibliometrics. This approach may help bridge the gap between bibliometrics and the “digital humanities”. Originality/value Earlier studies have focused mainly on Google Scholar's coverage of publications and citations in general, or in specific fields. 
The historical coverage has, however, received less attention.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"18 - 32"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44190729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}