{"title":"Literature Filtering for Systematic Reviews with Transformers","authors":"John Hawkins, David Tivey","doi":"arxiv-2405.20354","DOIUrl":"https://doi.org/arxiv-2405.20354","url":null,"abstract":"Identifying critical research within the growing body of academic work is an essential element of quality research. Systematic review processes, used in evidence-based medicine, formalise this as a procedure that must be followed in a research program. However, it comes with an increasing burden in terms of the time required to identify the important articles of research for a given topic. In this work, we develop a method for building a general-purpose filtering system that matches a research question, posed as a natural language description of the required content, against a candidate set of articles obtained via the application of broad search terms. Our results demonstrate that transformer models, pre-trained on biomedical literature then fine tuned for the specific task, offer a promising solution to this problem. The model can remove large volumes of irrelevant articles for most research questions.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141252532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of the papermilling behavior","authors":"Igor Podlubny","doi":"arxiv-2405.19872","DOIUrl":"https://doi.org/arxiv-2405.19872","url":null,"abstract":"Based on the analysis of the data obtainable from the Web of Science publication and citation database, typical signs of possible papermilling behavior are described, quantified, and illustrated by examples. A MATLAB function is provided for the analysis of the outputs from the Web of Science. A new quantitative indicator -- integrity index, or I-index -- is proposed for using it along with standard bibliographic and scientometric indicators.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141190218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the modification and revocation of open source licences","authors":"Paul Gagnon, Misha Benjamin, Justine Gauthier, Catherine Regis, Jenny Lee, Alexei Nordell-Markovits","doi":"arxiv-2407.13064","DOIUrl":"https://doi.org/arxiv-2407.13064","url":null,"abstract":"Historically, open source commitments have been deemed irrevocable once materials are released under open source licenses. In this paper, the authors argue for the creation of a subset of rights that allows open source contributors to force users to (i) update to the most recent version of a model, (ii) accept new use case restrictions, or even (iii) cease using the software entirely. While this would be a departure from the traditional open source approach, the legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses. Recent legislative changes have also opened the door to liability of open source contributors in certain cases. The authors believe that contributors would welcome the ability to ensure that downstream users are implementing updates that address issues like bias, guardrail workarounds or adversarial attacks on their contributions. Finally, this paper addresses how this license category would interplay with RAIL licenses, and how it should be operationalized and adopted by key stakeholders such as OSS platforms and scanning tools.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141737146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models","authors":"Xuemei Gu, Mario Krenn","doi":"arxiv-2405.17044","DOIUrl":"https://doi.org/arxiv-2405.17044","url":null,"abstract":"Advanced artificial intelligence (AI) systems with access to millions of research papers could inspire new research ideas that may not be conceived by humans alone. However, how interesting are these AI-generated ideas, and how can we improve their quality? Here, we introduce SciMuse, a system that uses an evolving knowledge graph built from more than 58 million scientific papers to generate personalized research ideas via an interface to GPT-4. We conducted a large-scale human evaluation with over 100 research group leaders from the Max Planck Society, who ranked more than 4,000 personalized research ideas based on their level of interest. This evaluation allows us to understand the relationships between scientific interest and the core properties of the knowledge graph. We find that data-efficient machine learning can predict research interest with high precision, allowing us to optimize the interest-level of generated research ideas. This work represents a step towards an artificial scientific muse that could catalyze unforeseen collaborations and suggest interesting avenues for scientists.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Oil & Water? Diffusion of AI Within and Across Scientific Fields","authors":"Eamon Duede, William Dolan, André Bauer, Ian Foster, Karim Lakhani","doi":"arxiv-2405.15828","DOIUrl":"https://doi.org/arxiv-2405.15828","url":null,"abstract":"This study empirically investigates claims of the increasing ubiquity of artificial intelligence (AI) within roughly 80 million research publications across 20 diverse scientific fields, by examining the change in scholarly engagement with AI from 1985 through 2022. We observe exponential growth, with AI-engaged publications increasing approximately thirteenfold (13x) across all fields, suggesting a dramatic shift from niche to mainstream. Moreover, we provide the first empirical examination of the distribution of AI-engaged publications across publication venues within individual fields, with results that reveal a broadening of AI engagement within disciplines. While this broadening engagement suggests a move toward greater disciplinary integration in every field, increased ubiquity is associated with a semantic tension between AI-engaged research and more traditional disciplinary research. Through an analysis of tens of millions of document embeddings, we observe a complex interplay between AI-engaged and non-AI-engaged research within and across fields, suggesting that increasing ubiquity is something of an oil-and-water phenomenon -- AI-engaged work is spreading out over fields, but not mixing well with non-AI-engaged work.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking the production and publication of machine-reusable expressions of research findings","authors":"Markus Stocker, Lauren Snyder, Matthew Anfuso, Oliver Ludwig, Freya Thießen, Kheir Eddine Farfar, Muhammad Haris, Allard Oelen, Mohamad Yaser Jaradeh","doi":"arxiv-2405.13129","DOIUrl":"https://doi.org/arxiv-2405.13129","url":null,"abstract":"Literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine reusable. To facilitate knowledge reuse, e.g. for synthesis research, scientific knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccuracies associated with completing these activities manually has driven the development of techniques that automate knowledge extraction. Tackling the problem with a different mindset, we propose a pre-publication approach, known as reborn, that ensures scientific knowledge is born reusable, i.e. produced in a machine-reusable format during knowledge production. We implement the approach using the Open Research Knowledge Graph infrastructure for FAIR scientific knowledge organization. We test the approach with three use cases, and discuss the role of publishers and editors in scaling the approach. Our results suggest that the proposed approach is superior compared to classical manual and semi-automated post-publication extraction techniques in terms of knowledge richness and accuracy as well as technological simplicity.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Learning Approach for Railway Technical Map (RTM) Component Identification","authors":"Obadage Rochana Rumalshan, Pramuka Weerasinghe, Mohamed Shaheer, Prabhath Gunathilake, Erunika Dayaratna","doi":"arxiv-2405.13229","DOIUrl":"https://doi.org/arxiv-2405.13229","url":null,"abstract":"The extreme popularity over the years for railway transportation urges the necessity to maintain efficient railway management systems around the globe. Even though, at present, there exist a large collection of Computer Aided Designed Railway Technical Maps (RTMs) but available only in the portable document format (PDF). Using Deep Learning and Optical Character Recognition techniques, this research work proposes a generic system to digitize the relevant map component data from a given input image and create a formatted text file per image. Out of YOLOv3, SSD and Faster-RCNN object detection models used, Faster-RCNN yields the highest mean Average Precision (mAP) and the highest F1 score values 0.68 and 0.76 respectively. Further it is proven from the results obtained that, one can improve the results with OCR when the text containing image is being sent through a sophisticated pre-processing pipeline to remove distortions.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scientific discourse on YouTube: Motivations for citing research in comments","authors":"Sören Striewski, Olga Zagovora, Isabella Peters","doi":"arxiv-2405.12798","DOIUrl":"https://doi.org/arxiv-2405.12798","url":null,"abstract":"YouTube is a valuable source of user-generated content on a wide range of topics, and it encourages user participation through the use of a comment system. Video content is increasingly addressing scientific topics, and there is evidence that both academics and consumers use video descriptions and video comments to refer to academic research and scientific publications. Because commenting is a discursive behavior, this study will provide insights on why individuals post links to research publications in comments. For this, a qualitative content analysis and iterative coding approach were applied. Furthermore, the reasons for mentioning academic publications in comments were contrasted with the reasons for citing in scholarly works and with reasons for commenting on YouTube. We discovered that the primary motives for sharing research links were (1) providing more insights into the topic and (2) challenging information offered by other commentators.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Amplifying Academic Research through YouTube: Engagement Metrics as Predictors of Citation Impact","authors":"Olga Zagovora, Talisa Schwal, Katrin Weller","doi":"arxiv-2405.12734","DOIUrl":"https://doi.org/arxiv-2405.12734","url":null,"abstract":"This study explores the interplay between YouTube engagement metrics and the academic impact of cited publications within video descriptions, amid declining trust in traditional journalism and increased reliance on social media for information. By analyzing data from Altmetric.com and YouTube's API, it assesses how YouTube video features relate to citation impact. Initial results suggest that videos citing scientific publications and garnering high engagement -- likes, comments, and references to other publications -- may function as a filtering mechanism or even as a predictor of impactful research.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The participation of public in knowledge production: a citizen science projects overview","authors":"Nuria Bautista-Puig, Enrique Orduna-Malea, Philippe Mongeon","doi":"arxiv-2405.10829","DOIUrl":"https://doi.org/arxiv-2405.10829","url":null,"abstract":"Citizen Science (CS) is related to public engagement in scientific research. The tasks in which the citizens can be involved are diverse and can range from data collection and tagging images to participation in the planning and research design. However, little is known about the involvement degree of the citizens to CS projects, and the contribution of those projects to the advancement of knowledge (e.g. scientific outcomes). This study aims to gain a better understanding by analysing the SciStarter database. A total of 2,346 CS projects were identified, mainly from Ecology and Environmental Sciences. Of these projects, 91% show low participation of the citizens (Level 1 \"citizens as sensors\" and 2 \"citizens as interpreters\", from Haklay's scale). In terms of scientific output, 918 papers indexed in the Web of Science (WoS) were identified. The most prolific projects were found to have lower levels of citizen involvement, specifically at Levels 1 and 2.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"98 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}