{"title":"'Intelligence Studies Network': A human-curated database for indexing resources with open-source tools","authors":"Yusuf A. Ozkan","doi":"arxiv-2408.03868","DOIUrl":"https://doi.org/arxiv-2408.03868","url":null,"abstract":"The Intelligence Studies Network is a comprehensive resource database for\u0000publications, events, conferences, and calls for papers in the field of\u0000intelligence studies. It offers a novel solution for monitoring, indexing, and\u0000visualising resources. Sources are automatically monitored and added to a\u0000manually curated database, ensuring the relevance of items to intelligence\u0000studies. Curated outputs are stored in a group library on Zotero, an\u0000open-source reference management tool. The metadata of items in Zotero is\u0000enriched with OpenAlex, an open access bibliographic database. Finally, outputs\u0000are listed and visualised on a Streamlit app, an open-source Python framework\u0000for building apps. This paper aims to explain the Intelligence Studies Network\u0000database and provide a detailed guide on data sources and the workflow. This\u0000study demonstrates that it is possible to create a specialised academic\u0000database by using open source tools.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simplifying Scholarly Abstracts for Accessible Digital Libraries","authors":"Haining Wang, Jason Clark","doi":"arxiv-2408.03899","DOIUrl":"https://doi.org/arxiv-2408.03899","url":null,"abstract":"Standing at the forefront of knowledge dissemination, digital libraries\u0000curate vast collections of scientific literature. However, these scholarly\u0000writings are often laden with jargon and tailored for domain experts rather\u0000than the general public. As librarians, we strive to offer services to a\u0000diverse audience, including those with lower reading levels. To extend our\u0000services beyond mere access, we propose fine-tuning a language model to rewrite\u0000scholarly abstracts into more comprehensible versions, thereby making scholarly\u0000literature more accessible when requested. We began by introducing a corpus\u0000specifically designed for training models to simplify scholarly abstracts. This\u0000corpus consists of over three thousand pairs of abstracts and significance\u0000statements from diverse disciplines. We then fine-tuned four language models\u0000using this corpus. The outputs from the models were subsequently examined both\u0000quantitatively for accessibility and semantic coherence, and qualitatively for\u0000language quality, faithfulness, and completeness. Our findings show that the\u0000resulting models can improve readability by over three grade levels, while\u0000maintaining fidelity to the original content. Although commercial\u0000state-of-the-art models still hold an edge, our models are much more compact,\u0000can be deployed locally in an affordable manner, and alleviate the privacy\u0000concerns associated with using commercial models. 
We envision this work as a\u0000step toward more inclusive and accessible libraries, improving our services for\u0000young readers and those without a college degree.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"192 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The OpenCitations Index","authors":"Ivan Heibi, Arianna Moretti, Silvio Peroni, Marta Soricetti","doi":"arxiv-2408.02321","DOIUrl":"https://doi.org/arxiv-2408.02321","url":null,"abstract":"This article presents the OpenCitations Index, a collection of open citation\u0000data maintained by OpenCitations, an independent, not-for-profit infrastructure\u0000organisation for open scholarship dedicated to publishing open bibliographic\u0000and citation data using Semantic Web and Linked Open Data technologies. The\u0000collection involves citation data harvested from multiple sources. To address\u0000the possibility of different sources providing citation data for bibliographic\u0000entities represented with different identifiers, therefore potentially\u0000representing same citation, a deduplication mechanism has been implemented.\u0000This ensures that citations integrated into OpenCitations Index are accurately\u0000identified uniquely, even when different identifiers are used. This mechanism\u0000follows a specific workflow, which encompasses a preprocessing of the original\u0000source data, a management of the provided bibliographic metadata, and the\u0000generation of new citation data to be integrated into the OpenCitations Index.\u0000The process relies on another data collection: OpenCitations Meta, and on the\u0000use of a new globally persistent identifier, namely OMID (OpenCitations Meta\u0000Identifier). As of July 2024, OpenCitations Index stores over 2 billion unique\u0000citation links, harvest from Crossref, the National Institute of Heath Open\u0000Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan Link Center\u0000(JaLC). 
OpenCitations Index can be systematically accessed and queried through\u0000several services, including SPARQL endpoint, REST APIs, and web interfaces.\u0000Additionally, dataset dumps are available for free download and reuse (under\u0000CC0 waiver) in various formats (CSV, N-Triples, and Scholix), including\u0000provenance and change tracking information.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development","authors":"Joshua Morriss, Tod Brindle, Jessica Bah Rösman, Daniel Reibsamen, Andreas Enz","doi":"arxiv-2408.05239","DOIUrl":"https://doi.org/arxiv-2408.05239","url":null,"abstract":"Systematic literature reviews are the highest quality of evidence in\u0000research. However, the review process is hindered by significant resource and\u0000data constraints. The Literature Review Network (LRN) is the first of its kind\u0000explainable AI platform adhering to PRISMA 2020 standards, designed to automate\u0000the entire literature review process. LRN was evaluated in the domain of\u0000surgical glove practices using 3 search strings developed by experts to query\u0000PubMed. A non-expert trained all LRN models. Performance was benchmarked\u0000against an expert manual review. Explainability and performance metrics\u0000assessed LRN's ability to replicate the experts' review. Concordance was\u0000measured with the Jaccard index and confusion matrices. Researchers were\u0000blinded to the other's results until study completion. Overlapping studies were\u0000integrated into an LRN-generated systematic review. LRN models demonstrated\u0000superior classification accuracy without expert training, achieving 84.78% and\u000085.71% accuracy. The highest performance model achieved high interrater\u0000reliability (k = 0.4953) and explainability metrics, linking 'reduce',\u0000'accident', and 'sharp' with 'double-gloving'. Another LRN model covered 91.51%\u0000of the relevant literature despite diverging from the non-expert's judgments (k\u0000= 0.2174), with the terms 'latex', 'double' (gloves), and 'indication'. LRN\u0000outperformed the manual review (19,920 minutes over 11 months), reducing the\u0000entire process to 288.6 minutes over 5 days. 
This study demonstrates that\u0000explainable AI does not require expert training to successfully conduct\u0000PRISMA-compliant systematic literature reviews like an expert. LRN summarized\u0000the results of surgical glove studies and identified themes that were nearly\u0000identical to the clinical researchers' findings. Explainable AI can accurately\u0000expedite our understanding of clinical practices, potentially revolutionizing\u0000healthcare research.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142219123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Unique Citing Documents Journal Impact Factor (Uniq-JIF) as a Supplement for the standard Journal Impact Factor","authors":"Zhesi Shen, Li Li, Yu Liao","doi":"arxiv-2408.08884","DOIUrl":"https://doi.org/arxiv-2408.08884","url":null,"abstract":"This paper introduces the Unique Citing Documents Journal Impact\u0000Factor(Uniq-JIF) as a supplement to the traditional Journal Impact Factor(JIF).\u0000The Uniq-JIF counts each citing document only once, aiming to reduce the\u0000effects of citation manipulations. Analysis of 2023 Journal Citation Reports\u0000data shows that for most journals, the Uniq-JIF is less than 20% lower than the\u0000JIF, though some journals show a drop of over 75%. The Uniq-JIF also highlights\u0000significant reductions for journals suppressed due to citation issues,\u0000indicating its effectiveness in identifying problematic journals. The Uniq-JIF\u0000offers a more nuanced view of a journal's influence and can help reveal\u0000journals needing further scrutiny.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"307 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142219120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Artificial Intelligence Disclosure (AID) Framework: An Introduction","authors":"Kari D. Weaver","doi":"arxiv-2408.01904","DOIUrl":"https://doi.org/arxiv-2408.01904","url":null,"abstract":"As the use of Generative Artificial Intelligence tools have grown in higher\u0000education and research, there have been increasing calls for transparency and\u0000granularity around the use and attribution of the use of these tools. Thus far,\u0000this need has been met via the recommended inclusion of a note, with little to\u0000no guidance on what the note itself should include. This has been identified as\u0000a problem to the use of AI in academic and research contexts. This article\u0000introduces The Artificial Intelligence Disclosure (AID) Framework, a standard,\u0000comprehensive, and detailed framework meant to inform the development and\u0000writing of GenAI disclosure for education and research.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hotspots and Trends in Magnetoencephalography Research (2013-2022): A Bibliometric Analysis","authors":"Shen Liu, Jingwen Zhao","doi":"arxiv-2408.08877","DOIUrl":"https://doi.org/arxiv-2408.08877","url":null,"abstract":"This study aimed to utilize bibliometric methods to analyze trends in\u0000international Magnetoencephalography (MEG) research from 2013 to 2022. Due to\u0000the limited volume of domestic literature on MEG, this analysis focuses solely\u0000on the global research landscape, providing insights from the past decade as a\u0000representative sample. This study utilized bibliometric methods to explore and\u0000analyze the progress, hotspots and developmental trends in international MEG\u0000research spanning from 1995 to 2022. The results indicated a dynamic and steady\u0000growth trend in the overall number of publications in MEG. Ryusuke Kakigi\u0000emerged as the most prolific author, while Neuroimage led as the most prolific\u0000journal. Current hotspots in MEG research encompass resting state, networks,\u0000functional connectivity, phase dynamics, oscillation, and more. Future trends\u0000in MEG research are poised to advance across three key aspects: disease\u0000treatment and practical applications, experimental foundations and technical\u0000advancements, and fundamental and advanced human cognition. 
In the future,\u0000there should be a focus on enhancing cross-integration and utilization of MEG\u0000with other instruments to diversify research methodologies in this field","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142219121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harvesting Textual and Structured Data from the HAL Publication Repository","authors":"Francis Kulumba, Wissam Antoun, Guillaume Vimont, Laurent Romary","doi":"arxiv-2407.20595","DOIUrl":"https://doi.org/arxiv-2407.20595","url":null,"abstract":"HAL (Hyper Articles en Ligne) is the French national publication repository,\u0000used by most higher education and research organizations for their open science\u0000policy. As a digital library, it is a rich repository of scholarly documents,\u0000but its potential for advanced research has been underutilized. We present\u0000HALvest, a unique dataset that bridges the gap between citation networks and\u0000the full text of papers submitted on HAL. We craft our dataset by filtering HAL\u0000for scholarly publications, resulting in approximately 700,000 documents,\u0000spanning 34 languages across 13 identified domains, suitable for language model\u0000training, and yielding approximately 16.5 billion tokens (with 8 billion in\u0000French and 7 billion in English, the most represented languages). We transform\u0000the metadata of each paper into a citation network, producing a directed\u0000heterogeneous graph. This graph includes uniquely identified authors on HAL, as\u0000well as all open submitted papers, and their citations. 
We provide a baseline\u0000for authorship attribution using the dataset, implement a range of\u0000state-of-the-art models in graph representation learning for link prediction,\u0000and discuss the usefulness of our generated knowledge graph structure.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"R-Index: A Metric for Assessing Researcher Contributions to Peer Review","authors":"Milad Malekzadeh","doi":"arxiv-2407.19949","DOIUrl":"https://doi.org/arxiv-2407.19949","url":null,"abstract":"I propose the R-Index, defined as the difference between the sum of review\u0000responsibilities for a researcher's publications and the number of reviews they\u0000have completed, as a novel metric to effectively characterize a researcher's\u0000contribution to the peer review process. This index aims to balance the demands\u0000placed on the peer review system by a researcher's publication output with\u0000their engagement in reviewing others' work, providing a measure of whether they\u0000are giving back to the academic community commensurately with their own\u0000publication demands. The R-Index offers a straightforward and fair approach to\u0000encourage equitable participation in peer review, thereby supporting the\u0000sustainability and efficiency of the scholarly publishing process.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting citation impact of research papers using GPT and other text embeddings","authors":"Adilson Vital Jr., Filipi N. Silva, Osvaldo N. Oliveira Jr., Diego R. Amancio","doi":"arxiv-2407.19942","DOIUrl":"https://doi.org/arxiv-2407.19942","url":null,"abstract":"The impact of research papers, typically measured in terms of citation\u0000counts, depends on several factors, including the reputation of the authors,\u0000journals, and institutions, in addition to the quality of the scientific work.\u0000In this paper, we present an approach that combines natural language processing\u0000and machine learning to predict the impact of papers in a specific journal. Our\u0000focus is on the text, which should correlate with impact and the topics covered\u0000in the research. We employed a dataset of over 40,000 articles from ACS Applied\u0000Materials and Interfaces spanning from 2012 to 2022. The data was processed\u0000using various text embedding techniques and classified with supervised machine\u0000learning algorithms. Papers were categorized into the top 20% most cited within\u0000the journal, using both yearly and cumulative citation counts as metrics. Our\u0000analysis reveals that the method employing generative pre-trained transformers\u0000(GPT) was the most efficient for embedding, while the random forest algorithm\u0000exhibited the best predictive power among the machine learning algorithms. An\u0000optimized accuracy of 80% in predicting whether a paper was among the top 20%\u0000most cited was achieved for the cumulative citation count when abstracts were\u0000processed. This accuracy is noteworthy, considering that author, institution,\u0000and early citation pattern information were not taken into account. The\u0000accuracy increased only slightly when the full texts of the papers were\u0000processed. 
Also significant is the finding that a simpler embedding technique,\u0000term frequency-inverse document frequency (TFIDF), yielded performance close to\u0000that of GPT. Since TFIDF captures the topics of the paper we infer that, apart\u0000from considering author and institution biases, citation counts for the\u0000considered journal may be predicted by identifying topics and \"reading\" the\u0000abstract of a paper.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}