arXiv - CS - Digital Libraries: Latest Papers, Page 10

Shacl4Bib: custom validation of library data

arXiv - CS - Digital Libraries Pub Date : 2024-05-15 DOI: arxiv-2405.09177

Péter Király

Citations: 0

Distinguishing articles in questionable and non-questionable journals using quantitative indicators associated with quality

arXiv - CS - Digital Libraries Pub Date : 2024-05-10 DOI: arxiv-2405.06308

Dimity Stephen

Abstract: This study investigates the viability of distinguishing articles in questionable journals (QJs) from those in non-QJs on the basis of quantitative indicators typically associated with quality. Subsequently, I examine what can be deduced about the quality of articles in QJs based on the differences observed. I contrast the length of abstracts and full-texts, prevalence of spelling errors, text readability, number of references and citations, the size and internationality of the author team, the documentation of ethics and informed consent statements, and the presence of erroneous decisions based on statistical errors in 1,714 articles from 31 QJs, 1,691 articles from 16 journals indexed in Web of Science (WoS), and 1,900 articles from 45 mid-tier journals, all in the field of psychology. The results suggest that QJ articles do diverge from the disciplinary standards set by peer-reviewed journals in psychology on quantitative indicators of quality that tend to reflect the effect of peer review and editorial processes. However, mid-tier and WoS journals are also affected by potential quality concerns, such as under-reporting of ethics and informed consent processes and the presence of errors in interpreting statistics. Further research is required to develop a comprehensive understanding of the quality of articles in QJs.

Citations: 0

Can citations tell us about a paper's reproducibility? A case study of machine learning papers

arXiv - CS - Digital Libraries Pub Date : 2024-05-07 DOI: arxiv-2405.03977

Rochana R. Obadage, Sarah M. Rajtmajer, Jian Wu

Citations: 0

NACSOS-nexus: NLP Assisted Classification, Synthesis and Online Screening with New and EXtended Usage Scenarios

arXiv - CS - Digital Libraries Pub Date : 2024-05-07 DOI: arxiv-2405.04621

Tim Repke, Max Callaghan

Citations: 0

Research information in the light of artificial intelligence: quality and data ecologies

arXiv - CS - Digital Libraries Pub Date : 2024-05-06 DOI: arxiv-2405.12997

Otmane Azeroual, Tibor Koltay

Abstract: This paper presents multi- and interdisciplinary approaches for finding the appropriate AI technologies for research information. Professional research information management (RIM) is becoming increasingly important as an expressly data-driven tool for researchers. It is not only the basis of scientific knowledge processes, but also related to other data. A concept and a process model of the elementary phases from the start of the project to the ongoing operation of the AI methods in the RIM is presented, portraying the implementation of an AI project, meant to enable universities and research institutions to support their researchers in dealing with incorrect and incomplete research information while it is being stored in their RIMs. Our aim is to show how research information harmonizes with the challenges of data literacy and data quality issues related to AI, also wanting to underline that any project can be successful if the research institutions and various departments of universities involved work together and appropriate support is offered to improve research information and data management.

Citations: 0

On the performativity of SDG classifications in large bibliometric databases

arXiv - CS - Digital Libraries Pub Date : 2024-05-05 DOI: arxiv-2405.03007

Matteo Ottaviani, Stephan Stahlschmidt

Citations: 0

Assembling ensembling: An adventure in approaches across disciplines

arXiv - CS - Digital Libraries Pub Date : 2024-05-04 DOI: arxiv-2405.02599

Amanda Bleichrodt, Lydia Bourouiba, Gerardo Chowell, Eric T. Lofgren, J. Michael Reed, Sadie J. Ryan, Nina H. Fefferman

Abstract: When we think of model ensembling or ensemble modeling, there are many possibilities that come to mind in different disciplines. For example, one might think of a set of descriptions of a phenomenon in the world, perhaps a time series or a snapshot of multivariate space, and perhaps that set is comprised of data-independent descriptions, or perhaps it is quite intentionally fit *to* data, or even a suite of data sets with a common theme or intention. The very meaning of 'ensemble' - a collection together - conjures different ideas across and even within disciplines approaching phenomena. In this paper, we present a typology of the scope of these potential perspectives. It is not our goal to present a review of terms and concepts, nor is it to convince all disciplines to adopt a common suite of terms, which we view as futile. Rather, our goal is to disambiguate terms, concepts, and processes associated with 'ensembles' and 'ensembling' in order to facilitate communication, awareness, and possible adoption of tools across disciplines.

Citations: 0

A Workflow for GLAM Metadata Crosswalk

arXiv - CS - Digital Libraries Pub Date : 2024-05-03 DOI: arxiv-2405.02113

Arianna Moretti, Ivan Heibi, Silvio Peroni

Abstract: The acquisition of physical artifacts not only involves transferring existing information into the digital ecosystem but also generates information as a process itself, underscoring the importance of meticulous management of FAIR data and metadata. In addition, the diversity of objects within the cultural heritage domain is reflected in a multitude of descriptive models. The digitization process expands the opportunities for exchange and joint utilization, granted that the descriptive schemas are made interoperable in advance. To achieve this goal, we propose a replicable workflow for metadata schema crosswalks that facilitates the preservation and accessibility of cultural heritage in the digital ecosystem. This work presents a methodology for metadata generation and management in the case study of the digital twin of the temporary exhibition "The Other Renaissance - Ulisse Aldrovandi and the Wonders of the World". The workflow delineates a systematic, step-by-step transformation of tabular data into RDF format, to enhance Linked Open Data. The methodology adopts the RDF Mapping Language (RML) technology for converting data to RDF with a human contribution involvement. This last aspect entails an interaction between digital humanists and domain experts through surveys leading to the abstraction and reformulation of domain-specific knowledge, to be exploited in the process of formalizing and converting information.

Citations: 0

Callico: a Versatile Open-Source Document Image Annotation Platform

arXiv - CS - Digital Libraries Pub Date : 2024-05-02 DOI: arxiv-2405.01071

Christopher Kermorvant, Eva Bardou, Manon Blanco, Bastien Abadie

Abstract: This paper presents Callico, a web-based open source platform designed to simplify the annotation process in document recognition projects. The move towards data-centric AI in machine learning and deep learning underscores the importance of high-quality data, and the need for specialised tools that increase the efficiency and effectiveness of generating such data. For document image annotation, Callico offers dual-display annotation for digitised documents, enabling simultaneous visualisation and annotation of scanned images and text. This capability is critical for OCR and HTR model training, document layout analysis, named entity recognition, form-based key value annotation or hierarchical structure annotation with element grouping. The platform supports collaborative annotation with versatile features backed by a commitment to open source development, high-quality code standards and easy deployment via Docker. Illustrative use cases - including the transcription of the Belfort municipal registers, the indexing of French World War II prisoners for the ICRC, and the extraction of personal information from the Socface project's census lists - demonstrate Callico's applicability and utility.

Citations: 0

Clustering Running Titles to Understand the Printing of Early Modern Books

arXiv - CS - Digital Libraries Pub Date : 2024-05-01 DOI: arxiv-2405.00752

Nikolai Vogler, Kartik Goyal, Samuel V. Lemley, D. J. Schuldt, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick

Abstract: We propose a novel computational approach to automatically analyze the physical process behind printing of early modern letterpress books via clustering the running titles found at the top of their pages. Specifically, we design and compare custom neural and feature-based kernels for computing pairwise visual similarity of a scanned document's running titles and cluster the titles in order to track any deviations from the expected pattern of a book's printing. Unlike body text which must be reset for every page, the running titles are one of the static type elements in a skeleton forme i.e. the frame used to print each side of a sheet of paper, and were often re-used during a book's printing. To evaluate the effectiveness of our approach, we manually annotate the running title clusters on about 1600 pages across 8 early modern books of varying size and formats. Our method can detect potential deviation from the expected patterns of such skeleton formes, which helps bibliographers understand the phenomena associated with a text's transmission, such as censorship. We also validate our results against a manual bibliographic analysis of a counterfeit early edition of Thomas Hobbes' Leviathan (1651).

Citations: 0