The OpenCitations Index
Ivan Heibi, Arianna Moretti, Silvio Peroni, Marta Soricetti
arXiv - CS - Digital Libraries, 2024-08-05
https://doi.org/arxiv-2408.02321
Citation count: 0
Abstract
This article presents the OpenCitations Index, a collection of open citation
data maintained by OpenCitations, an independent, not-for-profit infrastructure
organisation for open scholarship dedicated to publishing open bibliographic
and citation data using Semantic Web and Linked Open Data technologies. The
collection comprises citation data harvested from multiple sources. Because
different sources may describe the same bibliographic entities with different
identifiers, and may therefore provide the same citation more than once, a
deduplication mechanism has been implemented. This ensures that each citation
integrated into the OpenCitations Index is identified uniquely, even when
different identifiers are used. This mechanism follows a specific workflow,
which encompasses preprocessing of the original source data, management of
the provided bibliographic metadata, and the generation of new citation data
to be integrated into the OpenCitations Index.
The process relies on another data collection, OpenCitations Meta, and on the
use of a new globally persistent identifier, the OMID (OpenCitations Meta
Identifier). As of July 2024, the OpenCitations Index stores over 2 billion
unique citation links, harvested from Crossref, the National Institutes of
Health Open Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan
Link Center (JaLC). The OpenCitations Index can be systematically accessed and
queried through several services, including a SPARQL endpoint, REST APIs, and
web interfaces. Additionally, dataset dumps are available for free download
and reuse (under a CC0 waiver) in various formats (CSV, N-Triples, and
Scholix), including
provenance and change tracking information.
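
The deduplication idea in the abstract can be illustrated with a minimal Python sketch. It is not the OpenCitations implementation: the lookup table `ID_TO_OMID`, the function `deduplicate`, and every identifier value below are hypothetical and invented for illustration. The sketch only assumes that each external identifier (for example a DOI or a PMID) can be resolved to a single OMID-like key, so that two citations expressed with different identifiers collapse into one.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a lookup against OpenCitations Meta: each external
# identifier resolves to one OMID-like key. All values here are invented.
ID_TO_OMID = {
    "doi:10.1000/a": "omid:br/1",
    "pmid:111": "omid:br/1",  # same work as doi:10.1000/a
    "doi:10.1000/b": "omid:br/2",
    "pmid:222": "omid:br/2",  # same work as doi:10.1000/b
}


def deduplicate(citations: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (citing, cited) identifier pairs to OMID-like pairs, keeping each pair once."""
    seen = set()
    unique = []
    for citing, cited in citations:
        key = (ID_TO_OMID[citing], ID_TO_OMID[cited])
        if key not in seen:  # first time this citation link is observed
            seen.add(key)
            unique.append(key)
    return unique


# Three raw citation records; the first two describe the same link
# using different identifier schemes.
raw = [
    ("doi:10.1000/a", "doi:10.1000/b"),
    ("pmid:111", "pmid:222"),
    ("doi:10.1000/b", "doi:10.1000/a"),
]
result = deduplicate(raw)
print(result)
```

Keying each citation on the pair of resolved identifiers, rather than on the raw strings, is what lets records from sources that use different identifier schemes be recognised as the same link. Direction is preserved, so a citation from A to B stays distinct from one from B to A.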