A resource description framework (RDF) model of named entity co-occurrences in biomedical literature and its integration with PubChemRDF

IF 5.7, Zone 2 (Chemistry), JCR Q1: CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics, Pub Date: 2025-05-21, DOI: 10.1186/s13321-025-01017-0

Qingliang Li, Sunghwan Kim, Leonid Zaslavsky, Tiejun Cheng, Bo Yu, Evan E. Bolton

Citations: 0

Abstract

A resource description framework (RDF) model of named entity co-occurrences in biomedical literature and its integration with PubChemRDF

Named entities, such as chemicals/drugs, genes/proteins, and diseases, and their associations are not only important components of biomedical literature, but also the foundation of creating biomedical knowledgebases and knowledge graphs. This work addresses the challenges of expressing co-occurrence associations between named entities extracted from a biomedical literature corpus in a machine-readable format. We developed a Resource Description Framework (RDF) data model and integrated it into the PubChemRDF resource, which is freely accessible and publicly available. The developed co-occurrence data model was populated into a triplestore with named entities and their associations derived from text mining of millions of biomedical references found in PubMed. The utility of the data model was demonstrated through multiple use cases. Together with meta-data modeling of the references including the information about the author, journal, grant, and funding agency, this data model allows researchers to address pertinent biomedical questions through SPARQL queries and helps to exploit biomedical knowledge in various user perspectives and use cases.
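To make the idea of a machine-readable co-occurrence association concrete, the following is a minimal sketch, not the authors' data model. It counts how many references mention each pair of entities in a toy corpus and writes one association node per pair in N-Triples syntax. The `http://example.org/` namespace, the predicate names (`subject`, `object`, `referenceCount`), and the toy data are all assumptions made for illustration; they are not the PubChemRDF vocabulary.

```python
from collections import Counter
from itertools import combinations

# Hypothetical namespace and predicate names, for illustration only;
# these are NOT the PubChemRDF vocabulary.
EX = "http://example.org/cooccurrence/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Toy corpus (invented): reference ID -> named entities found in it.
references = {
    "ref1": ["aspirin", "PTGS2", "inflammation"],
    "ref2": ["aspirin", "inflammation"],
    "ref3": ["PTGS2", "inflammation"],
}


def count_cooccurrences(refs):
    """Count, for each unordered entity pair, the references mentioning both."""
    counts = Counter()
    for entities in refs.values():
        # De-duplicate and sort so each pair has one canonical order.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def to_ntriples(counts):
    """Serialize each pair as a small association node in N-Triples syntax."""
    lines = []
    for i, ((a, b), n) in enumerate(sorted(counts.items())):
        node = f"<{EX}association/{i}>"
        lines.append(f"{node} <{EX}subject> <{EX}entity/{a}> .")
        lines.append(f"{node} <{EX}object> <{EX}entity/{b}> .")
        lines.append(f'{node} <{EX}referenceCount> "{n}"^^<{XSD_INTEGER}> .')
    return lines


counts = count_cooccurrences(references)
triples = to_ntriples(counts)
print("\n".join(triples))
```

Modeling each association as its own node, rather than as a single triple between two entities, leaves room to attach further statements to it, such as the count used here. Once such triples are loaded into a triplestore, they can be queried with SPARQL.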

Source journal

Journal of Cheminformatics (CHEMISTRY, MULTIDISCIPLINARY; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 14.10

Self-citation rate: 7.00%

Publication volume:

Review time: 3 months

About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.