Qingliang Li, Sunghwan Kim, Leonid Zaslavsky, Tiejun Cheng, Bo Yu, Evan E. Bolton
Journal of Cheminformatics, vol. 17. Published 2025-05-21. DOI: 10.1186/s13321-025-01017-0
https://link.springer.com/article/10.1186/s13321-025-01017-0
A resource description framework (RDF) model of named entity co-occurrences in biomedical literature and its integration with PubChemRDF
Named entities, such as chemicals/drugs, genes/proteins, and diseases, and the associations among them are important components of biomedical literature and the foundation for building biomedical knowledge bases and knowledge graphs. This work addresses the challenge of expressing, in a machine-readable format, co-occurrence associations between named entities extracted from a biomedical literature corpus. We developed a Resource Description Framework (RDF) data model and integrated it into the freely and publicly available PubChemRDF resource. A triplestore was populated under this data model with named entities and their associations derived from text mining of millions of biomedical references found in PubMed. The utility of the data model was demonstrated through multiple use cases. Combined with metadata modeling of the references, including information about the author, journal, grant, and funding agency, this data model allows researchers to address pertinent biomedical questions through SPARQL queries and to exploit biomedical knowledge from a range of user perspectives and use cases.
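To make the idea of a co-occurrence association concrete, the following is a minimal sketch, not the authors' data model. It assumes that text mining has already produced a set of named entities per reference, counts how many references mention each entity pair, and writes each pair as three RDF triples in N-Triples syntax. The annotation data, the `http://example.org/cooc#` namespace, and the `subject`, `object`, and `count` predicate names are all hypothetical placeholders; the actual PubChemRDF vocabulary is not given in this abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical text-mining output: reference ID -> named entities found in it.
# These IDs and entity labels are made up for illustration only.
annotations = {
    "ref1": {"aspirin", "PTGS2", "inflammation"},
    "ref2": {"aspirin", "PTGS2"},
    "ref3": {"aspirin", "inflammation"},
}

# Placeholder namespace; NOT the namespace used by PubChemRDF.
EX = "http://example.org/cooc#"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def cooccurrence_counts(annotations):
    """Count, for every unordered entity pair, the references mentioning both."""
    counts = Counter()
    for entities in annotations.values():
        # Sorting gives each unordered pair one canonical (a, b) key.
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts


def to_ntriples(counts):
    """Serialize each pair as an association node with three triples."""
    lines = []
    for i, ((a, b), n) in enumerate(sorted(counts.items())):
        node = f"<{EX}assoc{i}>"
        lines.append(f"{node} <{EX}subject> <{EX}{a}> .")
        lines.append(f"{node} <{EX}object> <{EX}{b}> .")
        lines.append(f'{node} <{EX}count> "{n}"^^<{XSD_INT}> .')
    return lines


counts = cooccurrence_counts(annotations)
triples = to_ntriples(counts)
```

Modeling each association as its own node, rather than as a single triple between two entities, leaves room to attach further properties such as a count or a supporting reference. Whether the published model takes this approach is not stated in the abstract.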
About the journal:
Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling.
Coverage includes, but is not limited to:
chemical information systems, software and databases, and molecular modelling,
chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases,
computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.