BASIL DB: bioactive semantic integration and linking database.

IF 2.0 · Zone 3 (Engineering & Technology) · JCR Q3 (Mathematical & Computational Biology)
David Jackson, Paul Groth, Hazar Harmouch
Journal of Biomedical Semantics, 16(1):14. Journal Article. Published 2025-08-13.
DOI: https://doi.org/10.1186/s13326-025-00336-3
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12351831/pdf/
Citations: 0

Abstract

Background: Bioactive compounds found in foods and plants can provide health benefits, including antioxidant and anti-inflammatory effects. Research into their role in disease prevention and personalized nutrition is expanding, but challenges such as data complexity, inconsistent methods, and the rapid growth of scientific literature can hinder progress. To address these issues, we developed BASIL DB (BioActive Semantic Integration and Linking Database), a knowledge graph (KG) database that leverages natural language processing (NLP) techniques to streamline data organization and analysis. This automated approach offers greater scalability and comprehensiveness than traditional methods such as manual data curation and entry.

Construction and content: BASIL DB is constructed in four steps: data collection, data preprocessing, data extraction, and data integration. Data on bioactives and foods are sourced from structured databases, and the relevant randomized controlled trials (RCTs) are extracted from PubMed. The data are then prepared by resolving inconsistencies and structuring them for analysis. In the data extraction phase, NLP tools, including a large language model (LLM), are used to analyze clinical trials and extract data on bioactive compounds and their health impacts. The integration phase compiles these data into a knowledge graph whose nodes are the entities Foods, Bioactives, and Health Conditions and whose edges are the interactions between them. To quantify these interactions, we generate a weight for each edge on the basis of empirical evidence and methodological rigor.
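
As a minimal sketch of the graph structure described above, the following Python snippet stores Foods, Bioactives, and Health Conditions as typed nodes and their interactions as weighted edges. The abstract does not state the weighting formula, so `edge_weight`, its inputs (`effect_size`, `rigor`), the `KnowledgeGraph` class, and the placeholder entity names are all hypothetical and do not represent BASIL DB's actual method or data.

```python
from dataclasses import dataclass, field

NODE_TYPES = {"Food", "Bioactive", "HealthCondition"}


def edge_weight(effect_size: float, rigor: float) -> float:
    """Hypothetical weight: evidence strength scaled by study rigor in [0, 1].

    Illustrative assumption only; not the formula used by BASIL DB.
    """
    if not 0.0 <= rigor <= 1.0:
        raise ValueError("rigor must lie in [0, 1]")
    return effect_size * rigor


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node name -> node type
    edges: dict = field(default_factory=dict)  # (source, target) -> weight

    def add_node(self, name: str, node_type: str) -> None:
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[name] = node_type

    def add_edge(self, source: str, target: str, weight: float) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(source, target)] = weight

    def neighbors(self, name: str) -> list:
        """Return (target, weight) pairs for edges leaving `name`."""
        return [(t, w) for (s, t), w in self.edges.items() if s == name]


# Placeholder entities only; not taken from the database.
kg = KnowledgeGraph()
kg.add_node("food_A", "Food")
kg.add_node("bioactive_X", "Bioactive")
kg.add_node("condition_Y", "HealthCondition")
kg.add_edge("food_A", "bioactive_X", 1.0)
kg.add_edge("bioactive_X", "condition_Y", edge_weight(effect_size=0.5, rigor=0.5))
```

Keeping the weight computation in its own function means the scoring rule can be swapped out without touching the graph container.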

Utility and discussion: The BASIL DB incorporates 433 compounds, 40,296 research papers, 7,256 health effects, and 4,197 food items. The database features query and visualization capabilities, including interactive graphs and custom filtering options, that showcase different aspects of the data. Users are able to explore the relationships between bioactives and health effects, enhancing both research efficiency and insight discovery.
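
The kind of custom filtering mentioned above could look like the following sketch, which selects bioactive-to-health-effect edges at or above a weight threshold. The `Edge` record layout, the `filter_edges` function, and the placeholder values are assumptions made for illustration; BASIL DB's real query interface may differ.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    bioactive: str
    health_effect: str
    weight: float


def filter_edges(edges, min_weight=0.0, bioactive=None):
    """Keep edges at or above `min_weight`, optionally for one bioactive,
    sorted from strongest to weakest."""
    selected = [
        e for e in edges
        if e.weight >= min_weight and (bioactive is None or e.bioactive == bioactive)
    ]
    return sorted(selected, key=lambda e: e.weight, reverse=True)


# Placeholder records, not real database content.
edges = [
    Edge("bioactive_X", "effect_1", 0.2),
    Edge("bioactive_X", "effect_2", 0.8),
    Edge("bioactive_Z", "effect_1", 0.6),
]

strong = filter_edges(edges, min_weight=0.5)
only_x = filter_edges(edges, bioactive="bioactive_X")
```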

Conclusion: The BASIL DB is a knowledge graph database of bioactive compounds. This study provides a structured resource for exploring the relationships among bioactives, foods, and health outcomes, representing a step toward a more systematic and data-driven approach to understanding the health effects of bioactive compounds. Future work will focus on expanding the database and refining the utilized methods. Extending the BASIL DB will help bridge the gap between traditional and conventional approaches to nutrition, guiding future research in bioactive compound discovery and health optimization.

Availability: Users can access and explore the data via https://basil-db.github.io/info.html or fork and run the respective script via https://github.com/basil-db/script .

Source journal
Journal of Biomedical Semantics (Mathematical & Computational Biology)
CiteScore: 4.20
Self-citation rate: 5.30%
Articles published: 28
Review time: 30 weeks
About the journal: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas. Infrastructure for biomedical semantics: semantic resources and repositories, metadata management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: approaches and applications of semantic resources, and tools for investigation, reasoning, prediction, and discovery in biomedicine.