Database: The Journal of Biological Databases and Curation: Latest Articles

LitSumm: large language models for literature summarization of noncoding RNAs.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-02-05 · DOI: 10.1093/database/baaf006
Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney
Abstract: Curation of literature in the life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have the resources to scale to the whole relevant literature, and all have to prioritize their efforts. In this work, we take a first step toward alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied.
Database URL: https://rnacentral.org/.
Citations: 0
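The LitSumm abstract credits its accurate references to "a chain of prompts and checks" but does not specify the checks. As a hypothetical illustration of one such automated check, this Python sketch verifies that every citation in a generated summary points to a source that was actually supplied to the model. It assumes, purely for illustration, that citations are written as bracketed identifiers such as `[PMC111]` or `[PMC111, PMC222]`; it is not the authors' implementation.

```python
import re

# Assumed (hypothetical) citation style: bracketed, comma-separated identifiers.
CITATION_GROUP = re.compile(r"\[([^\[\]]+)\]")


def cited_ids(summary: str) -> set[str]:
    """Collect every identifier cited inside square brackets."""
    ids: set[str] = set()
    for group in CITATION_GROUP.findall(summary):
        ids.update(item.strip() for item in group.split(",") if item.strip())
    return ids


def check_references(summary: str, supplied_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the summary passed."""
    cited = cited_ids(summary)
    if not cited:
        return ["summary cites no sources"]
    # Any cited identifier that was never given to the model is a likely hallucination.
    return [
        f"cites {ref}, which was not supplied to the model"
        for ref in sorted(cited - supplied_ids)
    ]


if __name__ == "__main__":
    supplied = {"PMC111", "PMC222"}
    print(check_references("RNA X is expressed in liver [PMC111, PMC222].", supplied))  # []
    print(check_references("RNA X binds protein Y [PMC999].", supplied))
```

A summary that fails a check like this could, for instance, be regenerated or dropped; that policy is likewise an assumption rather than something the abstract states.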
Expression of Concern: DisGeNet: a disease-centric interaction database among diseases and various associated genes.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-31 · DOI: 10.1093/database/baaf007
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784583/pdf/
Citations: 0
Helping authors produce FAIR taxonomic data: evaluation of an author-driven phenotype data production prototype.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-29 · DOI: 10.1093/database/baae097
Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin
Abstract: It is well known that the use of vocabulary in phenotype treatments is often inconsistent. An earlier survey of biologists who create or use phenotypic characters revealed that this lack of standardization leads to ambiguities, frustrating both the consumers and producers of phenotypic data. Such ambiguities are challenging for biologists, and even more so for artificial intelligence, to resolve. That survey also indicated a strong interest in a new authoring workflow supported by ontologies to ensure that published phenotype data are FAIR (Findable, Accessible, Interoperable, and Reusable) and suitable for large-scale computational analyses. In this article, we introduce a prototype software system designed for authors to produce computational phenotype data. This platform includes a web-based, ontology-enhanced editor for taxonomic characters (Character Recorder), an Ontology Backend holding standardized vocabulary (the Cared Ontology), and a mobile application for resolving ontological conflicts (Conflict Resolver). We present two formal user evaluations of Character Recorder, the main interface authors would interact with to produce FAIR data. The evaluations were conducted with undergraduate biology students and Carex experts. We compared Character Recorder with Microsoft Excel in terms of effectiveness, efficiency, and the cognitive demands placed on users when producing computable taxon-by-character matrices. The evaluations showed that Character Recorder is quickly learnable for both student and professional participants, with a cognitive demand comparable to Excel's. Participants agreed that the quality of the data Character Recorder yielded was superior. Students praised Character Recorder's educational value, while Carex experts were keen to recommend it and help evolve it from a prototype into a comprehensive tool. Feature improvements recommended by expert participants were implemented after the evaluation.
Citations: 0
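The Character Recorder abstract centres on computable taxon-by-character matrices built from standardized vocabulary. To illustrate only the underlying idea, this Python sketch stores a matrix as nested dictionaries and checks each cell against a controlled list of allowed states per character, flagging free-text values such as "egg-shaped". The characters, states, and taxon names are hypothetical and are not drawn from the prototype or its ontology.

```python
# Hypothetical controlled vocabulary: allowed states for each character.
VOCABULARY: dict[str, set[str]] = {
    "perigynium shape": {"ovate", "elliptic", "obovate"},
    "leaf surface": {"glabrous", "pubescent"},
}


def validate(matrix: dict[str, dict[str, str]],
             vocabulary: dict[str, set[str]]) -> list[tuple[str, str, str, str]]:
    """List (taxon, character, value, reason) for every non-standard cell."""
    problems = []
    for taxon, row in matrix.items():
        for character, value in row.items():
            allowed = vocabulary.get(character)
            if allowed is None:
                # The character itself is not defined in the vocabulary.
                problems.append((taxon, character, value, "unknown character"))
            elif value not in allowed:
                # The character is known, but the state is free text.
                problems.append((taxon, character, value, "value not in vocabulary"))
    return sorted(problems)


if __name__ == "__main__":
    matrix = {
        "Taxon A": {"perigynium shape": "ovate", "leaf surface": "glabrous"},
        "Taxon B": {"perigynium shape": "egg-shaped", "beak length": "short"},
    }
    for problem in validate(matrix, VOCABULARY):
        print(problem)
```

In an ontology-backed editor, a flagged value would presumably be mapped to a standard term or escalated as a new-term request rather than merely reported; the sketch stops at detection.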
BbGSD: Black-boned Sheep Genome SNP Database.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-28 · DOI: 10.1093/database/baaf004
Chunjuan He, Lichang Chen, Juntao Cao, Yuqing Zhong, Zhendong Gao, Weidong Deng, Jiajin Zhang
Abstract: Lanping black-boned (LPBB) sheep are a unique and rare ruminant species, characterized by black pigmentation in the skin and internal organs. Thus far, LPBB sheep are the only animals other than the black-boned chicken known to have heritable melanin characteristics, and the only mammals known to contain large amounts of melanin in the body. LPBB sheep have therefore attracted substantial research attention because of their potential value to medicine. However, long periods of free grazing and crossbreeding with Lanping normal (LPN) sheep have diluted LPBB breeding resources, posing a challenge to their conservation. To ensure the effective conservation and management of LPBB genetic resources, building a large-scale database of genotypic information is essential. To achieve this, we established the first LPBB-specific SNP database, the Black-boned Sheep Genome SNP Database (BbGSD, http://202.203.179.115:3838/oarsnpdb), using genotype data from 150 sheep (100 LPBB and 50 LPN) at 46 894 242 SNP sites. In this database, we implemented four main function modules: (i) the "LD heatmap" module, which uses a heatmap to enable interactive visualization of pairwise linkage disequilibrium (LD) measurements between SNPs; (ii) the "SNP distribution" module, which allows users to interactively visualize tabular genotype data as heatmaps; (iii) the "Phylogenetics" module, which enables phylogenetic analysis to explore the evolutionary history or genetic relationships of LPBB sheep; and (iv) the "Diversity" module, which calculates and displays the nucleotide diversity among sheep populations in user-specified genomic regions. BbGSD is essential for accelerating functional genomics studies and the screening of molecular markers for marker-assisted breeding in black-boned sheep.
Database URL: http://202.203.179.115:3838/oarsnpdb.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774206/pdf/
Citations: 0
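BbGSD's "LD heatmap" and "Diversity" modules rest on two standard population-genetic quantities. The abstract does not name the estimators BbGSD uses, so this Python sketch makes its own assumptions. Genotypes are coded as 0/1/2 alternate-allele dosages with no missing calls. LD is estimated as r², the squared Pearson correlation of dosages, which is a common genotype-based approximation; the classical r² is defined on phased haplotypes. Nucleotide diversity π sums the unbiased per-site heterozygosity 2k(n − k)/(n(n − 1)) over a window and divides by the window's length in bases.

```python
import math


def ld_r2(snp1: list[int], snp2: list[int]) -> float:
    """Squared Pearson correlation of genotype dosages (0/1/2) at two SNPs."""
    n = len(snp1)
    if n != len(snp2) or n < 2:
        raise ValueError("need two equal-length genotype vectors with >= 2 samples")
    mean1, mean2 = sum(snp1) / n, sum(snp2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(snp1, snp2))
    var1 = sum((a - mean1) ** 2 for a in snp1)
    var2 = sum((b - mean2) ** 2 for b in snp2)
    if var1 == 0 or var2 == 0:
        return math.nan  # a monomorphic SNP has no defined r^2
    return cov * cov / (var1 * var2)


def ld_matrix(snps: list[list[int]]) -> list[list[float]]:
    """Pairwise r^2 matrix, the kind of data an LD heatmap would display."""
    return [[ld_r2(a, b) for b in snps] for a in snps]


def nucleotide_diversity(sites: list[list[int]], window_length: int) -> float:
    """Per-base nucleotide diversity (pi) in a window of `window_length` bases.

    `sites` holds one list of diploid dosages per biallelic SNP in the window;
    invariant bases contribute zero but still count towards `window_length`.
    """
    total = 0.0
    for dosages in sites:
        n = 2 * len(dosages)  # sampled chromosomes
        if n < 2:
            raise ValueError("each site needs at least one diploid sample")
        k = sum(dosages)  # alternate-allele count
        total += 2 * k * (n - k) / (n * (n - 1))  # unbiased per-site heterozygosity
    return total / window_length


if __name__ == "__main__":
    snps = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 1, 0, 1]]
    for row in ld_matrix(snps):
        print([round(r2, 2) for r2 in row])
    print(nucleotide_diversity(snps, window_length=1000))
```

Comparing populations (for example LPBB against LPN) would mean calling `nucleotide_diversity` separately on each population's columns for the same window.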
The TOXIN knowledge graph: supporting animal-free risk assessment of cosmetics.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-28 · DOI: 10.1093/database/baae121
Sara Sepehri, Anja Heymans, Dinja De Win, Jan Maushagen, Audrey Sanctorum, Christophe Debruyne, Robim M Rodrigues, Joery De Kock, Vera Rogiers, Olga De Troyer, Tamara Vanhaecke
Abstract: The European Union's ban on animal testing for cosmetic products and their ingredients, combined with the lack of validated animal-free methods, makes it challenging to evaluate their potential repeated-dose organ toxicity. To address this, innovative strategies such as Next-Generation Risk Assessment (NGRA) are being explored, integrating historical animal data with new mechanistic insights from non-animal New Approach Methodologies (NAMs). This paper introduces the TOXIN knowledge graph (TOXIN KG), a tool designed to retrieve toxicological information on cosmetic ingredients, with a focus on liver-related data. TOXIN KG uses graph-structured semantic technology and integrates toxicological data through ontologies, ensuring interoperable representation. The primary data source is safety information on cosmetic ingredients from scientific opinions issued by the Scientific Committee on Consumer Safety between 2009 and 2019. The ToxRTool automates the reliability assessment of toxicity studies, while Simplified Molecular Input Line Entry System (SMILES) notation standardizes chemical identification, enabling in silico prediction of repeated-dose toxicity via the Organization for Economic Co-operation and Development Quantitative Structure-Activity Relationship Toolbox (OECD QSAR Toolbox). The ToXic Process Ontology, enriched with relevant biological repositories, is employed to represent toxicological concepts systematically. Search filters allow the identification of cosmetic compounds potentially linked to liver toxicity. Data visualization is achieved through Ontodia, a JavaScript library. Populated with information on 88 cosmetic ingredients, TOXIN KG allowed us to identify 53 compounds affecting at least one liver toxicity parameter in a 90-day repeated-dose animal study. For one compound, we illustrate how TOXIN KG links this observation to hepatic cholestasis as an adverse outcome. In an ab initio NGRA context, follow-up in vitro studies using human-based NAMs would be necessary to understand the compound's biological activity and the molecular mechanism leading to the adverse effect. In summary, TOXIN KG is a valuable tool for advancing the reusability of cosmetics safety data, providing knowledge in support of NAM-based hazard and risk assessments.
Database URL: https://toxin-search.netlify.app/.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11776536/pdf/
Citations: 0
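The TOXIN KG abstract describes search filters that pick out ingredients with liver findings in 90-day repeated-dose studies. TOXIN KG itself is built on ontologies and graph-structured semantic technology, and the abstract does not give its schema. This plain-Python sketch only mimics such a filter over a handful of hypothetical subject-predicate-object triples; every identifier and predicate name in it is invented for illustration.

```python
from collections import defaultdict

# Hypothetical triples; TOXIN KG's real schema and identifiers are not used here.
TRIPLES = [
    ("compound:A", "testedIn", "study:1"),
    ("study:1", "durationDays", 90),
    ("study:1", "affectsParameter", "liver:ALT"),
    ("compound:B", "testedIn", "study:2"),
    ("study:2", "durationDays", 28),
    ("study:2", "affectsParameter", "liver:ALT"),
    ("compound:C", "testedIn", "study:3"),
    ("study:3", "durationDays", 90),
    ("study:3", "affectsParameter", "kidney:creatinine"),
]


def build_index(triples):
    """Map subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def liver_hits(triples, duration_days=90):
    """Compounds with a study of the given duration that affects a liver parameter."""
    index = build_index(triples)
    hits = set()
    for subject, props in index.items():
        # .get() on a defaultdict does not insert keys, so iteration stays safe.
        for study in props.get("testedIn", set()):
            study_props = index.get(study, {})
            if duration_days not in study_props.get("durationDays", set()):
                continue
            params = study_props.get("affectsParameter", set())
            if any(str(p).startswith("liver:") for p in params):
                hits.add(subject)
    return sorted(hits)


if __name__ == "__main__":
    print(liver_hits(TRIPLES))  # ['compound:A']
```

In a production knowledge graph the same filter would more naturally be written as a graph query against the ontology-based store; the in-memory index here just keeps the example self-contained.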
BuffExDb: web-based tissue-specific gene expression resource for breeding and conservation programmes in Bubalus bubalis.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-24 · DOI: 10.1093/database/baae128
Naina Kumari, Samir Kumar, Anupama Roy, Princy Saini, Sarika Jaiswal, Mir Asif Iquebal, Ulavappa B Angadi, Dinesh Kumar
Abstract: Amid the global challenge of extreme poverty, the livestock sector can contribute significantly to the global sustainable development goals by enhancing resilience, smallholder productivity, and market participation. The Indian livestock sector is one of the largest in the world, with a total livestock population of 535.82 million, ∼10.7% of the world's livestock population. Buffalo (Bubalus bubalis) is of great importance in India and other Asian countries, contributing notably to their economies by surpassing cattle in milk production and by providing various valuable products. The limited availability of genomic and transcriptomic resources for buffalo hinders efforts to improve traits for increased milk and meat production. To address this gap, this study used state-of-the-art bioinformatics tools to analyse 2429 transcriptomes representing 438 BioSamples from 23 BioProjects obtained from a public-domain database, covering 76 different tissue and cell types from all major organ systems in both river and swamp buffalo. This exhaustive analysis led to the development of BuffExDb (http://46.202.167.198/buffex/), a relational buffalo expression database based on a three-tier architecture. Its major characteristics are user-friendliness and flexible retrieval of tissue-specific genes (TSGs) and their functional annotation. BuffExDb is the first database of its kind to offer easily navigable and filterable data, enabling users to examine and visualize the expression levels in each tissue across multiple samples simultaneously. It also provides the Tau score for identifying TSGs, along with their essential roles in tissue development, maintenance, and function, as revealed by gene ontology enrichment tests. By making these results easily accessible, this work should pave the way for biological, functional, and evolutionary studies. This prior information on tissue-specific mechanisms can be used in future genomic research, especially association studies aimed at enhancing buffalo breeding and conservation programmes.
Database URL: http://46.202.167.198/buffex/.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758923/pdf/
Citations: 0
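BuffExDb reports a Tau score for identifying tissue-specific genes. In its commonly used form, for a gene with expression xᵢ in each of n tissues, τ = Σᵢ (1 − xᵢ / maxⱼ xⱼ) / (n − 1). The score is 0 for a gene expressed equally everywhere and 1 for a gene expressed in a single tissue. The abstract does not state BuffExDb's normalization or cut-off, so the log2(x + 1) transform and the 0.8 threshold in this Python sketch are assumptions, and the gene names are hypothetical.

```python
import math


def tau(expression: list[float], log_transform: bool = True) -> float:
    """Tissue-specificity index tau for one gene across n >= 2 tissues."""
    if len(expression) < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    # Assumed preprocessing: log2(x + 1) damps the influence of very high values.
    values = [math.log2(x + 1) for x in expression] if log_transform else list(expression)
    peak = max(values)
    if peak == 0:
        return math.nan  # not expressed in any tissue: tau is undefined
    return sum(1 - v / peak for v in values) / (len(values) - 1)


def tissue_specific_genes(genes: dict[str, list[float]], threshold: float = 0.8) -> list[str]:
    """Names of genes whose tau reaches the (assumed) threshold."""
    return sorted(name for name, expr in genes.items() if tau(expr) >= threshold)


if __name__ == "__main__":
    genes = {
        "gene_liver_only": [120.0, 0.0, 0.0, 0.0],
        "gene_housekeeping": [50.0, 48.0, 52.0, 49.0],
    }
    print(tissue_specific_genes(genes))  # ['gene_liver_only']
```

Undefined scores (genes with no expression) compare as false against the threshold, so they are simply excluded from the tissue-specific list.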
A change language for ontologies and knowledge graphs.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-22 · DOI: 10.1093/database/baae133
Harshad Hegde, Jennifer Vendetti, Damien Goutte-Gattat, J Harry Caufield, John B Graybeal, Nomi L Harris, Naouel Karam, Christian Kindermann, Nicolas Matentzoglu, James A Overton, Mark A Musen, Christopher J Mungall
Abstract: Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute. To fill that need, we have created KGCL, the Knowledge Graph Change Language (https://github.com/INCATools/kgcl), a standard data model for describing changes to KGs and ontologies at a high level, and an accompanying human-readable Controlled Natural Language (CNL). This language serves two purposes: a curator can use it to request desired changes, and it can also be used to describe changes that have already happened, corresponding to the concepts of "apply patch" and "diff" commonly used for managing changes in text documents and computer programs. Another key feature of KGCL is that descriptions are at a high enough level to be useful to, and understood by, a variety of stakeholders; for example, ontology edits can be specified by commands like "add synonym 'arm' to 'forelimb'" or "move 'Parkinson disease' under 'neurodegenerative disease'." We have also built a suite of tools for managing ontology changes. These include an automated agent that integrates with and monitors GitHub ontology repositories and applies any requested changes, and a new component in the BioPortal ontology resource that allows users to make change requests directly from within the BioPortal user interface. Overall, the KGCL data model, its CNL, and associated tooling allow for easier management and processing of changes associated with the development of ontologies and KGs.
Database URL: https://github.com/INCATools/kgcl.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753292/pdf/
Citations: 0
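The KGCL abstract quotes two example commands. This Python sketch shows how a command in a controlled natural language can be parsed into a structured change record and then applied. It handles only those two command shapes, against a toy in-memory ontology. All class and function names are hypothetical; this is not the data model or API of the KGCL project at https://github.com/INCATools/kgcl. The rule that "move X under Y" replaces X's existing parents is also an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    parents: set[str] = field(default_factory=set)
    synonyms: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class AddSynonym:
    synonym: str
    target: str


@dataclass(frozen=True)
class MoveUnder:
    term: str
    new_parent: str


# Toy grammar covering only the two command shapes quoted in the abstract.
PATTERNS = [
    (re.compile(r"add synonym '([^']+)' to '([^']+)'"), AddSynonym),
    (re.compile(r"move '([^']+)' under '([^']+)'"), MoveUnder),
]


def parse(command: str):
    """Turn one controlled-natural-language command into a change record."""
    text = command.strip().rstrip(".")
    for pattern, change_type in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return change_type(*match.groups())
    raise ValueError(f"unrecognized command: {command!r}")


def apply_change(ontology: dict[str, Term], change) -> None:
    """Apply a parsed change to an in-memory ontology keyed by label."""
    if isinstance(change, AddSynonym):
        ontology[change.target].synonyms.add(change.synonym)
    elif isinstance(change, MoveUnder):
        if change.new_parent not in ontology:
            raise KeyError(change.new_parent)
        # Assumption: "move X under Y" replaces all of X's existing parents with Y.
        ontology[change.term].parents = {change.new_parent}
    else:
        raise TypeError(f"unsupported change: {change!r}")


if __name__ == "__main__":
    ontology = {label: Term(label) for label in ("forelimb", "disease", "neurodegenerative disease")}
    ontology["Parkinson disease"] = Term("Parkinson disease", parents={"disease"})
    for command in ("add synonym 'arm' to 'forelimb'",
                    "move 'Parkinson disease' under 'neurodegenerative disease'"):
        apply_change(ontology, parse(command))
    print(ontology["forelimb"].synonyms)  # {'arm'}
    print(ontology["Parkinson disease"].parents)  # {'neurodegenerative disease'}
```

Keeping parsing and application separate mirrors the two uses the abstract describes: the same change records could describe a requested edit ("apply patch") or be produced by comparing two ontology versions ("diff").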
Standardized pipelines support and facilitate integration of diverse datasets at the Rat Genome Database.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-22 · DOI: 10.1093/database/baae132
Jennifer R Smith, Marek A Tutaj, Jyothi Thota, Logan Lamers, Adam C Gibson, Akhilanand Kundurthi, Varun Reddy Gollapally, Kent C Brodie, Stacy Zacher, Stanley J F Laulederkind, G Thomas Hayman, Shur-Jen Wang, Monika Tutaj, Mary L Kaldunski, Mahima Vedi, Wendy M Demos, Jeffrey L De Pons, Melinda R Dwinell, Anne E Kwitek
Abstract: The Rat Genome Database (RGD) is a multispecies knowledgebase that integrates genetic, multiomic, phenotypic, and disease data across 10 mammalian species. To support cross-species, multiomics studies and to enhance and expand on data manually extracted from the biomedical literature by the RGD team of expert curators, RGD imports and integrates data from multiple sources. These include major databases and a substantial number of domain-specific resources, as well as direct submissions by individual researchers. The incorporation of these diverse datatypes is handled by a growing list of automated import, export, data processing, and quality control pipelines. This article outlines the development over time of a standardized infrastructure for automated RGD pipelines, with a summary of key design decisions and a focus on lessons learned.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753291/pdf/
Citations: 0
Correction to: The landscape of microRNA interaction annotation: analysis of three rare disorders as a case study.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-13 · DOI: 10.1093/database/baae131
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11726336/pdf/
Citations: 0
LSD600: the first corpus of biomedical abstracts annotated with lifestyle-disease relations.
IF 3.4 · Zone 4 (Biology)
Database: The Journal of Biological Databases and Curation · Pub Date: 2025-01-13 · DOI: 10.1093/database/baae129
Esmaeil Nourani, Evangelia-Mantelena Makri, Xiqing Mao, Sampo Pyysalo, Søren Brunak, Katerina Nastou, Lars Juhl Jensen
Abstract: Lifestyle factors (LSFs) are increasingly recognized as instrumental in both the development and control of diseases. Despite their importance, there is a lack of methods to extract relations between LSFs and diseases from the literature, a step necessary to consolidate the currently available knowledge into a structured form. As simple co-occurrence-based relation extraction (RE) approaches are unable to distinguish between the different types of LSF-disease relations, context-aware models such as transformers are required to extract and classify these relations into specific relation types. However, there was neither a comprehensive LSF-disease RE system nor a corpus suitable for developing one. We present LSD600 (available at https://zenodo.org/records/13952449), the first corpus specifically designed for LSF-disease RE, comprising 600 abstracts with 1900 relations of eight distinct types between 5027 diseases and 6930 LSF entities. We evaluated LSD600's quality by training a RoBERTa model on the corpus, achieving an F-score of 68.5% for the multilabel RE task on the held-out test set. We further validated LSD600 by applying the trained model to two other datasets, Nutrition-Disease and FoodDisease, where it achieved F-scores of 70.7% and 80.7%, respectively. Given these results, LSD600 and the RE system trained on it can help fill the existing gap in this area and pave the way for downstream applications.
Database URL: https://zenodo.org/records/13952449.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756709/pdf/
Citations: 0
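LSD600's quality is reported as F-scores on a multilabel relation-extraction task. The abstract does not specify the averaging scheme. This Python sketch shows one common choice, micro-averaging, in which each (lifestyle factor, disease, relation type) triple counts as one label, so an entity pair carrying two relation types contributes two labels. The relation-type names are hypothetical and are not LSD600's eight types.

```python
Relation = tuple[str, str, str]  # (lifestyle factor, disease, relation type)


def micro_f1(gold: set[Relation], predicted: set[Relation]) -> float:
    """Micro-averaged F-score over pooled (entity pair, relation type) labels."""
    tp = len(gold & predicted)  # correct labels
    fp = len(predicted - gold)  # spurious labels
    fn = len(gold - predicted)  # missed labels
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {
        ("smoking", "lung cancer", "risk"),
        ("exercise", "type 2 diabetes", "protective"),
        ("alcohol", "liver cirrhosis", "risk"),
    }
    predicted = {
        ("smoking", "lung cancer", "risk"),
        ("exercise", "type 2 diabetes", "protective"),
        ("alcohol", "liver cirrhosis", "protective"),  # wrong relation type
    }
    print(round(micro_f1(gold, predicted), 3))  # 0.667
```

Here a single wrong relation type counts twice: once as a false positive (the predicted label) and once as a false negative (the missed gold label), giving precision = recall = 2/3.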
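Contact us: info@booksci.cn. Book学术 provides a free academic resource search service that makes it easy for scholars in China and abroad to search Chinese- and English-language literature, and is committed to providing the most convenient, high-quality service experience. Copyright © 2023 布克学术. All rights reserved.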