{"title":"Working in biocuration: contemporary experiences and perspectives.","authors":"Sarah R Davies","doi":"10.1093/database/baaf003","DOIUrl":"https://doi.org/10.1093/database/baaf003","url":null,"abstract":"<p><p>This perspective article synthesizes current knowledge regarding what is known regarding biocuration as a career and the challenges facing the field. It draws on existing literature and ongoing qualitative research to discuss the nature of biocuration, biocurators' career trajectories, key challenges that biocurators face, and strategies for overcoming these. Overall, biocurators express a high degree of satisfaction with their work and see it as central to the wider biosciences. The central challenges that they face relate to the underfunding and under-recognition of this work, meaning that there is minimal stable funding for the field and that the work of human biocurators is often invisible to those who use curated resources. The article closes by critically discussing existing and potential strategies for responding to these challenges.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney
{"title":"LitSumm: large language models for literature summarization of noncoding RNAs.","authors":"Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney","doi":"10.1093/database/baaf006","DOIUrl":"10.1093/database/baaf006","url":null,"abstract":"<p><p>Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritize their efforts. In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11833236/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143254947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney
{"title":"LitSumm: large language models for literature summarization of noncoding RNAs.","authors":"Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney","doi":"10.1093/database/baaf006","DOIUrl":"https://doi.org/10.1093/database/baaf006","url":null,"abstract":"<p><p>Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritize their efforts. In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expression of Concern: DisGeNet: a disease-centric interaction database among diseases and various associated genes.","authors":"","doi":"10.1093/database/baaf007","DOIUrl":"https://doi.org/10.1093/database/baaf007","url":null,"abstract":"","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expression of Concern: DisGeNet: a disease-centric interaction database among diseases and various associated genes.","authors":"","doi":"10.1093/database/baaf007","DOIUrl":"https://doi.org/10.1093/database/baaf007","url":null,"abstract":"","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143995483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expression of Concern: DisGeNet: a disease-centric interaction database among diseases and various associated genes.","authors":"","doi":"10.1093/database/baaf007","DOIUrl":"10.1093/database/baaf007","url":null,"abstract":"","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784583/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143070758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin
{"title":"Helping authors produce FAIR taxonomic data: evaluation of an author-driven phenotype data production prototype.","authors":"Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin","doi":"10.1093/database/baae097","DOIUrl":"10.1093/database/baae097","url":null,"abstract":"<p><p>It is well-known that the use of vocabulary in phenotype treatments is often inconsistent. An earlier survey of biologists who create or use phenotypic characters revealed that this lack of standardization leads to ambiguities, frustrating both the consumers and producers of phenotypic data. Such ambiguities are challenging for biologists, and more so for Artificial Intelligence, to resolve. That survey also indicated a strong interest in a new authoring workflow supported by ontologies to ensure published phenotype data are FAIR (Findable, Accessible, Interoperable, and Reusable) and suitable for large-scale computational analyses. In this article, we introduce a prototype software system designed for authors to produce computational phenotype data. This platform includes a web-based, ontology-enhanced editor for taxonomic characters (Character Recorder), an Ontology Backend holding standardized vocabulary (the Cared Ontology), and a mobile application for resolving ontological conflicts (Conflict Resolver). We present two formal user evaluations of Character Recorder, the main interface authors would interact with to produce FAIR data. The evaluations were conducted with undergraduate biology students and Carex experts. We evaluated Character Recorder against Microsoft Excel on their effectiveness, efficiency, and the cognitive demands of the users in producing computable taxon-by-character matrices. The evaluations showed that Character Recorder is quickly learnable for both student and professional participants, with its cognitive demand comparable to Excel's. Participants agreed that the quality of the data Character Recorder yielded was superior. Students praised Character Recorder's educational value, while Carex experts were keen to recommend it and help evolve it from a prototype into a comprehensive tool. Feature improvements recommended by expert participants have been implemented after the evaluation.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928229/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143064244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin
{"title":"Helping authors produce FAIR taxonomic data: evaluation of an author-driven phenotype data production prototype.","authors":"Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin","doi":"10.1093/database/baae097","DOIUrl":"https://doi.org/10.1093/database/baae097","url":null,"abstract":"<p><p>It is well-known that the use of vocabulary in phenotype treatments is often inconsistent. An earlier survey of biologists who create or use phenotypic characters revealed that this lack of standardization leads to ambiguities, frustrating both the consumers and producers of phenotypic data. Such ambiguities are challenging for biologists, and more so for Artificial Intelligence, to resolve. That survey also indicated a strong interest in a new authoring workflow supported by ontologies to ensure published phenotype data are FAIR (Findable, Accessible, Interoperable, and Reusable) and suitable for large-scale computational analyses. In this article, we introduce a prototype software system designed for authors to produce computational phenotype data. This platform includes a web-based, ontology-enhanced editor for taxonomic characters (Character Recorder), an Ontology Backend holding standardized vocabulary (the Cared Ontology), and a mobile application for resolving ontological conflicts (Conflict Resolver). We present two formal user evaluations of Character Recorder, the main interface authors would interact with to produce FAIR data. The evaluations were conducted with undergraduate biology students and Carex experts. We evaluated Character Recorder against Microsoft Excel on their effectiveness, efficiency, and the cognitive demands of the users in producing computable taxon-by-character matrices. The evaluations showed that Character Recorder is quickly learnable for both student and professional participants, with its cognitive demand comparable to Excel's. Participants agreed that the quality of the data Character Recorder yielded was superior. Students praised Character Recorder's educational value, while Carex experts were keen to recommend it and help evolve it from a prototype into a comprehensive tool. Feature improvements recommended by expert participants have been implemented after the evaluation.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BbGSD: Black-boned Sheep Genome SNP Database.","authors":"Chunjuan He, Lichang Chen, Juntao Cao, Yuqing Zhong, Zhendong Gao, Weidong Deng, Jiajin Zhang","doi":"10.1093/database/baaf004","DOIUrl":"https://doi.org/10.1093/database/baaf004","url":null,"abstract":"<p><p>Lanping black-boned (LPBB) sheep are a unique and rare ruminant species, characterized by black pigmentation in the skin and internal organs. Thus far, LPBB are the only known animal with heritable melanin characteristics besides the black-boned chicken, and the only mammal known to contain a large amount of melanin in the body. LPBB have therefore attracted substantial research attention, due to their potential contribution to medicine. However, long periods of grazing freely and crossbreeding with Lanping normal sheep (LPN) have diluted LPBB breeding resources, posing a challenge to the protection of species. To ensure the effective conservation and management of LPBB genetic resources, the construction of a large-scale database of genotypic information is therefore very important. To achieve this, we established the first LPBB-specific SNP database, named Black-boned Sheep Genome SNP Database (BbGSD, http://202.203.179.115:3838/oarsnpdb) using sheep genotype data (100 LPBB and 50 LPN) across 46 894 242 SNP sites. In this database, we implemented four main function modules: (i) the \"LD heatmap\" module, which uses a heatmap to enable the interactive visualization of pairwise linkage disequilibrium (LD) measurements between SNPs; (ii) the \"SNP distribution\" module, which allows users to interactively visualize tabular genotype data as heat maps; (iii) the \"Phylogenetics\" module which enables phylogenetic analysis to explore the evolutionary history or genetic relationships of the LPBB sheep; and the \"Diversity\" module, which can be used to calculate and display the nucleotide diversity among sheep populations in user-specified genomic regions. BbGSD is essential for accelerating studies on the functional genomics and screening of molecular markers of molecular-assisted breeding in black-boned sheep. Database URL: http://202.203.179.115:3838/oarsnpdb.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BbGSD: Black-boned Sheep Genome SNP Database.","authors":"Chunjuan He, Lichang Chen, Juntao Cao, Yuqing Zhong, Zhendong Gao, Weidong Deng, Jiajin Zhang","doi":"10.1093/database/baaf004","DOIUrl":"10.1093/database/baaf004","url":null,"abstract":"<p><p>Lanping black-boned (LPBB) sheep are a unique and rare ruminant species, characterized by black pigmentation in the skin and internal organs. Thus far, LPBB are the only known animal with heritable melanin characteristics besides the black-boned chicken, and the only mammal known to contain a large amount of melanin in the body. LPBB have therefore attracted substantial research attention, due to their potential contribution to medicine. However, long periods of grazing freely and crossbreeding with Lanping normal sheep (LPN) have diluted LPBB breeding resources, posing a challenge to the protection of species. To ensure the effective conservation and management of LPBB genetic resources, the construction of a large-scale database of genotypic information is therefore very important. To achieve this, we established the first LPBB-specific SNP database, named Black-boned Sheep Genome SNP Database (BbGSD, http://202.203.179.115:3838/oarsnpdb) using sheep genotype data (100 LPBB and 50 LPN) across 46 894 242 SNP sites. In this database, we implemented four main function modules: (i) the \"LD heatmap\" module, which uses a heatmap to enable the interactive visualization of pairwise linkage disequilibrium (LD) measurements between SNPs; (ii) the \"SNP distribution\" module, which allows users to interactively visualize tabular genotype data as heat maps; (iii) the \"Phylogenetics\" module which enables phylogenetic analysis to explore the evolutionary history or genetic relationships of the LPBB sheep; and the \"Diversity\" module, which can be used to calculate and display the nucleotide diversity among sheep populations in user-specified genomic regions. BbGSD is essential for accelerating studies on the functional genomics and screening of molecular markers of molecular-assisted breeding in black-boned sheep. Database URL: http://202.203.179.115:3838/oarsnpdb.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774206/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143058345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}