{"title":"mirTarCLASH: a comprehensive miRNA target database based on chimeric read-based experiments.","authors":"Tzu-Hsien Yang, Xiang-Wei Li, Yuan-Han Lee, Shang-Yi Lu, Wei-Sheng Wu, Heng-Chi Lee","doi":"10.1093/database/baaf023","DOIUrl":"https://doi.org/10.1093/database/baaf023","url":null,"abstract":"<p><p>MicroRNAs (miRNAs) can target messenger RNAs to control their degradation or translation repression effects. Therefore, identifying the target and binding sites of different miRNAs is essential for understanding miRNA functions. To investigate these interactions, researchers have employed the cross-linking, ligation, and sequencing of hybrids (CLASH-seq) and similar CLASH-like approaches to generate chimeric reads formed by miRNAs and their targeting segments. These chimeric reads allow for the direct extraction of both the miRNA-target gene pairs and their corresponding binding sites. Nevertheless, these studies lack user-friendly platforms for researchers to investigate these interactions efficiently, thus hindering scientists' ability to explore miRNA functions. To address this gap, we developed mirTarCLASH, a comprehensive database that deposits 502 061/322 707/224 452 unique hybrid reads from human/mouse/worm miRNA chimeric read-based experiments. In mirTarCLASH, the chimera analysis algorithm ChiRA and two distinct binding site inference tools, RNAup and miRanda, were adopted to facilitate the exploration of miRNA-target pairs derived from CLASH-like experiments. Compared with existing similar repositories, mirTarCLASH further enables several confidence evaluation filters with visualization functions for the extracted results. The results can be further refined based on the key properties of the miRNA targeting sites, including read depths, numbers of supporting algorithms, and cross-linking-induced mutations, to enhance confidence levels. In addition, these miRNA-binding sites are visually represented through an integrated transcript atlas. 
Finally, we demonstrated the biological applicability of mirTarCLASH via the well-characterized example interaction between cel-let-7-5p and lin-41 in Caenorhabditis elegans, showcasing the potential of mirTarCLASH to provide novel insights for subsequent experimental research designs. The constructed mirTarCLASH database is freely available at https://cosbi.ee.ncku.edu.tw/MirTarClash. Database URL: https://cosbi.ee.ncku.edu.tw/MirTarClash.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-composing ontology terms for efficient phenotyping in plant breeding.","authors":"Naama Menda, Bryan J Ellerbrock, Christiano C Simoes, Srikanth Kumar Karaikal, Christine Nyaga, Mirella Flores-Gonzalez, Isaak Y Tecle, David Lyon, Afolabi Agbona, Paterne A Agre, Prasad Peteti, Violet Akech, Amos Asiimwe, Eglantine Fauvelle, Karima Meghar, Thierry Tran, Dominique Dufour, Laurel Cooper, Marie-Angélique Laporte, Elizabeth Arnaud, Lukas A Mueller","doi":"10.1093/database/baaf020","DOIUrl":"https://doi.org/10.1093/database/baaf020","url":null,"abstract":"<p><p>Ontologies are widely used in databases to standardize data, improving data quality, integration, and ease of comparison. Within ontologies tailored to diverse use cases, post-composing user-defined terms reconciles the demands for standardization on the one hand and flexibility on the other. In many instances of Breedbase, a digital ecosystem for plant breeding designed for genomic selection, the goal is to capture phenotypic data using highly curated and rigorous crop ontologies, while adapting to the specific requirements of plant breeders to record data quickly and efficiently. For example, post-composing enables users to tailor ontology terms to suit specific and granular use cases such as repeated measurements on different plant parts and special sample preparation techniques. To achieve this, we have implemented a post-composing tool based on orthogonal ontologies providing users with the ability to introduce additional levels of phenotyping granularity tailored to unique experimental designs. Post-composed terms are designed to be reused by all breeding programs within a Breedbase instance but are not exported to the crop reference ontologies. 
Breedbase users can post-compose terms across various categories, such as plant anatomy, treatments, temporal events, and breeding cycles, and, as a result, generate highly specific terms for more accurate phenotyping.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-composing ontology terms for efficient phenotyping in plant breeding.","authors":"Naama Menda, Bryan J Ellerbrock, Christiano C Simoes, Srikanth Kumar Karaikal, Christine Nyaga, Mirella Flores-Gonzalez, Isaak Y Tecle, David Lyon, Afolabi Agbona, Paterne A Agre, Prasad Peteti, Violet Akech, Amos Asiimwe, Eglantine Fauvelle, Karima Meghar, Thierry Tran, Dominique Dufour, Laurel Cooper, Marie-Angélique Laporte, Elizabeth Arnaud, Lukas A Mueller","doi":"10.1093/database/baaf020","DOIUrl":"10.1093/database/baaf020","url":null,"abstract":"<p><p>Ontologies are widely used in databases to standardize data, improving data quality, integration, and ease of comparison. Within ontologies tailored to diverse use cases, post-composing user-defined terms reconciles the demands for standardization on the one hand and flexibility on the other. In many instances of Breedbase, a digital ecosystem for plant breeding designed for genomic selection, the goal is to capture phenotypic data using highly curated and rigorous crop ontologies, while adapting to the specific requirements of plant breeders to record data quickly and efficiently. For example, post-composing enables users to tailor ontology terms to suit specific and granular use cases such as repeated measurements on different plant parts and special sample preparation techniques. To achieve this, we have implemented a post-composing tool based on orthogonal ontologies providing users with the ability to introduce additional levels of phenotyping granularity tailored to unique experimental designs. Post-composed terms are designed to be reused by all breeding programs within a Breedbase instance but are not exported to the crop reference ontologies. 
Breedbase users can post-compose terms across various categories, such as plant anatomy, treatments, temporal events, and breeding cycles, and, as a result, generate highly specific terms for more accurate phenotyping.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11927528/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143673564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive experimental comparison between federated and centralized learning.","authors":"Swier Garst, Julian Dekker, Marcel Reinders","doi":"10.1093/database/baaf016","DOIUrl":"10.1093/database/baaf016","url":null,"abstract":"<p><p>Federated learning is an upcoming machine learning paradigm which allows data from multiple sources to be used for training of classifiers without the data leaving the source it originally resides. This can be highly valuable for use cases such as medical research, where gathering data at a central location can be quite complicated due to privacy and legal concerns of the data. In such cases, federated learning has the potential to vastly speed up the research cycle. Although federated and central learning have been compared from a theoretical perspective, an extensive experimental comparison of performances and learning behavior still lacks. We have performed a comprehensive experimental comparison between federated and centralized learning. We evaluated various classifiers on various datasets exploring influences of different sample distributions as well as different class distributions across the clients. The results show similar performances under a wide variety of settings between the federated and central learning strategies. Federated learning is able to deal with various imbalances in the data distributions. It is sensitive to batch effects between different datasets when they coincide with location, similar to central learning, but this setting might go unobserved more easily. Federated learning seems to be robust to various challenges such as skewed data distributions, high data dimensionality, multiclass problems, and complex models. Taken together, the insights from our comparison gives much promise for applying federated learning as an alternative to sharing data. 
Code for reproducing the results in this work can be found at: https://github.com/swiergarst/FLComparison.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928227/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143673562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive experimental comparison between federated and centralized learning.","authors":"Swier Garst, Julian Dekker, Marcel Reinders","doi":"10.1093/database/baaf016","DOIUrl":"https://doi.org/10.1093/database/baaf016","url":null,"abstract":"<p><p>Federated learning is an upcoming machine learning paradigm which allows data from multiple sources to be used for training of classifiers without the data leaving the source it originally resides. This can be highly valuable for use cases such as medical research, where gathering data at a central location can be quite complicated due to privacy and legal concerns of the data. In such cases, federated learning has the potential to vastly speed up the research cycle. Although federated and central learning have been compared from a theoretical perspective, an extensive experimental comparison of performances and learning behavior still lacks. We have performed a comprehensive experimental comparison between federated and centralized learning. We evaluated various classifiers on various datasets exploring influences of different sample distributions as well as different class distributions across the clients. The results show similar performances under a wide variety of settings between the federated and central learning strategies. Federated learning is able to deal with various imbalances in the data distributions. It is sensitive to batch effects between different datasets when they coincide with location, similar to central learning, but this setting might go unobserved more easily. Federated learning seems to be robust to various challenges such as skewed data distributions, high data dimensionality, multiclass problems, and complex models. Taken together, the insights from our comparison gives much promise for applying federated learning as an alternative to sharing data. 
Code for reproducing the results in this work can be found at: https://github.com/swiergarst/FLComparison.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VarGuideAtlas: a repository of variant interpretation guidelines.","authors":"Mireia Costa, Alberto García S, Oscar Pastor","doi":"10.1093/database/baaf017","DOIUrl":"10.1093/database/baaf017","url":null,"abstract":"<p><p>Variant interpretation guidelines guide the process of determining the role of DNA variants in patients' health. Currently, hundreds of guidelines exist, each applicable to a particular clinical domain. However, they are scattered across multiple resources and scientific literature. To address this issue, we present VarGuideAtlas, a comprehensive repository of variant interpretation guidelines that compiles information from ClinGen, ClinVar, and PubMed. Our repository offers a user-friendly web interface with advanced search capabilities, enabling clinicians and researchers to efficiently find relevant guidelines tailored to specific genes, diseases, or variant types. We employ ontologies to characterize each guideline, ensuring consistency and improving interoperability with bioinformatics tools. VarGuideAtlas represents a significant advance toward standardizing variant interpretation practices, facilitating more informed decision-making, improved clinical outcomes, and more precise genomic research. 
VarGuideAtlas is publicly accessible via a web-based platform (https://genomics-hub.pros.dsic.upv.es:3016/).</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11895764/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143604068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VarGuideAtlas: a repository of variant interpretation guidelines.","authors":"Mireia Costa, Alberto García S, Oscar Pastor","doi":"10.1093/database/baaf017","DOIUrl":"https://doi.org/10.1093/database/baaf017","url":null,"abstract":"<p><p>Variant interpretation guidelines guide the process of determining the role of DNA variants in patients' health. Currently, hundreds of guidelines exist, each applicable to a particular clinical domain. However, they are scattered across multiple resources and scientific literature. To address this issue, we present VarGuideAtlas, a comprehensive repository of variant interpretation guidelines that compiles information from ClinGen, ClinVar, and PubMed. Our repository offers a user-friendly web interface with advanced search capabilities, enabling clinicians and researchers to efficiently find relevant guidelines tailored to specific genes, diseases, or variant types. We employ ontologies to characterize each guideline, ensuring consistency and improving interoperability with bioinformatics tools. VarGuideAtlas represents a significant advance toward standardizing variant interpretation practices, facilitating more informed decision-making, improved clinical outcomes, and more precise genomic research. VarGuideAtlas is publicly accessible via a web-based platform (https://genomics-hub.pros.dsic.upv.es:3016/).</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pipeline to explore information on genome editing using large language models and genome editing meta-database.","authors":"Takayuki Suzuki, Hidemasa Bono","doi":"10.1093/database/baaf022","DOIUrl":"https://doi.org/10.1093/database/baaf022","url":null,"abstract":"<p><p>Genome editing (GE) is widely recognized as an effective and valuable technology in life sciences research. However, certain genes are difficult to edit depending on some factors such as the type of species, sequences, and GE tools. Therefore, confirming the presence or absence of GE practices in previous publications is crucial for the effective designing and establishment of research using GE. Although the Genome Editing Meta-database (GEM: https://bonohu.hiroshima-u.ac.jp/gem/) aims to provide as comprehensive GE information as possible, it does not indicate how each registered gene is involved in GE. In this study, we developed a systematic method for extracting essential GE information using large language models from the information based on GEM and GE-related articles. This approach allows for a systematic and efficient investigation of GE information that cannot be achieved using the current GEM alone. In addition, by converting the extracted GE information into metrics, we propose a potential application of this method to prioritize genes for future research. The extracted GE information and novel GE-related scores are expected to facilitate the efficient selection of target genes for GE and support the design of research using GE. 
Database URLs: https://github.com/szktkyk/extract_geinfo, https://github.com/szktkyk/visualize_geinfo.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pipeline to explore information on genome editing using large language models and genome editing meta-database.","authors":"Takayuki Suzuki, Hidemasa Bono","doi":"10.1093/database/baaf022","DOIUrl":"10.1093/database/baaf022","url":null,"abstract":"<p><p>Genome editing (GE) is widely recognized as an effective and valuable technology in life sciences research. However, certain genes are difficult to edit depending on some factors such as the type of species, sequences, and GE tools. Therefore, confirming the presence or absence of GE practices in previous publications is crucial for the effective designing and establishment of research using GE. Although the Genome Editing Meta-database (GEM: https://bonohu.hiroshima-u.ac.jp/gem/) aims to provide as comprehensive GE information as possible, it does not indicate how each registered gene is involved in GE. In this study, we developed a systematic method for extracting essential GE information using large language models from the information based on GEM and GE-related articles. This approach allows for a systematic and efficient investigation of GE information that cannot be achieved using the current GEM alone. In addition, by converting the extracted GE information into metrics, we propose a potential application of this method to prioritize genes for future research. The extracted GE information and novel GE-related scores are expected to facilitate the efficient selection of target genes for GE and support the design of research using GE. 
Database URLs: https://github.com/szktkyk/extract_geinfo, https://github.com/szktkyk/visualize_geinfo.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890094/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143582230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"gymnotoa-db: a database and application to optimize functional annotation in gymnosperms.","authors":"Fernando Mora-Márquez, Mikel Hurtado, Unai López de Heredia","doi":"10.1093/database/baaf019","DOIUrl":"10.1093/database/baaf019","url":null,"abstract":"<p><p>Gymnosperms are a clade of non-flowering plants that include about 1000 living species. Due to their complex genomes and lack of genomic resources, functional annotation in genomics and transcriptomics on gymnosperms suffers from limitations. Here we present gymnotoa-db, which is a novel, publicly accessible relational database designed to facilitate functional annotation in gymnosperms. This database stores non-redundant records of gymnosperm proteins, encompassing taxonomic and functional information. The complementary software, gymnotoa-app, enables users to download gymnotoa-db and execute a comprehensive functional annotation pipeline for high-throughput sequencing-derived DNA or cDNA sequences. gymnotoa-app's user-friendly interface and efficient algorithms streamline the functional annotation process, making it an invaluable tool for researchers studying gymnosperms. We compared gymnotoa-app's performance against other annotation tools utilizing disparate reference databases. Our results demonstrate gymnotoa-app's superior ability to accurately annotate gymnosperm transcripts, recovering a greater number of transcripts and unique, non-redundant Gene Ontology terms. gymnotoa-db's distinctive features include comprehensive coverage with a non-redundant dataset of gymnosperm protein sequences, robust functional information that integrates data from multiple ontology systems, including GO, KEGG, EC, and MetaCYC, while keeping the taxonomic context, including Arabidopsis homologs. 
Database URL: https://blogs.upm.es/gymnotoa-db/2024/09/19/gymnotoa-app/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11886576/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143572429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}