Database: The Journal of Biological Databases and Curation: Latest Articles

JTIS: enhancing biomedical document-level relation extraction through joint training with intermediate steps.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-12-19 DOI: 10.1093/database/baae125
Jiru Li, Dinghao Pan, Zhihao Yang, Yuanyuan Sun, Hongfei Lin, Jian Wang
Abstract: Biomedical Relation Extraction (RE) is central to Biomedical Natural Language Processing and is crucial for various downstream applications. Existing RE challenges in the field of biology have primarily focused on intra-sentential analysis. However, with the rapid increase in the volume of literature and the complexity of relationships between biomedical entities, it often becomes necessary to consider multiple sentences to fully extract the relationship between a pair of entities. Current methods often fail to fully capture the complex semantic structures of information in documents, thereby affecting extraction accuracy. Therefore, unlike traditional RE methods that rely on sentence-level analysis and heuristic rules, our method focuses on extracting entity relationships from biomedical literature titles and abstracts and classifying relations that are novel findings. In our method, a multitask training approach is employed for fine-tuning a Pre-trained Language Model in the field of biology. Based on a broad spectrum of carefully designed tasks, our multitask method not only extracts relations of better quality due to more effective supervision but also achieves a more accurate classification of whether the entity pairs are novel findings. Moreover, by applying a model ensemble method, we further enhance our model's performance. The extensive experiments demonstrate that our method achieves significant performance improvements, i.e. surpassing the existing baseline by 3.94% in RE and 3.27% in Triplet Novel Typing in F1 score on BioRED, confirming its effectiveness in handling complex biomedical literature RE tasks.
Database URL: https://codalab.lisn.upsaclay.fr/competitions/13377#learn_the_details-dataset.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658465/pdf/
Citations: 0
scEccDNAdb: an integrated single-cell eccDNA resource for human and mouse.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-12-18 DOI: 10.1093/database/baae126
Wenqing Wang, Xinyu Zhao, Tianyu Ma, Tengwei Zhong, Junnuo Zheng, Zhiyun Guo
Abstract: Extrachromosomal circular DNA (eccDNA), an extrachromosomal circular structured DNA, is extensively found in eukaryotes. Investigating eccDNA at the single-cell level is crucial for understanding cellular heterogeneity, evolution, development, and specific cellular functions. However, high-throughput identification methods for single-cell eccDNA are complex, and the lack of mature, widely applicable technologies has resulted in limited resources. To address this gap, we built scEccDNAdb, a database based on single-cell whole-genome sequencing data. It contains 3 195 464 single-cell eccDNA entries from human and mouse samples, with annotations including oncogenes, typical enhancers, super-enhancers, CCCTC-binding factor-binding sites, single nucleotide polymorphisms, chromatin accessibility, expression quantitative trait loci, transcription factor binding sites, motifs, and structural variants. Additionally, it provides nine online analysis and visualization tools, which enable the creation of publication-quality figures through user-uploaded files. Overall, scEccDNAdb is a comprehensive database for analyzing single-cell eccDNA data across diverse cell types, tissues, and species.
Database URL: https://lcbb.swjtu.edu.cn/scEccDNAdb/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654243/pdf/
Citations: 0
AthRiboNC: an Arabidopsis database for ncRNAs with coding potential revealed from ribosome profiling.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-12-17 DOI: 10.1093/database/baae123
Yi Shen, Liya Liu, Enyan Liu, Sida Li, Yuriy Orlov, Vladimir Ivanisenko, Ming Chen
Abstract: Non-coding RNAs (ncRNAs) are traditionally considered incapable of encoding proteins, but new evidence suggests that small open reading frames (sORFs) within ncRNAs can actually encode biologically functional small peptides. Despite growing recognition of their importance, a systematic exploration of plant ncRNAs with coding potential has remained largely uncharted territory, especially in the context of their translational activities. By collecting and analyzing Ribo-Seq data from 226 Arabidopsis thaliana samples, we have developed AthRiboNC, a novel and dedicated database that consolidates extensive information on ncRNAs with coding potential in Arabidopsis. AthRiboNC covers detailed information on 2743 long non-coding RNAs, 255 microRNAs, and 1871 circular RNAs in Arabidopsis, along with 40 162 ORFs identified from these ncRNAs. The database also constructs co-expression networks for ncRNAs with coding potential, revealing correlations and potential biological function interpretations. With a commitment to accessibility and ease of use, AthRiboNC features a clear and intuitive interface. We hope that AthRiboNC will serve as a valuable resource for exploring the coding potential of plant ncRNAs.
Database URL: https://bis.zju.edu.cn/athribonc.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11651143/pdf/
Citations: 0
Probe my Pathway (PmP): a portal to explore the chemical coverage of the human Reactome.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-12-05 DOI: 10.1093/database/baae116
Haejin Angela Kwak, Lihua Liu, Matthieu Schapira
Abstract: Deciphering pathway-phenotype associations is critical for a system-wide understanding of cells and the chemistry of life. An approach to reach this goal is to systematically modulate pathways pharmacologically. The targeted and controlled regulation of an increasing number of proteins is becoming possible, thanks to the growing list of chemical probes and chemogenomic compounds available to cell biologists, but no resource is available that directly maps these chemical tools on cellular pathways. To fill this gap, we developed Probe my Pathway (PmP), a database where high-quality chemical probes and well-characterized sets of chemogenomic compounds are mapped on all the human pathways of the Reactome database. The web interface allows users to browse the data via icicle charts or search the data for compounds, proteins, or pathways. Chemists can rapidly find pathways with low chemical coverage or explore the structural chemistry of ligands targeting specific cellular machineries. Cell biologists can look for chemical probes targeting different proteins in the same pathway or find which pathways are targeted by chemical probes of interest. PmP is updated annually and will grow with the expanding chemical tool kit produced by Target 2035 and other efforts.
Database URL: https://apps.thesgc.org/pmp/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630241/pdf/
Citations: 0
Toward clearer recognition and easier usefulness: development of a cross-lingual atherosclerotic cerebrovascular disease ontology.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-12-05 DOI: 10.1093/database/baae117
Hetong Ma, Liu Shen, Jiayang Wang, Shilong Wang, Min Wang, Meng Wang, Zixiao Li, Jiao Li
Abstract: Atherosclerotic cerebrovascular disease can cause a great number of deaths and disabilities, yet it has not received enough attention. Little information, statistics, or data on the disease has been published, and no systematic concept datasets have been released to help clinicians clarify its scope, assist research, and offer maximized value. This study aimed to develop a cross-lingual atherosclerotic cerebrovascular disease ontology; describe the workflow, schema, hierarchical structure, and the highlighted content; design a brand-new rehabilitation ontology; implement the ontology evaluation; and illustrate applications in real-world scenarios. We implemented nine steps based on the Ontology Development 101 methodology combined with expert opinions: collection and specification of clinical requirements, background investigation and knowledge acquisition, ontology selection and reuse, scope identification, schema definition, concept extraction, concept extension, ontology verification, and ontology evaluation. We evaluated the proposed ontology in a literature classification task. The current ontology includes 10 top-level classes: clinical manifestation, comorbidity, complication, diagnosis, model of atherosclerotic cerebrovascular disease, pathogenesis, prevention, rehabilitation, risk factor, and treatment. There are 1715 concepts in the 11-level ontology, covering 4588 Chinese terms, 6617 English terms, and 972 definitions.
The ontology could be applied in real-world scenarios such as information retrieval, new expression discovery, named entity recognition, and knowledge fusion, and the use case showed that it could offer satisfying support to related medical scenarios. The ontology proved useful in text classification tasks, where the weighted F1 score could reach >80% when combined with the pretrained model. The proposed ontology provides a clear set of cross-lingual concepts and terms with an explicit hierarchical structure, helping scientific researchers to quickly retrieve relevant medical literature, assisting data scientists to efficiently identify relevant contents in electronic health records, and providing a clear domain framework for academic reference.
Database URL: https://bioportal.bioontology.org/ontologies/ACVD_ONTOLOGY.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630243/pdf/
Citations: 0
The text2term tool to map free-text descriptions of biomedical terms to ontologies.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-11-28 DOI: 10.1093/database/baae119
Rafael S Gonçalves, Jason Payne, Amelia Tan, Carmen Benitez, Jamie Haddock, Robert Gentleman
Abstract: There is an ongoing need for scalable tools to aid researchers in both retrospective and prospective standardization of discrete entity types (such as disease names, cell types, or chemicals) that are used in metadata associated with biomedical data. When metadata are not well-structured or precise, the associated data are harder to find and are often burdensome to reuse, analyze, or integrate with other datasets due to the upfront curation effort required to make the data usable, typically through retrospective standardization and cleaning of the (meta)data. With the goal of facilitating the task of standardizing metadata, either in bulk or in a one-by-one fashion (e.g. to support autocompletion of biomedical entities in forms), we have developed an open-source tool called text2term that maps free-text descriptions of biomedical entities to controlled terms in ontologies. The tool is highly configurable and can be used in multiple ways that cater to different users and expertise levels: it is available on Python Package Index and can be used programmatically as any Python package; it can also be used via a command-line interface or via our hosted, graphical user interface-based web application or by deploying a local instance of our interactive application using Docker.
Database URL: https://pypi.org/project/text2term.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11604108/pdf/
Citations: 0
Genome-wide identification of SSR markers from coding regions for endangered Argania spinosa L. skeels and construction of SSR database: AsSSRdb.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-11-27 DOI: 10.1093/database/baae118
Karim Rabeh, Najoua Mghazli, Fatima Gaboun, Abdelkarim Filali-Maltouf, Laila Sbabou, Bouchra Belkadi
Abstract: Microsatellites [simple sequence repeats (SSRs)] are one of the most widely used sources of genetic markers, particularly prevalent in plants. Despite their importance in various applications, a comprehensive genome-wide identification of coding sequence (CDS)-associated SSR markers in the Argania spinosa L. genome has yet to be conducted. In this study, 66 280 CDSs containing 5351 SSRs within 4535 A. spinosa L. CDSs were identified. Among these, tri-nucleotide motifs (58.96%) were the most common, followed by hexa-nucleotide (15.71%) and di-nucleotide motifs (13.32%). The predominant SSR motif in the tri-nucleotide category was AAG (24.4%), while AG (94.1%) was the most abundant among di-nucleotide repeats. Furthermore, the extracted CDSs containing SSRs were subjected to functional annotation; 3396 CDSs (74.88%) exhibited homology with known proteins, 3341 CDSs (73.7%) were assigned Gene Ontology terms, 1004 CDSs were annotated with Enzyme Commission numbers, and 832 (18.3%) were annotated with KEGG pathways. A total of 3475 primer pairs were designed, out of which 3264 were successfully validated in silico against the A. spinosa L. genome, with 99.6% representing high-resolution markers yielding no more than three products. Additionally, the SSR markers demonstrated a low rate of transferability through in-silico verification in two species within the Sapotaceae family. Furthermore, we developed an online database, the "Argania spinosa L. SSR database: https://as-fmmdb.shinyapps.io/asssrdb/" (AsSSRdb) to provide access to the CDS-associated SSRs identified in this study.
Overall, this research provides valuable marker resources for DNA fingerprinting, genetic studies, and molecular breeding in argan and related species.
Database URL: https://as-fmmdb.shinyapps.io/asssrdb/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11602033/pdf/
Citations: 0
The Genomic SSR Millets Database (GSMDB): enhancing genetic resources for sustainable agriculture.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-11-15 DOI: 10.1093/database/baae114
Sonu Kumar, Sangeeta Singh, Rakesh Kumar, Dinesh Gupta
Abstract: The global population surge demands increased food production and nutrient-rich options to combat rising food insecurity. Climate-resilient crops are vital, with millets emerging as superfoods due to nutritional richness and stress tolerance. Given limited genomic information, a comprehensive genetic resource is crucial to advance millet research. Whole-genome sequencing provides an unprecedented opportunity, and molecular genetic methodologies, particularly simple sequence repeats (SSRs), play a pivotal role in DNA fingerprinting, constructing linkage maps, and conducting population genetic studies. SSRs are composed of repetitive DNA sequences where one to six nucleotides are repeated in tandem and distributed throughout the genome. Different millet species exhibit genomic variations attributed to the presence of SSRs. While SSRs have been identified in a few millet species, the existing information only covers some of the sequenced genomes. Moreover, there is an absence of complete gene annotation and visualization features for SSRs. Addressing this disparity and leveraging the de-novo millet genome assembly available from the NCBI, we have developed the Genomic SSR Millets Database (GSMDB; https://bioinfo.icgeb.res.in/gsmdb/). This open-access repository provides a web-based tool offering search functionalities and comprehensive details on 6.747645 million SSRs mined from the genomic sequences of seven millet species. The database, featuring unrestricted public access and JBrowse visualization, is a pioneering resource for the research community dedicated to advancing millet cultivars and related species.
GSMDB holds immense potential to support myriad studies, including genetic diversity assessments, genetic mapping, marker-assisted selection, and comparative population investigations aiming to facilitate the millet breeding programs geared toward ensuring global food security.
Database URL: https://bioinfo.icgeb.res.in/gsmdb/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566590/pdf/
Citations: 0
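The GSMDB abstract describes SSRs as motifs of one to six nucleotides repeated in tandem. As a rough illustration of that definition only, the sketch below scans a DNA string for such repeats with a regular expression. The function name `find_ssrs` and the `min_repeats` threshold are assumptions made for this example; they are not taken from GSMDB or AsSSRdb, whose actual mining tools and cut-offs are not stated here.

```python
import re


def find_ssrs(sequence, min_repeats=4):
    """Return (start, motif, repeat_count) for tandem repeats of 1-6 nt motifs.

    min_repeats is an illustrative threshold, not one used by any database
    listed on this page.
    """
    # Lazy {1,6}? prefers the shortest motif, so AGAGAGAG is reported as
    # AG x4 rather than AGAG x2. The backreference \1 requires exact copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


if __name__ == "__main__":
    # Toy sequence containing one di-nucleotide repeat (AG x5).
    print(find_ssrs("TTGACAGAGAGAGAGCCT"))
```

Real SSR miners typically apply a different minimum repeat count per motif length and handle ambiguous bases; this sketch uses a single threshold to keep the idea visible.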
Peptipedia v2.0: a peptide sequence database and user-friendly web platform. A major update.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-11-08 DOI: 10.1093/database/baae113
Gabriel Cabas-Mora, Anamaría Daza, Nicole Soto-García, Valentina Garrido, Diego Alvarez, Marcelo Navarrete, Lindybeth Sarmiento-Varón, Julieta H Sepúlveda Yañez, Mehdi D Davari, Frederic Cadet, Álvaro Olivera-Nappa, Roberto Uribe-Paredes, David Medina-Ortiz
Abstract: In recent years, peptides have gained significant relevance due to their therapeutic properties. The surge in peptide production and synthesis has generated vast amounts of data, enabling the creation of comprehensive databases and information repositories. Advances in sequencing techniques and artificial intelligence have further accelerated the design of tailor-made peptides. However, leveraging these techniques requires versatile and continuously updated storage systems, along with tools that facilitate peptide research and the implementation of machine learning for predictive systems. This work introduces Peptipedia v2.0, one of the most comprehensive public repositories of peptides, supporting biotechnological research by simplifying peptide study and annotation. Peptipedia v2.0 has expanded its collection by over 45% with peptide sequences that have reported biological activities. The functional biological activity tree has been revised and enhanced, incorporating new categories such as cosmetic and dermatological activities, molecular binding, and antiageing properties. Utilizing protein language models and machine learning, more than 90 binary classification models have been trained, validated, and incorporated into Peptipedia v2.0. These models exhibit average sensitivities and specificities of 0.877±0.0530 and 0.873±0.054, respectively, facilitating the annotation of more than 3.6 million peptide sequences with unknown biological activities, also registered in Peptipedia v2.0.
Additionally, Peptipedia v2.0 introduces description tools based on structural and ontological properties and user-friendly machine learning tools to facilitate the application of machine learning strategies to study peptide sequences.
Database URL: https://peptipedia.cl/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734279/pdf/
Citations: 0
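The Peptipedia v2.0 abstract reports its binary classifiers in terms of sensitivity and specificity. For readers less familiar with those two metrics, here is a minimal sketch of how they are computed from true and predicted binary labels. The function name and the toy labels are hypothetical and are not data from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: fraction of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(sensitivity_specificity(truth, predicted))
```

Averaging these two values over many per-activity models, as the abstract describes, would then give a mean and spread for each metric.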
athisomiRDB: A comprehensive database of Arabidopsis isomiRs.
IF 3.4, Zone 4, Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2024-11-08 DOI: 10.1093/database/baae115
A T Vivek, Ajay Arya, Supriya P Swain, Shailesh Kumar
Abstract: Several pieces of evidence challenge the traditional view of miRNAs as static molecules, revealing dynamic isomiRs originating from each miRNA precursor arm. In plants, isomiRs, which result from imprecise cleavage during pre-miRNA processing and post-transcriptional alterations, serve as crucial regulators of target microRNAs (miRNAs). Despite numerous studies on Arabidopsis miRNAs, the systematic identification and annotation of isomiRs across various tissues and conditions remain limited. Due to the lack of systematically collected isomiR information, we introduce the athisomiRDB database, which houses 20 764 isomiRs from Arabidopsis small RNA-sequencing (sRNA-seq) libraries. It comprises >2700 diverse samples and allows exploration at the sample, miRNA, or isomiR levels, offering insights into the presence or absence of isomiRs. The athisomiRDB includes exclusive and ambiguous isomiRs, each with features such as transcriptional origin, variant-containing isomiRs, and identifiers for frequent single-nucleotide polymorphisms from the 1001 Genomes Project. It also provides 3' nontemplated post-transcriptional additions, isomiR-target interactions, and trait associations for each isomiR. We anticipate that athisomiRDB will play a pivotal role in unraveling the regulatory nature of the Arabidopsis miRNAome and enhancing sRNA research by leveraging isomiR profiles from extensive sRNA-seq datasets.
Database URL: https://www.nipgr.ac.in/athisomiRDB.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11544919/pdf/
Citations: 0