Database: The Journal of Biological Databases and Curation最新文献

筛选
英文 中文
A database on the historical and current occurrences of snakes in Eswatini. 关于史瓦蒂尼蛇的历史和当前事件的数据库。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-07-08 DOI: 10.1093/database/baaf040
Ara Monadjem, Richard C Boycott, Thea Litscha-Koen, Adam Kane, Wisdom M Dlamini, Lindelwa Mmema, Katharine L Strutton, Zakhele Hlophe, Sara Padidar
{"title":"A database on the historical and current occurrences of snakes in Eswatini.","authors":"Ara Monadjem, Richard C Boycott, Thea Litscha-Koen, Adam Kane, Wisdom M Dlamini, Lindelwa Mmema, Katharine L Strutton, Zakhele Hlophe, Sara Padidar","doi":"10.1093/database/baaf040","DOIUrl":"https://doi.org/10.1093/database/baaf040","url":null,"abstract":"<p><p>Snakes are among the most difficult terrestrial vertebrates to survey, resulting in poor distributional information on most species. This database comprises of 3812 records of 58 species of snakes in 37 genera reported from within the boundaries of Eswatini. The data were compiled from multiple sources including museum specimens, iNaturalist records, literature records, and snake rescue operations. For each specimen reported in the database, we provide the scientific name, latitude and longitude coordinates, and location. Most records also have an associated date. This comprehensive database will be useful to biodiversity experts, conservationists, medical practitioners, researchers, and snake enthusiasts, especially for mapping and modelling snake distributions in the country. To allow easy viewing of the distribution of snakes in the country, we provide an online visualization tool, which should allow a greater number of non-scientists to utilize this database.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":" ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Correction to: GymnoTOA-db: a database and application to optimize functional annotation in gymnosperms. 更正:GymnoTOA-db:一个优化裸子植物功能注释的数据库和应用程序。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-07-08 DOI: 10.1093/database/baaf041
{"title":"Correction to: GymnoTOA-db: a database and application to optimize functional annotation in gymnosperms.","authors":"","doi":"10.1093/database/baaf041","DOIUrl":"https://doi.org/10.1093/database/baaf041","url":null,"abstract":"","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PLoV: a comprehensive database of genetic variants leading to pregnancy loss. PLoV:导致流产的基因变异的综合数据库。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-07-08 DOI: 10.1093/database/baaf037
Evgeniia M Maksiutenko, Igor V Bezdvornykh, Yury A Barbitoff, Yulia A Nasykhova, Andrey S Glotov
{"title":"PLoV: a comprehensive database of genetic variants leading to pregnancy loss.","authors":"Evgeniia M Maksiutenko, Igor V Bezdvornykh, Yury A Barbitoff, Yulia A Nasykhova, Andrey S Glotov","doi":"10.1093/database/baaf037","DOIUrl":"https://doi.org/10.1093/database/baaf037","url":null,"abstract":"<p><p>Pregnancy loss is an important reproductive health problem that affects many couples. Genetic factors play an important role in both spontaneous miscarriage and recurrent pregnancy loss, and the effect of genomic variants is recognized as one of the major causes of pregnancy loss in euploid foetuses. In this work, we extend our previous analysis of the genetic landscape of pregnancy loss and develop a Pregnancy Loss genetic Variant (PLoV) database to aggregate information about mutations that have been implicated in pregnancy loss. The database contains information about 534 genetic variants that have been observed in 421 cases across 47 studies, including foetus-only, parent-only, and trio-based studies. For each case, the database includes a detailed description of the phenotype, including ultrasound data (if provided in the original article). The genetic variants are scattered across all chromosomes in the human genome and affect a total of 292 unique genes. We provide a public access to the PLoV database at https://plovdb.ott.ru/. Database URL: https://plovdb.ott.ru/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":" ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
BgDB: a comprehensive genomic resource information system of bitter gourd for accelerated breeding programme. BgDB:用于加快苦瓜育种计划的综合基因组资源信息系统。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-07-08 DOI: 10.1093/database/baaf039
Princy Saini, Ankita Singh, Tilak Chandra, Dheeraj Kumar Chaurasia, Kunal Chaudhary, Priyanka Jain, G Boopalakrishnan, Sarika Jaiswal, Shyam Sunder Dey, Tusar Kanti Behera, Ulavappa Basavanneppa Angadi, Mir Asif Iquebal, Dinesh Kumar
{"title":"BgDB: a comprehensive genomic resource information system of bitter gourd for accelerated breeding programme.","authors":"Princy Saini, Ankita Singh, Tilak Chandra, Dheeraj Kumar Chaurasia, Kunal Chaudhary, Priyanka Jain, G Boopalakrishnan, Sarika Jaiswal, Shyam Sunder Dey, Tusar Kanti Behera, Ulavappa Basavanneppa Angadi, Mir Asif Iquebal, Dinesh Kumar","doi":"10.1093/database/baaf039","DOIUrl":"https://doi.org/10.1093/database/baaf039","url":null,"abstract":"<p><p>Bitter gourd, scientifically known as Momordica charantia L. with 2n = 22, is a widely recognized medicinal vegetable, renowned for its multifaceted health benefits, primarily acclaimed for its lipid- and glucose-lowering effects. Its growing demands as a food source and for industrial applications necessitate value addition in ongoing breeding initiatives to enhance genotypic traits in multifarious ways. A thorough understanding of the underlying molecular footprint is warranted for characterization, which still remains underexplored relative to other cash crops. Though a chromosome-level genome assembly of bitter gourd is available, scattered and fragmented information becomes an obstacle for assisted breeding and gene editing. Therefore, it is crucial to further dissect structural and molecular variants, noncoding RNAs (ncRNAs), transcription factors, and transcripts from whole-genome and resequencing projects. The present study leads to the development of a comprehensive genomic resource, BgDB (Bitter Gourd Resource Database) at a single platform, vital for advanced bitter gourd breeding programmes for raising bitter gourd varieties with traits of significant social and economic value. BgDB, available at https://bgdb.daasbioinfromaticsteam.in/index.php, is a user-friendly, three-tier database that offers a comprehensive interface with detailed analysed information, including 114 598 transcripts, 4914 differentially expressed genes, 32 570 predicted simple sequence repeat markers, and 162 850 primers for downstream applications. It also catalogues extensive annotations of bitter gourd-specific single nucleotide polymorphisms/insertions and deletions, long noncoding RNAs, circular RNAs, microRNAs, 1220 transcription factors, 295 transcription regulators, and 146 quantitative trait loci (QTL) distributed throughout the chromosomes. This genomic resource is poised to significantly advance genetic diversity analyses, population and varietal differentiation, and trait optimization. It further facilitates the exploration of regulatory ncRNA elements, key transcripts, and essential transcription factors and regulators. The discovery of QTL will aid in the development of improved bitter gourd varieties in the endeavour of enhanced productivity. Beyond comprehensive datasets, the future integration of multi-omics resources could profoundly advance and fully unlock the potential of databases. Database URL: https://bgdb.daasbioinfromaticsteam.in/index.php.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":" ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CAS: enhancing implicit constrained data augmentation with semantic enrichment for biomedical relation extraction and beyond. CAS:增强隐式约束数据增强与语义丰富的生物医学关系提取及其他。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-07-03 DOI: 10.1093/database/baaf025
Fang-Yi Su, Gia-Han Ngo, Ben Phan, Jung-Hsien Chiang
{"title":"CAS: enhancing implicit constrained data augmentation with semantic enrichment for biomedical relation extraction and beyond.","authors":"Fang-Yi Su, Gia-Han Ngo, Ben Phan, Jung-Hsien Chiang","doi":"10.1093/database/baaf025","DOIUrl":"10.1093/database/baaf025","url":null,"abstract":"<p><p>Biomedical relation extraction often involves datasets with implicit constraints, where structural, syntactic, or semantic rules must be strictly preserved to maintain data integrity. Traditional data augmentation techniques struggle in these scenarios, as they risk violating domain-specific constraints. To address these challenges, we propose CAS (Constrained Augmentation and Semantic-Quality), a novel framework designed for constrained datasets. CAS employs large language models to generate diverse data variations while adhering to predefined rules, and it integrates the SemQ Filter. This self-evaluation mechanism ensures the quality and consistency of augmented data by filtering out noisy or semantically incongruent samples. Although CAS is primarily designed for biomedical relation extraction, its versatile design extends its applicability to tasks with implicit constraints, such as code completion, mathematical reasoning, and information retrieval. Through extensive experiments across multiple domains, CAS demonstrates its ability to enhance model performance by maintaining structural fidelity and semantic accuracy in augmented data. These results highlight the potential of CAS not only in advancing biomedical NLP research but also in addressing data augmentation challenges in diverse constrained-task settings within natural language processing. Database URL: https://github.com/ngogiahan149/CAS.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12224179/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144552558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models. 蛋白质序列分析领域:任务类型、数据库、数据集、词嵌入方法和语言模型的系统回顾。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-05-30 DOI: 10.1093/database/baaf027
Muhammad Nabeel Asim, Tayyaba Asif, Faiza Hassan, Andreas Dengel
{"title":"Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.","authors":"Muhammad Nabeel Asim, Tayyaba Asif, Faiza Hassan, Andreas Dengel","doi":"10.1093/database/baaf027","DOIUrl":"10.1093/database/baaf027","url":null,"abstract":"<p><p>Protein sequence analysis examines the order of amino acids within protein sequences to unlock diverse types of a wealth of knowledge about biological processes and genetic disorders. It helps in forecasting disease susceptibility by finding unique protein signatures, or biomarkers that are linked to particular disease states. Protein Sequence analysis through wet-lab experiments is expensive, time-consuming and error prone. To facilitate large-scale proteomics sequence analysis, the biological community is striving for utilizing AI competence for transitioning from wet-lab to computer aided applications. However, Proteomics and AI are two distinct fields and development of AI-driven protein sequence analysis applications requires knowledge of both domains. To bridge the gap between both fields, various review articles have been written. However, these articles focus revolves around few individual tasks or specific applications rather than providing a comprehensive overview about wide tasks and applications. Following the need of a comprehensive literature that presents a holistic view of wide array of tasks and applications, contributions of this manuscript are manifold: It bridges the gap between Proteomics and AI fields by presenting a comprehensive array of AI-driven applications for 63 distinct protein sequence analysis tasks. It equips AI researchers by facilitating biological foundations of 63 protein sequence analysis tasks. It enhances development of AI-driven protein sequence analysis applications by providing comprehensive details of 68 protein databases. It presents a rich data landscape, encompassing 627 benchmark datasets of 63 diverse protein sequence analysis tasks. It highlights the utilization of 25 unique word embedding methods and 13 language models in AI-driven protein sequence analysis applications. It accelerates the development of AI-driven applications by facilitating current state-of-the-art performances across 63 protein sequence analysis tasks.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144191613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach. 通过以数据为中心和预处理鲁棒集成学习方法增强生物医学关系提取。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-05-22 DOI: 10.1093/database/baae127
Wilailack Meesawad, Jen-Chieh Han, Chun-Yu Hsueh, Yu Zhang, Hsi-Chuan Hung, Richard Tzong-Han Tsai
{"title":"Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach.","authors":"Wilailack Meesawad, Jen-Chieh Han, Chun-Yu Hsueh, Yu Zhang, Hsi-Chuan Hung, Richard Tzong-Han Tsai","doi":"10.1093/database/baae127","DOIUrl":"10.1093/database/baae127","url":null,"abstract":"<p><p>The paper describes our biomedical relation extraction system, which is designed to participate in the BioCreative VIII challenge Track 1: BioRED Track, which emphasizes the relation extraction from biomedical literature. Our system employs an ensemble learning method, leveraging the PubTator API in conjunction with multiple pretrained bidirectional encoder representations from transformer (BERT) models. Various preprocessing inputs are incorporated, encompassing prompt questions, entity ID pairs, and co-occurrence contexts. To enhance model comprehension, special tokens and boundary tags are incorporated. Specifically, we utilize PubMedBERT alongside the Max Rule ensemble learning mechanism to amalgamate outputs from diverse classifiers. Our findings surpass the established benchmark score, thereby providing a robust benchmark for evaluating performance in this task. Moreover, our study introduces and demonstrates the effectiveness of a data-centric approach, emphasizing the significance of prioritizing high-quality data instances in enhancing model performance and robustness.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12097206/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An exploratory study combining Virtual Reality and Semantic Web for life science research using Graph2VR. 基于Graph2VR的虚拟现实与语义网在生命科学研究中的结合探索性研究。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-05-20 DOI: 10.1093/database/baaf008
Alexander J Kellmann, Sander van den Hoek, Max Postema, W T Kars Maassen, Brenda S Hijmans, Marije A van der Geest, K Joeri van der Velde, Esther J van Enckevort, Morris A Swertz
{"title":"An exploratory study combining Virtual Reality and Semantic Web for life science research using Graph2VR.","authors":"Alexander J Kellmann, Sander van den Hoek, Max Postema, W T Kars Maassen, Brenda S Hijmans, Marije A van der Geest, K Joeri van der Velde, Esther J van Enckevort, Morris A Swertz","doi":"10.1093/database/baaf008","DOIUrl":"https://doi.org/10.1093/database/baaf008","url":null,"abstract":"<p><p>We previously described Graph2VR, a prototype that enables researchers to use virtual reality (VR) to explore and navigate through Linked Data graphs using SPARQL queries (see https://doi.org/10.1093/database/baae008). Here we evaluate the use of Graph2VR in three realistic life science use cases. The first use case visualizes metadata from large-scale multi-center cohort studies across Europe and Canada via the EUCAN Connect catalogue. The second use case involves a set of genomic data from synthetic rare disease patients, which was processed through the Variant Interpretation Pipeline and then converted into Resource Description Format for visualization. The third use case involves enriching a graph with additional information, in this case, the Dutch Anatomical Therapeutic Chemical code Ontology with the DrugID from Drugbank. These examples collectively showcase Graph2VR's potential for data exploration and enrichment, as well as some of its limitations. We conclude that the endless three-dimensional space provided by VR indeed shows much potential for the navigation of very large knowledge graphs, and we provide recommendations for data preparation and VR tooling moving forward. Database URL: https://doi.org/10.1093/database/baaf008.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An exploratory study combining Virtual Reality and Semantic Web for life science research using Graph2VR. 基于Graph2VR的虚拟现实与语义网在生命科学研究中的结合探索性研究。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-05-20 DOI: 10.1093/database/baaf008
Alexander J Kellmann, Sander van den Hoek, Max Postema, W T Kars Maassen, Brenda S Hijmans, Marije A van der Geest, K Joeri van der Velde, Esther J van Enckevort, Morris A Swertz
{"title":"An exploratory study combining Virtual Reality and Semantic Web for life science research using Graph2VR.","authors":"Alexander J Kellmann, Sander van den Hoek, Max Postema, W T Kars Maassen, Brenda S Hijmans, Marije A van der Geest, K Joeri van der Velde, Esther J van Enckevort, Morris A Swertz","doi":"10.1093/database/baaf008","DOIUrl":"10.1093/database/baaf008","url":null,"abstract":"<p><p>We previously described Graph2VR, a prototype that enables researchers to use virtual reality (VR) to explore and navigate through Linked Data graphs using SPARQL queries (see https://doi.org/10.1093/database/baae008). Here we evaluate the use of Graph2VR in three realistic life science use cases. The first use case visualizes metadata from large-scale multi-center cohort studies across Europe and Canada via the EUCAN Connect catalogue. The second use case involves a set of genomic data from synthetic rare disease patients, which was processed through the Variant Interpretation Pipeline and then converted into Resource Description Format for visualization. The third use case involves enriching a graph with additional information, in this case, the Dutch Anatomical Therapeutic Chemical code Ontology with the DrugID from Drugbank. These examples collectively showcase Graph2VR's potential for data exploration and enrichment, as well as some of its limitations. We conclude that the endless three-dimensional space provided by VR indeed shows much potential for the navigation of very large knowledge graphs, and we provide recommendations for data preparation and VR tooling moving forward. Database URL: https://doi.org/10.1093/database/baaf008.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12090995/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144110024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
GenDiS3 database: census on the prevalence of protein domain superfamilies of known structure in the entire sequence database. GenDiS3数据库:对整个序列数据库中已知结构的蛋白质结构域超家族的流行情况进行普查。
IF 3.4 4区 生物学
Database: The Journal of Biological Databases and Curation Pub Date : 2025-05-09 DOI: 10.1093/database/baaf035
Sarthak Joshi, Shailendu Mohapatra, Dhwani Kumar, Adwait Joshi, Meenakshi Iyer, Ramanathan Sowdhamini
{"title":"GenDiS3 database: census on the prevalence of protein domain superfamilies of known structure in the entire sequence database.","authors":"Sarthak Joshi, Shailendu Mohapatra, Dhwani Kumar, Adwait Joshi, Meenakshi Iyer, Ramanathan Sowdhamini","doi":"10.1093/database/baaf035","DOIUrl":"10.1093/database/baaf035","url":null,"abstract":"<p><p>Despite the vast amount of sequence data available, a significant disparity exists between the number of protein sequences identified and the relatively few structures that have been resolved. This disparity highlights the challenge in structural biology to bridge the gap between sequence information and 3D structural data, and the necessity for robust databases capable of linking distant homologs to known structures. Studies have indicated that there are a limited number of structural folds, despite the vast diversity of proteins. Hence, computational tools can enhance our ability to classify protein sequences, much before their structures are determined or their functions are characterized, thereby bridging the gap between sequence and structural data. GenDiS (Genomic Distribution of Superfamilies) is a repository with information on the genomic distribution of protein domain superfamilies, involving a one-time computational exercise to search for trusted homologs of protein domains of known structures against the vast sequence database. We have updated this database employing advanced bioinformatics tools, including DELTA-BLAST (domain enhanced lookup time accelerated BLAST) for initial detection of hits and HMMSCAN for validation, significantly improving the accuracy of domain identification. Using these tools, over 151 million sequence homologs for 2060 superfamilies [SCOPe (Structural Classification of Proteins extended)] were identified and 116 million out of them were validated as true positives. Through a case study on glycolysis-related enzymes, variations in domain architectures of these enzymes are explored, revealing evolutionary changes and functional diversity among these essential proteins. We present another case, LOG gene, where one can tune in and find significant mutations across the evolutionary lineage. The GenDiS database, GenDiS3, and the associated tools made available at https://caps.ncbs.res.in/gendis3/ offer a powerful resource for researchers in functional annotation and evolutionary studies. Database URL: https://caps.ncbs.res.in/gendis3/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12063530/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143978712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信