{"title":"SoDCoD: a comprehensive database of Cu/Zn superoxide dismutase conformational diversity caused by ALS-linked gene mutations and other perturbations.","authors":"Riko Tabuchi, Yurika Momozawa, Yuki Hayashi, Hisashi Noma, Hidenori Ichijo, Takao Fujisawa","doi":"10.1093/database/baae064","DOIUrl":"10.1093/database/baae064","url":null,"abstract":"<p><p>A structural alteration in copper/zinc superoxide dismutase (SOD1) is one of the common features caused by amyotrophic lateral sclerosis (ALS)-linked mutations. Although a large number of SOD1 variants have been reported in ALS patients, the detailed structural properties of each variant are not well summarized. We present SoDCoD, a database of superoxide dismutase conformational diversity, collecting our comprehensive biochemical analyses of the structural changes in SOD1 caused by ALS-linked gene mutations and other perturbations. SoDCoD version 1.0 contains information about the properties of 188 types of SOD1 mutants, including structural changes and their binding to Derlin-1, as well as a set of genes contributing to the proteostasis of mutant-like wild-type SOD1. This database provides valuable insights into the diagnosis and treatment of ALS, particularly by targeting conformational alterations in SOD1. 
Database URL: https://fujisawagroup.github.io/SoDCoDweb/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11315765/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141912108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rezarta Islamaj, Po-Ting Lai, Chih-Hsuan Wei, Ling Luo, Tiago Almeida, Richard A A Jonker, Sofia I R Conceição, Diana F Sousa, Cong-Phuoc Phan, Jung-Hsien Chiang, Jiru Li, Dinghao Pan, Wilailack Meesawad, Richard Tzong-Han Tsai, M Janina Sarol, Gibong Hong, Airat Valiev, Elena Tutubalina, Shao-Man Lee, Yi-Yu Hsu, Mingjie Li, Karin Verspoor, Zhiyong Lu
{"title":"The overview of the BioRED (Biomedical Relation Extraction Dataset) track at BioCreative VIII.","authors":"Rezarta Islamaj, Po-Ting Lai, Chih-Hsuan Wei, Ling Luo, Tiago Almeida, Richard A A Jonker, Sofia I R Conceição, Diana F Sousa, Cong-Phuoc Phan, Jung-Hsien Chiang, Jiru Li, Dinghao Pan, Wilailack Meesawad, Richard Tzong-Han Tsai, M Janina Sarol, Gibong Hong, Airat Valiev, Elena Tutubalina, Shao-Man Lee, Yi-Yu Hsu, Mingjie Li, Karin Verspoor, Zhiyong Lu","doi":"10.1093/database/baae069","DOIUrl":"10.1093/database/baae069","url":null,"abstract":"<p><p>The BioRED track at BioCreative VIII calls for a community effort to identify, semantically categorize, and highlight the novelty factor of the relationships between biomedical entities in unstructured text. Relation extraction is crucial for many biomedical natural language processing (NLP) applications, from drug discovery to custom medical solutions. The BioRED track simulates a real-world application of biomedical relationship extraction, and as such, considers multiple biomedical entity types, normalized to their specific corresponding database identifiers, as well as defines relationships between them in the documents. The challenge consisted of two subtasks: (i) in Subtask 1, participants were given the article text and human expert annotated entities, and were asked to extract the relation pairs, identify their semantic type and the novelty factor, and (ii) in Subtask 2, participants were given only the article text, and were asked to build an end-to-end system that could identify and categorize the relationships and their novelty. We received a total of 94 submissions from 14 teams worldwide. The highest F-score performances achieved for the Subtask 1 were: 77.17% for relation pair identification, 58.95% for relation type identification, 59.22% for novelty identification, and 44.55% when evaluating all of the above aspects of the comprehensive relation extraction. 
The highest F-score performances achieved for the Subtask 2 were: 55.84% for relation pair, 43.03% for relation type, 42.74% for novelty, and 32.75% for comprehensive relation extraction. The entire BioRED track dataset and other challenge materials are available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/ and https://codalab.lisn.upsaclay.fr/competitions/13377 and https://codalab.lisn.upsaclay.fr/competitions/13378. Database URLs: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/, https://codalab.lisn.upsaclay.fr/competitions/13377, https://codalab.lisn.upsaclay.fr/competitions/13378.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11306928/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141901212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sumit Madan, Lisa Kühnel, Holger Fröhlich, Martin Hofmann-Apitius, Juliane Fluck
{"title":"Dataset of miRNA-disease relations extracted from textual data using transformer-based neural networks.","authors":"Sumit Madan, Lisa Kühnel, Holger Fröhlich, Martin Hofmann-Apitius, Juliane Fluck","doi":"10.1093/database/baae066","DOIUrl":"10.1093/database/baae066","url":null,"abstract":"<p><p>MicroRNAs (miRNAs) play important roles in post-transcriptional processes and regulate major cellular functions. The abnormal regulation of expression of miRNAs has been linked to numerous human diseases such as respiratory diseases, cancer, and neurodegenerative diseases. Latest miRNA-disease associations are predominantly found in unstructured biomedical literature. Retrieving these associations manually can be cumbersome and time-consuming due to the continuously expanding number of publications. We propose a deep learning-based text mining approach that extracts normalized miRNA-disease associations from biomedical literature. To train the deep learning models, we build a new training corpus that is extended by distant supervision utilizing multiple external databases. A quantitative evaluation shows that the workflow achieves an area under receiver operator characteristic curve of 98% on a holdout test set for the detection of miRNA-disease associations. We demonstrate the applicability of the approach by extracting new miRNA-disease associations from biomedical literature (PubMed and PubMed Central). We have shown through quantitative analysis and evaluation on three different neurodegenerative diseases that our approach can effectively extract miRNA-disease associations not yet available in public databases. 
Database URL: https://zenodo.org/records/10523046.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11300841/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141893078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chunhui Xu, Trey Shaw, Sai Akhil Choppararu, Yiwei Lu, Shaik Naveed Farooq, Yongfang Qin, Matt Hudson, Brock Weekley, Michael Fisher, Fei He, Jose Roberto Da Silva Nascimento, Nicholas Wergeles, Trupti Joshi, Philip D Bates, Abraham J Koo, Doug K Allen, Edgar B Cahoon, Jay J Thelen, Dong Xu
{"title":"FatPlants: a comprehensive information system for lipid-related genes and metabolic pathways in plants.","authors":"Chunhui Xu, Trey Shaw, Sai Akhil Choppararu, Yiwei Lu, Shaik Naveed Farooq, Yongfang Qin, Matt Hudson, Brock Weekley, Michael Fisher, Fei He, Jose Roberto Da Silva Nascimento, Nicholas Wergeles, Trupti Joshi, Philip D Bates, Abraham J Koo, Doug K Allen, Edgar B Cahoon, Jay J Thelen, Dong Xu","doi":"10.1093/database/baae074","DOIUrl":"10.1093/database/baae074","url":null,"abstract":"<p><p>FatPlants, an open-access, web-based database, consolidates data, annotations, analysis results, and visualizations of lipid-related genes, proteins, and metabolic pathways in plants. Serving as a minable resource, FatPlants offers a user-friendly interface for facilitating studies into the regulation of plant lipid metabolism and supporting breeding efforts aimed at increasing crop oil content. This web resource, developed using data derived from our own research, curated from public resources, and gleaned from academic literature, comprises information on known fatty-acid-related proteins, genes, and pathways in multiple plants, with an emphasis on Glycine max, Arabidopsis thaliana, and Camelina sativa. Furthermore, the platform includes machine-learning based methods and navigation tools designed to aid in characterizing metabolic pathways and protein interactions. Comprehensive gene and protein information cards, a Basic Local Alignment Search Tool search function, similar structure search capacities from AlphaFold, and ChatGPT-based query for protein information are additional features. 
Database URL: https://www.fatplants.net/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11300840/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141893079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Richard A A Jonker, Tiago Almeida, Rui Antunes, João R Almeida, Sérgio Matos
{"title":"Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes.","authors":"Richard A A Jonker, Tiago Almeida, Rui Antunes, João R Almeida, Sérgio Matos","doi":"10.1093/database/baae068","DOIUrl":"10.1093/database/baae068","url":null,"abstract":"<p><p>The identification of medical concepts from clinical narratives is of great interest to the biomedical scientific community due to its importance in treatment improvements or drug development research. Biomedical named entity recognition (NER) in clinical texts is crucial for automated information extraction, facilitating patient record analysis, drug development, and medical research. Traditional approaches often focus on single-class NER tasks, yet recent advancements emphasize the necessity of addressing multi-class scenarios, particularly in complex biomedical domains. This paper proposes a strategy to integrate a multi-head conditional random field (CRF) classifier for multi-class NER in Spanish clinical documents. Our methodology overcomes overlapping entity instances of different types, a common challenge in traditional NER methodologies, by using a multi-head CRF model. This architecture enhances computational efficiency and ensures scalability for multi-class NER tasks, maintaining high performance. By combining four diverse datasets, SympTEMIST, MedProcNER, DisTEMIST, and PharmaCoNER, we expand the scope of NER to encompass five classes: symptoms, procedures, diseases, chemicals, and proteins. To the best of our knowledge, these datasets combined create the largest Spanish multi-class dataset focusing on biomedical entity recognition and linking for clinical notes, which is important to train a biomedical model in Spanish. We also provide entity linking to the multi-lingual Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) vocabulary, with the eventual goal of performing biomedical relation extraction. 
Through experimentation and evaluation of Spanish clinical documents, our strategy provides competitive results against single-class NER models. For NER, our system achieves a combined micro-averaged F1-score of 78.73, with clinical mentions normalized to SNOMED CT with an end-to-end F1-score of 54.51. The code to run our system is publicly available at https://github.com/ieeta-pt/Multi-Head-CRF. Database URL: https://github.com/ieeta-pt/Multi-Head-CRF.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11290360/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141859304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving biomedical entity linking for complex entity mentions with LLM-based text simplification.","authors":"Florian Borchert, Ignacio Llorca, Matthieu-P Schapranow","doi":"10.1093/database/baae067","DOIUrl":"10.1093/database/baae067","url":null,"abstract":"<p><p>Large amounts of important medical information are captured in free-text documents in biomedical research and within healthcare systems, which can be made accessible through natural language processing (NLP). A key component in most biomedical NLP pipelines is entity linking, i.e. grounding textual mentions of named entities to a reference of medical concepts, usually derived from a terminology system, such as the Systematized Nomenclature of Medicine Clinical Terms. However, complex entity mentions, spanning multiple tokens, are notoriously hard to normalize due to the difficulty of finding appropriate candidate concepts. In this work, we propose an approach to preprocess such mentions for candidate generation, building upon recent advances in text simplification with generative large language models. We evaluate the feasibility of our method in the context of the entity linking track of the BioCreative VIII SympTEMIST shared task. We find that instructing the latest Generative Pre-trained Transformer model with a few-shot prompt for text simplification results in mention spans that are easier to normalize. Thus, we can improve recall during candidate generation by 2.9 percentage points compared to our baseline system, which achieved the best score in the original shared task evaluation. Furthermore, we show that this improvement in recall can be fully translated into top-1 accuracy through careful initialization of a subsequent reranking model. Our best system achieves an accuracy of 63.6% on the SympTEMIST test set. 
The proposed approach has been integrated into the open-source xMEN toolkit, which is available online via https://github.com/hpi-dhc/xmen.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11281847/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141765753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CO-19 PDB 2.0: A Comprehensive COVID-19 Database with Global Auto-Alerts, Statistical Analysis, and Cancer Correlations.","authors":"Shahid Ullah, Yingmei Li, Wajeeha Rahman, Farhan Ullah, Muhammad Ijaz, Anees Ullah, Gulzar Ahmad, Hameed Ullah, Tianshun Gao","doi":"10.1093/database/baae072","DOIUrl":"10.1093/database/baae072","url":null,"abstract":"<p><p>Biological databases serve as a critical foundation for modern research, and amid the dynamic landscape of biology, the COVID-19 database has emerged as an indispensable resource. The global outbreak of COVID-19, commencing in December 2019, necessitates comprehensive databases to unravel the intricate connections between this novel virus and cancer. Despite existing databases, a crucial need persists for a centralized and accessible method to acquire precise information within the research community. The main aim of the work is to develop a database which has all the COVID-19-related data available in just one click with auto global notifications. This gap is addressed by the meticulously designed COVID-19 Pandemic Database (CO-19 PDB 2.0), positioned as a comprehensive resource for researchers navigating the complexities of COVID-19 and cancer. Between December 2019 and June 2024, the CO-19 PDB 2.0 systematically collected and organized 120 datasets into six distinct categories, each catering to specific functionalities. These categories encompass a chemical structure database, a digital image database, a visualization tool database, a genomic database, a social science database, and a literature database. Functionalities range from image analysis and gene sequence information to data visualization and updates on environmental events. CO-19 PDB 2.0 has the option to choose either the search page for the database or the autonotification page, providing a seamless retrieval of information. 
The dedicated page introduces six predefined charts, providing insights into crucial criteria such as 'number of cases and deaths', 'country-wise distribution', 'new cases and recovery', and 'rates of death and recovery'. The global impact of COVID-19 on cancer patients has led to extensive collaboration among research institutions, producing numerous articles and computational studies published in international journals. A key feature of this initiative is auto daily notifications for standardized information updates. Users can easily navigate based on different categories or use a direct search option. The study offers up-to-date COVID-19 datasets and global statistics on COVID-19 and cancer, highlighting the top 10 cancers diagnosed in the USA in 2022. Breast and prostate cancers are the most common, representing 30% and 26% of new cases, respectively. The initiative also ensures the removal or replacement of dead links, providing a valuable resource for researchers, healthcare professionals, and individuals. The database has been implemented in PHP, HTML, CSS and MySQL and is available freely at https://www.co-19pdb.habdsk.org/. Database URL: https://www.co-19pdb.habdsk.org/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11281848/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141765713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enio Gjerga, Matthias Dewenter, Thiago Britto-Borges, Johannes Grosso, Frank Stein, Jessica Eschenbach, Mandy Rettel, Johannes Backs, Christoph Dieterich
{"title":"Transverse aortic constriction multi-omics analysis uncovers pathophysiological cardiac molecular mechanisms.","authors":"Enio Gjerga, Matthias Dewenter, Thiago Britto-Borges, Johannes Grosso, Frank Stein, Jessica Eschenbach, Mandy Rettel, Johannes Backs, Christoph Dieterich","doi":"10.1093/database/baae060","DOIUrl":"10.1093/database/baae060","url":null,"abstract":"<p><p>Time-course multi-omics data of a murine model of progressive heart failure (HF) induced by transverse aortic constriction (TAC) provide insights into the molecular mechanisms that are causatively involved in contractile failure and structural cardiac remodelling. We employ Illumina-based transcriptomics, Nanopore sequencing and mass spectrometry-based proteomics on samples from the left ventricle (LV) and right ventricle (RV, RNA only) of the heart at 1, 7, 21 and 56 days following TAC and Sham surgery. Here, we present Transverse Aortic COnstriction Multi-omics Analysis (TACOMA), as an interactive web application that integrates and visualizes transcriptomics and proteomics data collected in a TAC time-course experiment. TACOMA enables users to visualize the expression profile of known and novel genes and protein products thereof. Importantly, we capture alternative splicing events by assessing differential transcript and exon usage as well. Co-expression-based clustering algorithms and functional enrichment analysis revealed overrepresented annotations of biological processes and molecular functions at the protein and gene levels. To enhance data integration, TACOMA synchronizes transcriptomics and proteomics profiles, enabling cross-omics comparisons. With TACOMA (https://shiny.dieterichlab.org/app/tacoma), we offer a rich web-based resource to uncover molecular events and biological processes implicated in contractile failure and cardiac hypertrophy. 
For example, we highlight: (i) changes in metabolic genes and proteins in the time course of hypertrophic growth and contractile impairment; (ii) identification of RNA splicing changes in the expression of Tpm2 isoforms between RV and LV; and (iii) novel transcripts and genes likely contributing to the pathogenesis of HF. We plan to extend these data with additional environmental and genetic models of HF to decipher common and distinct molecular changes in heart diseases of different aetiologies. Database URL: https://shiny.dieterichlab.org/app/tacoma.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11270014/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141757630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Laura Krumpholz, Aleksandra Klimczyk, Wiktoria Bieniek, Sebastian Polak, Barbara Wiśniowska
{"title":"Data set of fraction unbound values in the in vitro incubations for metabolic studies for better prediction of human clearance.","authors":"Laura Krumpholz, Aleksandra Klimczyk, Wiktoria Bieniek, Sebastian Polak, Barbara Wiśniowska","doi":"10.1093/database/baae063","DOIUrl":"10.1093/database/baae063","url":null,"abstract":"<p><p>In vitro-in vivo extrapolation is a commonly applied technique for liver clearance prediction. Various in vitro models are available such as hepatocytes, human liver microsomes, or recombinant cytochromes P450. According to the free drug theory, only the unbound fraction (fu) of a chemical can undergo metabolic changes. Therefore, to ensure the reliability of predictions, both specific and nonspecific binding in the model should be accounted for. However, the fraction unbound in the experiment is often not reported. The study aimed to provide a detailed repository of the literature data on the compound's fu value in various in vitro systems used for drug metabolism evaluation and corresponding human plasma binding levels. Data on the free fraction in plasma and different in vitro models were supplemented with the following information: the experimental method used for the assessment of the degree of drug binding, protein or cell concentration in the incubation, and other experimental conditions, if different from the standard ones, species, reference to the source publication, and the author's name and date of publication. In total, we collected 129 literature studies on 1425 different compounds. The provided data set can be used as a reference for scientists involved in pharmacokinetic/physiologically based pharmacokinetic modelling as well as researchers interested in Quantitative Structure-Activity Relationship models for the prediction of fraction unbound based on compound structure. 
Database URL: https://data.mendeley.com/datasets/3bs5526htd/1.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11269425/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141757629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ofer Isakov, Dina Marek-Yagel, Rotem Greenberg, Michal Naftali, Shay Ben-Shachar
{"title":"PANGEN: an online platform for the comparison and creation of diagnostic gene panels.","authors":"Ofer Isakov, Dina Marek-Yagel, Rotem Greenberg, Michal Naftali, Shay Ben-Shachar","doi":"10.1093/database/baae065","DOIUrl":"10.1093/database/baae065","url":null,"abstract":"<p><p>Targeted gene panel sequencing is used to limit the search for causative genetic variants solely to genes with an established association with the phenotype. The design of gene panels is challenging due to the lack of consensus regarding phenotypic associations for some genes, which results in high variation in gene composition for the same panel offered by different laboratories. We developed PANGEN, a platform that provides a centralized resource for gene panel information, with the ability to compare and generate new intelligent diagnostic panels. Gene-phenotype associations were collected from 12 public and commercial sources (Blueprint, Cegat, Centogene, ClinGen, Fulgent, GeneDx, Health in Code, Human Phenotype Ontology, Invitae, PanelApp, Prevention genetics, and Pronto diagnostics). Gene-phenotype associations are categorized into tiers according to categories derived from the original source panel. Pairwise panel similarity was calculated by dividing the number of common genes by the total number of genes in both panels. Regions with extreme guanine-cytosine (GC) content were collected from the Genome in a Bottle stratifications dataset, and putative genomic duplications were retrieved from the University of California, Santa Cruz (UCSC) database. Overall, 1533 panels, 9759 phenotypes, and 6979 genes were collected. The platform provides an interface to (i) explore and compare collected panels, (ii) find similar panels, (iii) identify genes with high GC content or duplication levels, (iv) generate gene panels by combining panels from various sources, and (v) stratify a generated panel into genes with a strong phenotype association ('core') and those with a weaker association ('extended'). 
The presented platform represents a unique resource for gene panel exploration and comparison that facilitates the generation of tailored diagnostic panels through a public online web server. Database URL: https://c-gc.shinyapps.io/PANGEN/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11265858/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141751328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}