Genomics, Proteomics & Bioinformatics: Latest Articles

A Beloved Bioinformatician Buddy—In Memory of Professor Weimin Zhu
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-12-01 DOI: 10.1016/j.gpb.2022.12.006
Yixue Li
Vol. 20, No. 6, pp. 1037-1039
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10225480/pdf/
Citations: 0
Multi-omics Analyses Provide Insight into the Biosynthesis Pathways of Fucoxanthin in Isochrysis galbana
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-12-01 DOI: 10.1016/j.gpb.2022.05.010
Duo Chen, Xue Yuan, Xuehai Zheng, Jingping Fang, Gang Lin, Rongmao Li, Jiannan Chen, Wenjin He, Zhen Huang, Wenfang Fan, Limin Liang, Chentao Lin, Jinmao Zhu, Youqiang Chen, Ting Xue
Vol. 20, No. 6, pp. 1138-1153
Abstract: Isochrysis galbana is considered an ideal feedstock for human functional foods and nutraceuticals because of its high fucoxanthin (Fx) content. However, multi-omics analysis of the regulatory networks for Fx biosynthesis in I. galbana has not been reported. In this study, we report a high-quality genome assembly of I. galbana LG007, which has a genome size of 92.73 Mb, a contig N50 of 6.99 Mb, and 14,900 protein-coding genes. Phylogenetic analysis confirmed the monophyly of Haptophyta, with I. galbana sister to Emiliania huxleyi and Chrysochromulina tobinii. Evolutionary analysis estimated that I. galbana and E. huxleyi diverged ∼133 million years ago. Gene family analysis indicated that lipid metabolism-related genes, including IgPLMT, IgOAR1, and IgDEGS1, exhibited significant expansion. Metabolome analysis showed that the carotenoid content of I. galbana cultured under green light for 7 days was higher than that under white light, and that β-carotene was the main carotenoid, accounting for 79.09% of total carotenoids. Comprehensive multi-omics analysis revealed that green light induction increased the content of β-carotene, antheraxanthin, zeaxanthin, and Fx, and that this increase was significantly correlated with the expression of IgMYB98, IgZDS, IgPDS, IgLHCX2, IgZEP, IgLCYb, and IgNSY. These findings contribute to the understanding of Fx biosynthesis and its regulation, providing a valuable reference for food and pharmaceutical applications.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10225490/pdf/
Citations: 9
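The contig N50 of 6.99 Mb quoted above is a standard measure of assembly contiguity: the length L such that contigs of length L or longer together cover at least half of the assembly. The minimal sketch below computes it for a made-up list of contig lengths; it is not the authors' pipeline, and `n50` is a hypothetical helper name.

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


# Toy assembly (hypothetical lengths, not the I. galbana LG007 contigs)
print(n50([10, 8, 6, 4, 2]))  # 8
```

Here the two longest contigs (10 + 8 = 18) already exceed half of the 30-unit total, so the N50 is 8.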
JAX-CNV: A Whole-genome Sequencing-based Algorithm for Copy Number Detection at Clinical Grade Level
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-12-01 DOI: 10.1016/j.gpb.2021.06.003
Wan-Ping Lee, Qihui Zhu, Xiaofei Yang, Silvia Liu, Eliza Cerveira, Mallory Ryan, Adam Mil-Homens, Lauren Bellfy, Kai Ye, Charles Lee, Chengsheng Zhang
Vol. 20, No. 6, pp. 1197-1206
Abstract: We aimed to develop a whole-genome sequencing (WGS)-based copy number variant (CNV) calling algorithm with the potential to replace chromosomal microarray assay (CMA) for clinical diagnosis, and thus developed JAX-CNV for CNV detection from WGS data. The performance of this CNV calling algorithm was evaluated in a blinded manner on 31 samples and compared with the 112 CNVs reported by clinically validated CMAs for these samples. JAX-CNV recalled 100% of these CNVs. In addition, JAX-CNV identified an average of 30 CNVs per individual, representing an approximately seven-fold increase over the calls of clinically validated CMAs. Experimental validation of 24 randomly selected CNVs showed one false positive, i.e., a false discovery rate (FDR) of 4.17%. A robustness test on lower-coverage data revealed 100% sensitivity for CNVs larger than 300 kb (the current College of American Pathologists threshold) down to 10× coverage. For CNVs larger than 50 kb, sensitivities were 100% at coverages deeper than 20×, 97% at 15×, and 95% at 10×. We developed a WGS-based CNV pipeline that includes the newly developed CNV caller JAX-CNV and found it capable of detecting CMA-reported CNVs with a sensitivity of 100% and an FDR of about 4%. We propose that JAX-CNV be further examined in a multi-institutional study to justify the transition of first-tier genetic testing from CMAs to WGS. JAX-CNV is available at https://github.com/TheJacksonLaboratory/JAX-CNV.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10225484/pdf/
Citations: 0
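The FDR and recall quoted for JAX-CNV follow from simple counts. As an illustration, the sketch below recomputes the 4.17% FDR (1 false positive among 24 experimentally validated calls) and the 100% recall of the 112 CMA-reported CNVs; the helper names are hypothetical and not part of JAX-CNV.

```python
def false_discovery_rate(false_pos: int, true_pos: int) -> float:
    """FDR = FP / (FP + TP), computed over the calls that were experimentally checked."""
    validated = false_pos + true_pos
    if validated == 0:
        raise ValueError("no validated calls")
    return false_pos / validated


def sensitivity(recalled: int, reference_total: int) -> float:
    """Fraction of reference (e.g., CMA-reported) CNVs that the caller recovered."""
    if reference_total == 0:
        raise ValueError("empty reference set")
    return recalled / reference_total


# Counts taken from the abstract: 1 false positive among 24 validated CNVs,
# and all 112 CMA-reported CNVs recalled.
fdr = false_discovery_rate(false_pos=1, true_pos=23)
sens = sensitivity(recalled=112, reference_total=112)
print(f"FDR = {fdr:.2%}, sensitivity = {sens:.0%}")  # FDR = 4.17%, sensitivity = 100%
```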
Machine Learning Modeling of Protein-intrinsic Features Predicts Tractability of Targeted Protein Degradation
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-10-01 DOI: 10.1016/j.gpb.2022.11.008
Wubing Zhang, Shourya S. Roy Burman, Jiaye Chen, Katherine A. Donovan, Yang Cao, Chelsea Shu, Boning Zhang, Zexian Zeng, Shengqing Gu, Yi Zhang, Dian Li, Eric S. Fischer, Collin Tokheim, X. Shirley Liu
Vol. 20, No. 5, pp. 882-898
Abstract: Targeted protein degradation (TPD) has rapidly emerged as a therapeutic modality to eliminate previously undruggable proteins by repurposing the cell's endogenous protein degradation machinery. However, the susceptibility of proteins to targeting by TPD approaches, termed "degradability", is largely unknown. Here, we developed a machine learning model, model-free analysis of protein degradability (MAPD), to predict degradability from features intrinsic to protein targets. MAPD accurately predicts kinases that are degradable by TPD compounds [area under the precision–recall curve (AUPRC) of 0.759 and area under the receiver operating characteristic curve (AUROC) of 0.775] and is likely generalizable to independent non-kinase proteins. We found five statistically significant features that achieve optimal prediction, with ubiquitination potential being the most predictive. By structural modeling, we found that E2-accessible ubiquitination sites, but not lysine residues in general, are particularly associated with kinase degradability. Finally, we extended MAPD predictions to the entire proteome and found 964 disease-causing proteins (including proteins encoded by 278 cancer genes) that may be tractable to TPD drug development.
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/cc/c8/main.PMC10025769.pdf
Citations: 0
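MAPD's performance is reported as AUPRC and AUROC. As a rough, library-free illustration of what those numbers measure, the sketch below computes AUROC as the probability that a random positive outranks a random negative, and AUPRC as step-wise average precision. The abstract does not say which AUPRC estimator MAPD used, so the estimator choice, the toy labels, and the scores are all assumptions.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive,
    after sorting by descending score (stable sort keeps input order on ties)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos


# Toy data: 1 = degradable kinase, 0 = not degradable (hypothetical scores)
y = [1, 0, 1, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(round(auroc(y, s), 3), round(average_precision(y, s), 3))  # 0.778 0.806
```

Average precision is used here because it avoids the optimistic interpolation that trapezoidal integration of a precision–recall curve can introduce; libraries may implement either.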
DGMP: Identifying Cancer Driver Genes by Jointing DGCN and MLP from Multi-omics Genomic Data
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-10-01 DOI: 10.1016/j.gpb.2022.11.004
Shao-Wu Zhang, Jing-Yu Xu, Tong Zhang
Vol. 20, No. 5, pp. 928-938
Abstract: Identification of cancer driver genes plays an important role in precision oncology research and helps us understand cancer initiation and progression. However, most existing computational methods identify cancer driver genes using protein–protein interaction (PPI) networks, or treat directed gene regulatory networks (GRNs) as undirected gene–gene association networks; this discards the structural regulatory information unique to directed GRNs and can degrade cancer driver gene identification. Here, based on multi-omics pan-cancer data (i.e., gene expression, mutation, copy number variation, and DNA methylation), we propose a novel method, DGMP, that identifies cancer driver genes by combining a directed graph convolutional network (DGCN) and a multilayer perceptron (MLP). DGMP learns the multi-omics features of genes, as well as the topological structure features of the GRN, with the DGCN model, and uses the MLP to place more weight on gene features, mitigating the bias toward graph topological features in the DGCN learning process. Results on three GRNs show that DGMP outperforms other existing state-of-the-art methods. Ablation experiments on the DawnNet network indicate that introducing the MLP into DGCN offsets the performance degradation of DGCN, and that combining MLP and DGCN effectively improves the identification of cancer driver genes. DGMP can identify not only highly mutated cancer driver genes but also driver genes harboring other kinds of alterations (e.g., differential expression and aberrant DNA methylation) or genes involved in GRNs with other cancer genes. The source code of DGMP can be freely downloaded from https://github.com/NWPU-903PR/DGMP.
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/17/4d/main.PMC10025764.pdf
Citations: 4
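The distinction DGMP draws between directed and undirected GRNs can be seen in a single aggregation step. The toy sketch below pools a gene's regulators and targets separately, so a transcription factor and its targets receive different neighborhood summaries. This is a simplified illustration under stated assumptions, not the DGCN layer used in DGMP: a real layer would also apply learned weight matrices and a nonlinearity, and the gene names and feature values are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target): one directed GRN edge


def mean_vectors(vectors: List[List[float]], dim: int) -> List[float]:
    """Element-wise mean of equal-length vectors; zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def directed_aggregate(features: Dict[str, List[float]],
                       edges: List[Edge]) -> Dict[str, List[float]]:
    """Return [own | mean of regulators | mean of targets] for every gene.

    Incoming and outgoing neighbors are pooled separately, so edge direction
    survives; an undirected network would merge both into one neighbor set.
    (Hypothetical simplification: no learned weights, no nonlinearity.)
    """
    if not features:
        return {}
    dim = len(next(iter(features.values())))
    regulators: Dict[str, List[List[float]]] = {g: [] for g in features}
    targets: Dict[str, List[List[float]]] = {g: [] for g in features}
    for src, dst in edges:
        targets[src].append(features[dst])     # src regulates dst
        regulators[dst].append(features[src])  # dst is regulated by src
    return {
        g: features[g] + mean_vectors(regulators[g], dim) + mean_vectors(targets[g], dim)
        for g in features
    }


# Hypothetical per-gene features, e.g. [mutation rate, expression change]
feats = {"TF1": [1.0, 0.0], "GENE_A": [0.0, 2.0], "GENE_B": [0.5, 0.5]}
grn = [("TF1", "GENE_A"), ("TF1", "GENE_B")]
out = directed_aggregate(feats, grn)
print(out["TF1"])     # [1.0, 0.0, 0.0, 0.0, 0.25, 1.25]
print(out["GENE_A"])  # [0.0, 2.0, 1.0, 0.0, 0.0, 0.0]
```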
Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-10-01 DOI: 10.1016/j.gpb.2022.11.011
Matthew Brendel, Chang Su, Zilong Bai, Hao Zhang, Olivier Elemento, Fei Wang
Vol. 20, No. 5, pp. 814-835
Abstract: Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profiles of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes; it has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance in artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis because it can extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. This review surveys recently developed deep learning techniques for scRNA-seq data analysis, identifies the key steps in the scRNA-seq analysis pipeline that deep learning has advanced, and explains the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges that current deep learning approaches face with scRNA-seq data and discuss potential directions for improving deep learning algorithms for scRNA-seq data analysis.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025684/pdf/
Citations: 10
TIST: Transcriptome and Histopathological Image Integrative Analysis for Spatial Transcriptomics
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-10-01 DOI: 10.1016/j.gpb.2022.11.012
Yiran Shan, Qian Zhang, Wenbo Guo, Yanhong Wu, Yuxin Miao, Hongyi Xin, Qiuyu Lian, Jin Gu
Vol. 20, No. 5, pp. 974-988
Abstract: Sequencing-based spatial transcriptomics (ST) is an emerging technology for studying in situ gene expression patterns at the whole-genome scale. ST data analysis is still complicated by high technical noise and low resolution. In addition to the transcriptomic data, matched histopathological images are usually generated for the same tissue sample alongside the ST experiment. These high-resolution images provide complementary cellular phenotypic information, offering an opportunity to mitigate the noise in ST data. We present a novel ST data analysis method, transcriptome and histopathological image integrative analysis for ST (TIST), which identifies spatial clusters (SCs) and enhances spatial gene expression patterns through integrative analysis of matched transcriptomic data and images. TIST uses a histopathological feature extraction method based on a Markov random field (MRF) to learn cellular features from histopathological images, and integrates them with the transcriptomic data and location information into a network termed TIST-net. Based on TIST-net, SCs are identified by a random walk-based strategy, and gene expression patterns are enhanced by neighborhood smoothing. We benchmark TIST against several state-of-the-art methods on both simulated datasets and 32 real samples. Results show that TIST is robust to technical noise across multiple analysis tasks for sequencing-based ST data and can find interesting microstructures in different biological scenarios. TIST is available at http://lifeome.net/software/tist/ and https://ngdc.cncb.ac.cn/biocode/tools/BT007317.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025771/pdf/
Citations: 0
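TIST enhances expression patterns by neighborhood smoothing on TIST-net. The sketch below shows only the general idea with plain spatial neighbors: each spot's value is blended with the mean of spots within a fixed radius. The radius rule, the blending weight `alpha`, and the spot data are assumptions; TIST's actual network also incorporates histology features and transcriptomic similarity, which are omitted here.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) spot coordinates


def smooth_expression(coords: Dict[str, Point], expr: Dict[str, float],
                      radius: float, alpha: float = 0.5) -> Dict[str, float]:
    """Blend each spot's expression with the mean of spots within `radius`.

    alpha = 0 keeps the raw value; alpha = 1 uses only the neighbor mean.
    """
    smoothed: Dict[str, float] = {}
    for spot, (x, y) in coords.items():
        neighbor_values = [
            expr[other]
            for other, (ox, oy) in coords.items()
            if other != spot and math.hypot(x - ox, y - oy) <= radius
        ]
        if neighbor_values:
            neighbor_mean = sum(neighbor_values) / len(neighbor_values)
            smoothed[spot] = (1 - alpha) * expr[spot] + alpha * neighbor_mean
        else:
            smoothed[spot] = expr[spot]  # isolated spot: nothing to borrow from
    return smoothed


# Hypothetical spots along one row; s1 looks like a dropout (0 counts)
coords = {"s0": (0.0, 0.0), "s1": (1.0, 0.0), "s2": (2.0, 0.0), "s3": (10.0, 0.0)}
expr = {"s0": 4.0, "s1": 0.0, "s2": 4.0, "s3": 7.0}
result = smooth_expression(coords, expr, radius=1.0)
print(result)  # {'s0': 2.0, 's1': 2.0, 's2': 2.0, 's3': 7.0}
```

The dropout spot s1 borrows signal from its neighbors, while the distant spot s3 is left unchanged; a graph that also weights edges by histology similarity would avoid smoothing across tissue boundaries.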
Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-10-01 DOI: 10.1016/j.gpb.2022.11.003
Yawei Li, Xin Wu, Ping Yang, Guoqian Jiang, Yuan Luo
Vol. 20, No. 5, pp. 850-866
Abstract: Recent developments in imaging and sequencing technologies have enabled systematic advances in the clinical study of lung cancer. However, the human mind is limited in its ability to handle and fully utilize such enormous amounts of accumulated data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, and have extensively characterized lung cancer from the different perspectives these data provide. In this review, we provide an overview of machine learning-based approaches that strengthen various aspects of lung cancer diagnosis and therapy, including early detection, auxiliary diagnosis, prognosis prediction, and immunotherapy practice. Moreover, we highlight the challenges and opportunities for future applications of machine learning in lung cancer.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025752/pdf/
Citations: 9
TripletGO: Integrating Transcript Expression Profiles with Protein Homology Inferences for Gene Function Prediction
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-10-01 DOI: 10.1016/j.gpb.2022.03.001
Yi-Heng Zhu, Chengxin Zhang, Yan Liu, Gilbert S. Omenn, Peter L. Freddolino, Dong-Jun Yu, Yang Zhang
Vol. 20, No. 5, pp. 1013-1027
Abstract: Gene Ontology (GO) has been widely used to annotate the functions of genes and gene products. Here, we propose a new method, TripletGO, to deduce GO terms of protein-coding and non-coding genes by integrating four complementary pipelines built on transcript expression profiles, genetic sequence alignment, protein sequence alignment, and naïve probability. TripletGO was tested on a large set of 5754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematode) and on 2433 proteins with available expression data from the third Critical Assessment of Protein Function Annotation challenge (CAFA3). Experimental results show that TripletGO achieves function annotation accuracy significantly beyond that of current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in coupling a new triplet network-based profiling method with a feature space mapping technique, which can accurately recognize function patterns from transcript expression profiles. Meanwhile, combining multiple complementary models, especially those from transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanggroup.org/TripletGO/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025770/pdf/
Citations: 2
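The abstract attributes TripletGO's gains to a triplet network-based profiling method. Triplet networks are commonly trained with a triplet margin loss, which pulls an anchor toward a positive example and pushes it away from a negative one. The sketch below evaluates that standard loss on fixed, hypothetical embeddings; whether TripletGO uses exactly this loss, distance, or margin is an assumption, not something the abstract states.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings of transcript expression profiles
anchor = [0.0, 0.0]    # query gene
positive = [0.0, 1.0]  # gene sharing a GO term with the query
easy_neg = [3.0, 4.0]  # unrelated gene, already far away
hard_neg = [0.0, 1.5]  # unrelated gene, still too close
print(triplet_loss(anchor, positive, easy_neg))  # 0.0
print(triplet_loss(anchor, positive, hard_neg))  # 0.5
```

Only "hard" triplets like the second one produce a gradient during training, which is why triplet-based methods typically mine such examples.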
Assessment and Optimization of Explainable Machine Learning Models Applied to Transcriptomic Data
IF 9.5 | Zone 2 | Biology
Genomics, Proteomics & Bioinformatics Pub Date : 2022-10-01 DOI: 10.1016/j.gpb.2022.07.003
Yongbing Zhao, Jinfeng Shao, Yan W. Asmann
Vol. 20, No. 5, pp. 899-911
Abstract: Explainable artificial intelligence aims to interpret how machine learning models make decisions, and many model explainers have been developed in the computer vision field. However, the applicability of these model explainers to biological data is still poorly understood. In this study, we comprehensively evaluated multiple explainers by interpreting pre-trained models that predict tissue types from transcriptomic data and by identifying, for each sample, the top contributing genes with the greatest impact on model prediction. To improve the reproducibility and interpretability of results generated by model explainers, we proposed a series of optimization strategies for each explainer on two different model architectures: multilayer perceptron (MLP) and convolutional neural network (CNN). We observed three groups of explainer and model architecture combinations with high reproducibility. Group II, which contains three model explainers on aggregated MLP models, identified top contributing genes in different tissues that exhibited tissue-specific manifestation and were potential cancer biomarkers. In summary, our work provides novel insights and guidance for exploring biological mechanisms using explainable machine learning models.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025763/pdf/
Citations: 6
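The abstract does not detail the optimization strategies, but two generic ideas it alludes to, aggregating several trained models and checking whether the top contributing genes are reproducible, can be sketched as follows. Averaging attributions across models and comparing top-k gene sets with the Jaccard index are assumptions chosen for illustration; the function names and attribution scores are hypothetical.

```python
from typing import Dict, List, Set


def aggregate_attributions(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each gene's attribution over several trained models (missing = 0)."""
    genes = set().union(*runs)
    return {g: sum(run.get(g, 0.0) for run in runs) / len(runs) for g in genes}


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """The k genes with the largest absolute attribution (ties broken by name)."""
    ranked = sorted(scores, key=lambda g: (-abs(scores[g]), g))
    return set(ranked[:k])


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two gene sets; 1.0 means identical top-k lists."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical attributions for one sample from two independently trained models
run1 = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.0}
run2 = {"A": 0.8, "B": 0.6, "C": 0.1, "D": 0.0}
print(jaccard(top_k(run1, 2), top_k(run2, 2)))                 # about 0.333
print(sorted(top_k(aggregate_attributions([run1, run2]), 2)))  # ['A', 'B']
```

A low Jaccard score between single runs, as here, is the kind of instability that motivates aggregating models before reading off top contributing genes.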
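Contact: info@booksci.cn. Book学术 provides a free academic resource search service that helps scholars in China and abroad search Chinese- and English-language literature, and aims to offer a convenient, high-quality service experience. Copyright © 2023 布克学术. All rights reserved.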