Briefings in Bioinformatics: Latest Articles

Deep learning in template-free de novo biosynthetic pathway design of natural products.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae495
Xueying Xie, Lin Gui, Baixue Qiao, Guohua Wang, Shan Huang, Yuming Zhao, Shanwen Sun
Abstract: Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11456888/pdf/
Citations: 0
DeepPBI-KG: a deep learning method for the prediction of phage-bacteria interactions based on key genes.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae484
Tongqing Wei, Chenqi Lu, Hanxiao Du, Qianru Yang, Xin Qi, Yankun Liu, Yi Zhang, Chen Chen, Yutong Li, Yuanhao Tang, Wen-Hong Zhang, Xu Tao, Ning Jiang
Abstract: Phages, the natural predators of bacteria, were discovered more than 100 years ago. However, increasing antimicrobial resistance rates have revitalized phage research. Methods that are less time-consuming and more efficient than wet-laboratory experiments are needed to help screen phages quickly for therapeutic use. Traditional computational methods usually ignore the fact that phage-bacteria interactions are achieved by key genes and proteins. Methods for intraspecific prediction are rare since almost all existing methods consider only interactions at the species and genus levels. Moreover, most strains in existing databases contain only partial genome information because whole-genome information for species is difficult to obtain. Here, we propose a new approach for interaction prediction that constructs new features from key genes and proteins and applies K-means sampling to select high-quality negative samples for prediction. Finally, we develop DeepPBI-KG, a corresponding prediction tool based on feature selection and a deep neural network. The results show that the average area under the curve (AUC) for prediction reached 0.93 for each strain, and the overall AUC and area under the precision-recall curve reached 0.89 and 0.92, respectively, on the independent test set; these values are greater than those of other existing prediction tools. The forward and reverse validation results indicate that key genes and key proteins regulate and influence the interaction, which supports the reliability of the model. In addition, intraspecific prediction experiments based on Klebsiella pneumoniae data demonstrate the potential applicability of DeepPBI-KG for intraspecific prediction. In summary, the feature engineering and interaction prediction approaches proposed in this study can effectively improve the robustness and stability of interaction prediction, can achieve high generalizability, and may provide new directions and insights for rapid phage screening for therapy.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11440089/pdf/
Citations: 0
CMFHMDA: a prediction framework for human disease-microbe associations based on cross-domain matrix factorization.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae481
Jing Chen, Ran Tao, Yi Qiu, Qun Yuan
Abstract: Predicting associations between microbes and diseases opens up new avenues for developing diagnostic, preventive, and therapeutic strategies. Given that laboratory-based biological tests to verify these associations are often time-consuming and expensive, there is a critical need for innovative computational frameworks to predict new microbe-disease associations. In this work, we introduce a novel prediction algorithm called Predicting Human Disease-Microbe Associations using Cross-Domain Matrix Factorization (CMFHMDA). Initially, we calculate the composite similarity of diseases and the Gaussian interaction profile similarity of microbes. We then apply the Weighted K Nearest Known Neighbors (WKNKN) algorithm to refine the microbe-disease association matrix. Our CMFHMDA model is subsequently developed by integrating the network data of both microbes and diseases to predict potential associations. The key innovations of this method include using the WKNKN algorithm to preprocess missing values in the association matrix and incorporating cross-domain information from microbes and diseases into the CMFHMDA model. To validate CMFHMDA, we employed three different cross-validation techniques to evaluate the model's accuracy. The results indicate that the CMFHMDA model achieved Area Under the Receiver Operating Characteristic Curve scores of 0.9172, 0.8551, and 0.9351 ± 0.0052 in global Leave-One-Out Cross-Validation (LOOCV), local LOOCV, and five-fold CV, respectively. Furthermore, many predicted associations have been confirmed by published experimental studies, establishing CMFHMDA as an effective tool for predicting potential disease-associated microbes.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11427075/pdf/
Citations: 0
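As a rough illustration of one ingredient named in the CMFHMDA abstract above, the Gaussian interaction profile (GIP) similarity of microbes can be sketched as follows. This uses the commonly cited form of the GIP kernel and is not taken from the authors' code. The function name `gip_similarity`, the toy association matrix, and the bandwidth parameter of 1 are assumptions made for illustration.

```python
import math

def gip_similarity(assoc):
    """Gaussian interaction profile (GIP) kernel similarity between rows.

    assoc: list of equal-length 0/1 lists; row i is the interaction
    profile of microbe i across all diseases.
    """
    n = len(assoc)
    # Squared Euclidean norm of each interaction profile.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_norm = sum(sq_norms) / n
    if mean_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Bandwidth normalised by the mean squared profile norm
    # (numerator 1.0 is an assumed default, not a value from the paper).
    gamma = 1.0 / mean_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist2)
    return sim

# Hypothetical toy matrix: 3 microbes x 4 diseases.
toy = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
S = gip_similarity(toy)
```

Microbes with identical interaction profiles (rows 0 and 1 here) get similarity 1, and the similarity decays as profiles diverge. In a pipeline like the one described, such a matrix would be computed before the association matrix is refined and factorized.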
3t-seq: automatic gene expression analysis of single-copy genes, transposable elements, and tRNAs from RNA-seq data.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae467
Francesco Tabaro, Matthieu Boulard
Abstract: RNA sequencing is the gold-standard method to quantify transcriptomic changes between two conditions. The overwhelming majority of data analysis methods available are focused on polyadenylated RNA transcribed from single-copy genes and overlook transcripts from repeated sequences such as transposable elements (TEs). These self-autonomous genetic elements are increasingly studied, and specialized tools designed to handle multimapping sequencing reads are available. Transfer RNAs are transcribed by RNA polymerase III and are essential for protein translation. There is a need for integrated software that is able to analyze multiple types of RNA. Here, we present 3t-seq, a Snakemake pipeline for integrated differential expression analysis of transcripts from single-copy genes, TEs, and tRNA. Starting from raw sequencing data, 3t-seq performs quality control, genome mapping, gene expression quantification, and statistical testing, and produces an accessible report and easy-to-use results for downstream analysis. It implements three methods to quantify TE expression and one for tRNA genes. It provides an easy-to-configure method to manage software dependencies that lets the user focus on results. 3t-seq is released under the MIT license and is available at https://github.com/boulardlab/3t-seq.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11424182/pdf/
Citations: 0
Enhancing RNA-seq analysis by addressing all co-existing biases using a self-benchmarking approach with 2D structural insights.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae532
Qiang Su, Yi Long, Deming Gou, Junmin Quan, Qizhou Lian
Abstract: We introduce a groundbreaking approach: the minimum free energy-based Gaussian Self-Benchmarking (MFE-GSB) framework, designed to combat the myriad of biases inherent in RNA-seq data. Central to our methodology is the MFE concept, facilitating the adoption of a Gaussian distribution model tailored to effectively mitigate all co-existing biases within a k-mer counting scheme. The MFE-GSB framework operates on a sophisticated dual-model system, juxtaposing modeling data of uniform k-mer distribution against the real, observed sequencing data characterized by nonuniform k-mer distributions. The framework applies a Gaussian function, guided by the predetermined parameters (mean and SD) derived from modeling data, to fit unknown sequencing data. This dual comparison allows for the accurate prediction of k-mer abundances across MFE categories, enabling simultaneous correction of biases at the single k-mer level. Through validation with both engineered RNA constructs and human tissue RNA samples, its wide-ranging efficacy and applicability are demonstrated.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11491153/pdf/
Citations: 0
A robust statistical approach for finding informative spatially associated pathways.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae543
Leqi Tian, Jiashun Xiao, Tianwei Yu
Abstract: Spatial transcriptomics offers deep insights into cellular functional localization and communication by mapping gene expression to spatial locations. Traditional approaches that focus on selecting spatially variable genes often overlook the complexity of biological pathways and the interactions among genes. Here, we introduce a novel framework that shifts the focus towards directly identifying functional pathways associated with spatial variability by adapting the Brownian distance covariance test in an innovative manner to explore the heterogeneity of biological functions over space. Unlike most other methods, this statistical testing approach is free of gene selection and parameter selection and allows nonlinear and complex dependencies. It allows for a deeper understanding of how cells coordinate their activities across different spatial domains through biological pathways. By analyzing real human and mouse datasets, the method found significant pathways that were associated with spatial variation, as well as different pathway patterns among inner- and edge-cancer regions. This innovative framework offers a new perspective on analyzing spatial transcriptomic data, contributing to our understanding of tissue architecture and disease pathology. The implementation is publicly available at https://github.com/tianlq-prog/STpathway.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503753/pdf/
Citations: 0
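The spatial pathway abstract above builds on the Brownian distance covariance test. The sketch below shows the standard sample distance covariance and a simple permutation test between spot coordinates and a per-spot pathway expression vector. It illustrates the textbook statistic only; how the authors adapt the test is not described in the abstract, and all function names, the toy data, and the permutation settings are assumptions.

```python
import math
import random

def centered_distance_matrix(points):
    """Pairwise Euclidean distances, double-centred (row, column, grand mean)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def dcov_sq(a, b):
    """Squared sample distance covariance from two centred matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)

def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; 0.0 if either side is constant."""
    a, b = centered_distance_matrix(x), centered_distance_matrix(y)
    vxy, vxx, vyy = dcov_sq(a, b), dcov_sq(a, a), dcov_sq(b, b)
    if vxx == 0 or vyy == 0:
        return 0.0
    return math.sqrt(max(vxy, 0.0) / math.sqrt(vxx * vyy))

def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value: shuffle which expression vector sits at which spot."""
    rng = random.Random(seed)
    a, b = centered_distance_matrix(x), centered_distance_matrix(y)
    observed = dcov_sq(a, b)
    n = len(x)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permuting rows and columns of the centred matrix equals permuting y.
        b_perm = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if dcov_sq(a, b_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 8 spots on a line, pathway score rising along it.
spots = [(float(i), 0.0) for i in range(8)]
pathway_scores = [(float(i),) for i in range(8)]
dcor = distance_correlation(spots, pathway_scores)
pval = permutation_pvalue(spots, pathway_scores)
```

Because the statistic works on pairwise distances, each spot's expression can be a whole vector of pathway genes rather than a single value. That is consistent with the abstract's point that the approach needs no gene selection.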
Prototype-based contrastive substructure identification for molecular property prediction.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae565
Gaoqi He, Shun Liu, Zhuoran Liu, Changbo Wang, Kai Zhang, Honglin Li
Abstract: Substructure-based representation learning has emerged as a powerful approach to featurize complex attributed graphs, with promising results in molecular property prediction (MPP). However, existing MPP methods mainly rely on manually defined rules to extract substructures. It remains an open challenge to adaptively identify meaningful substructures from numerous molecular graphs to accommodate MPP tasks. To this end, this paper proposes Prototype-based cOntrastive Substructure IdentificaTion (POSIT), a self-supervised framework to autonomously discover substructural prototypes across graphs so as to guide end-to-end molecular fragmentation. During pre-training, POSIT emphasizes two key aspects of substructure identification: first, it imposes a soft connectivity constraint to encourage the generation of topologically meaningful substructures; second, it aligns resultant substructures with derived prototypes through a prototype-substructure contrastive clustering objective, ensuring attribute-based similarity within clusters. In the fine-tuning stage, a cross-scale attention mechanism is designed to integrate substructure-level information to enhance molecular representations. The effectiveness of the POSIT framework is demonstrated by experimental results from diverse real-world datasets, covering both classification and regression tasks. Moreover, visualization analysis validates the consistency of chemical priors with identified substructures. The source code is publicly available at https://github.com/VRPharmer/POSIT.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11533112/pdf/
Citations: 0
AESurv: autoencoder survival analysis for accurate early prediction of coronary heart disease.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae479
Yike Shen, Arce Domingo-Relloso, Allison Kupsco, Marianthi-Anna Kioumourtzoglou, Maria Tellez-Plaza, Jason G Umans, Amanda M Fretts, Ying Zhang, Peter F Schnatz, Ramon Casanova, Lisa Warsinger Martin, Steve Horvath, JoAnn E Manson, Shelley A Cole, Haotian Wu, Eric A Whitsel, Andrea A Baccarelli, Ana Navas-Acien, Feng Gao
Abstract: Coronary heart disease (CHD) is one of the leading causes of mortality and morbidity in the United States. Accurate time-to-event CHD prediction models with high-dimensional DNA methylation and clinical features may assist with early prediction and intervention strategies. We developed a state-of-the-art deep learning autoencoder survival analysis model (AESurv) to effectively analyze high-dimensional blood DNA methylation features and traditional clinical risk factors by learning a low-dimensional representation of participants for time-to-event CHD prediction. We demonstrated the utility of our model in two cohort studies: the Strong Heart Study (SHS), a prospective cohort studying cardiovascular disease and its risk factors among American Indian adults, and the Women's Health Initiative (WHI), a prospective cohort study including randomized clinical trials and an observational study to improve postmenopausal women's health, with one of its main focuses on cardiovascular disease. Our AESurv model effectively learned participant representations in low-dimensional latent space and achieved better model performance (concordance index (C index) of 0.864 ± 0.009 and time-to-event mean area under the receiver operating characteristic curve (AUROC) of 0.905 ± 0.009) than other survival analysis models (Cox proportional hazard, Cox proportional hazard deep neural network survival analysis, random survival forest, and gradient boosting survival analysis models) in the SHS. We further validated the AESurv model in WHI, where it also achieved the best model performance. The AESurv model can be used for accurate CHD prediction and can assist health care professionals and patients in performing early intervention strategies. We suggest using the AESurv model for future time-to-event CHD prediction based on DNA methylation features.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11424508/pdf/
Citations: 0
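The AESurv abstract above reports a concordance index (C index). For readers unfamiliar with the metric, here is a minimal sketch of Harrell's C index for right-censored data. Whether the authors used exactly this estimator is an assumption, and the function name and toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event (e.g. CHD) was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort: the third participant is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk. In the toy cohort, four of the five comparable pairs are ordered correctly.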
scEGG: an exogenous gene-guided clustering method for single-cell transcriptomic data.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae483
Dayu Hu, Renxiang Guan, Ke Liang, Hao Yu, Hao Quan, Yawei Zhao, Xinwang Liu, Kunlun He
Abstract: In recent years, there has been significant advancement in the field of single-cell data analysis, particularly in the development of clustering methods. Despite these advancements, most algorithms continue to focus primarily on analyzing the provided single-cell matrix data. However, within medical contexts, single-cell data often encompasses a wealth of exogenous information, such as gene networks. Overlooking this aspect could result in information loss and produce clustering outcomes lacking significant clinical relevance. To address this limitation, we introduce an innovative deep clustering method for single-cell data that leverages exogenous gene information to generate discriminative cell representations. Specifically, an attention-enhanced graph autoencoder has been developed to efficiently capture topological signal patterns among cells. Concurrently, a random walk on an exogenous protein-protein interaction network enabled the acquisition of gene embeddings. Ultimately, the clustering process entailed integrating and reconstructing gene-cell cooperative embeddings, which yielded a discriminative representation. Extensive experiments have demonstrated the effectiveness of the proposed method. This research provides enhanced insights into the characteristics of cells, thus laying the foundation for the early diagnosis and treatment of diseases. The datasets and code can be publicly accessed in the repository at https://github.com/DayuHuu/scEGG.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11440090/pdf/
Citations: 0
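The scEGG abstract above mentions a random walk on a protein-protein interaction (PPI) network to obtain gene embeddings, without saying which variant is used. The sketch below assumes simple uniform random walks in the style of DeepWalk. The resulting walks would typically be passed to a skip-gram style model to learn embeddings, which is not shown. The function name, parameters, and gene names are hypothetical.

```python
import random

def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks over an undirected graph.

    adjacency: dict mapping each gene to a list of interacting genes.
    Returns a list of walks, each a list of gene names.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # isolated gene: the walk cannot continue
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Hypothetical toy PPI network with placeholder gene names.
ppi = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A", "GENE_C"],
    "GENE_C": ["GENE_A", "GENE_B", "GENE_D"],
    "GENE_D": ["GENE_C"],
}
walks = random_walks(ppi)
```

Genes that co-occur often within walks sit close together in the network, which is the signal an embedding model would then turn into nearby vectors.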
Diagnostics of viral infections using high-throughput genome sequencing data.
IF 6.8, Zone 2 (Biology)
Briefings in Bioinformatics. Pub Date: 2024-09-23. DOI: 10.1093/bib/bbae501
Haochen Ning, Ian Boyes, Ibrahim Numanagić, Michael Rott, Li Xing, Xuekui Zhang
Abstract: Plant viral infections cause significant economic losses, totalling US$350 billion in 2021. With no treatment for virus-infected plants, accurate and efficient diagnosis is crucial to preventing and controlling these diseases. High-throughput sequencing (HTS) enables cost-efficient identification of known and unknown viruses. However, existing diagnostic pipelines face challenges. First, many methods depend on subjectively chosen parameter values, undermining their robustness across various data sources. Second, artifacts (e.g. false peaks) in the mapped sequence data can lead to incorrect diagnostic results. While some methods require manual or subjective verification to address these artifacts, others overlook them entirely, affecting the overall method performance and leading to imprecise or labour-intensive outcomes. To address these challenges, we introduce IIMI, a new automated analysis pipeline using machine learning to diagnose infections from 1583 plant viruses with HTS data. It adopts a data-driven approach for parameter selection, reducing subjectivity, and automatically filters out regions affected by artifacts, thus improving accuracy. Testing with in-house and published data shows IIMI's superiority over existing methods. Besides a prediction model, IIMI also provides resources on plant virus genomes, including annotations of regions prone to artifacts. The method is available as an R package (iimi) on CRAN and will integrate with the web application www.virtool.ca, enhancing accessibility and user convenience.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483527/pdf/
Citations: 0