Bioinformatics: Latest Articles

PQSDC: a parallel lossless compressor for quality scores data via sequences partition and Run-Length prediction mapping.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-05-17 DOI: 10.1093/bioinformatics/btae323
Hui Sun, Yingfeng Zheng, Haonan Xie, Huidong Ma, Cheng Zhong, Meng Yan, Xiaoguang Liu, Gang Wang
Motivation: Quality scores data (QSD) account for 70% of compressed FastQ files obtained from short- and long-read sequencing technologies. Designing effective compressors for QSD that balance compression ratio, time cost, and memory consumption is essential in scenarios such as large-scale genomics data sharing and long-term data backup. This study presents a novel parallel lossless QSD-dedicated compression algorithm named PQSDC, which fulfills the above requirements well. PQSDC is based on two core components: a parallel sequences-partition model designed to reduce peak memory consumption and time cost during compression and decompression, and a parallel four-level run-length prediction mapping model to enhance compression ratio. The PQSDC algorithm is also designed to be highly concurrent using multi-core CPU clusters.
Results: We evaluate PQSDC and 4 state-of-the-art compression algorithms on 27 real-world datasets, including 61.857 billion QSD characters and 632.908 million QSD sequences. (1) For short reads, compared to baselines, the maximum improvement of PQSDC reaches 7.06% in average compression ratio and 8.01% in weighted average compression ratio. During compression and decompression, the maximum total time savings of PQSDC are 79.96% and 84.56%, respectively; the maximum average memory savings are 68.34% and 77.63%, respectively. (2) For long reads, the maximum improvement of PQSDC reaches 12.51% and 13.42% in average and weighted average compression ratio, respectively. The maximum total time savings during compression and decompression are 53.51% and 72.53%, respectively; the maximum average memory savings are 19.44% and 17.42%, respectively. (3) Furthermore, PQSDC ranks second in compression robustness among the tested algorithms, indicating that it is less affected by the probability distribution of the QSD collections. Overall, our work provides a promising solution for QSD parallel compression, which balances storage cost, time consumption, and memory occupation well.
Availability: The proposed PQSDC compressor can be downloaded from https://github.com/fahaihi/PQSDC.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
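The PQSDC abstract names run-length prediction mapping as one of its two core components. As a rough illustration of the underlying idea only, the Python sketch below applies plain run-length encoding to a made-up quality string. It is not PQSDC's four-level prediction mapping model; the function names and the example string are hypothetical.

```python
from itertools import groupby


def run_length_encode(quality: str) -> list[tuple[str, int]]:
    """Collapse each run of identical quality characters into a (char, count) pair."""
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(quality)]


def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original quality string."""
    return "".join(symbol * count for symbol, count in pairs)


# Hypothetical FastQ quality line, chosen only to show long runs of one symbol.
quality = "IIIIIFFFF##"
encoded = run_length_encode(quality)

# Lossless round trip: decoding the pairs restores the input exactly.
assert run_length_decode(encoded) == quality
```

Quality strings often contain long runs of the same symbol, which is why run lengths are a natural feature to exploit; how PQSDC turns them into predictions is described in the paper itself.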
MUSE-XAE: MUtational Signature Extraction with eXplainable AutoEncoder enhances tumour types classification.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-05-16 DOI: 10.1093/bioinformatics/btae320
Corrado Pancotti, Cesare Rollo, Francesco Codicè, G. Birolo, Piero Fariselli, T. Sanavia
Motivation: Mutational signatures are a critical component in deciphering the genetic alterations that underlie cancer development and have become a valuable resource to understand the genomic changes during tumorigenesis. Therefore, it is essential to employ precise and accurate methods for their extraction to ensure that the underlying patterns are reliably identified and can be effectively utilized in new strategies for diagnosis, prognosis and treatment of cancer patients.
Results: We present MUSE-XAE, a novel method for mutational signature extraction from cancer genomes using an explainable autoencoder. Our approach employs a hybrid architecture consisting of a nonlinear encoder that can capture nonlinear interactions among features, and a linear decoder which ensures the interpretability of the active signatures. We evaluated and compared MUSE-XAE with other available tools on both synthetic and real cancer datasets and demonstrated that it achieves superior performance in terms of precision and sensitivity in recovering mutational signature profiles. MUSE-XAE extracts highly discriminative mutational signature profiles by enhancing the classification of primary tumour types and subtypes in real world settings. This approach could facilitate further research in this area, with neural networks playing a critical role in advancing our understanding of cancer genomics.
Availability: MUSE-XAE software is freely available at https://github.com/compbiomed-unito/MUSE-XAE.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
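The MUSE-XAE abstract attributes interpretability to its linear decoder. One way to read that, sketched below in Python under stated assumptions, is that a sample's mutation profile is reconstructed as a weighted sum of signature profiles, so each weight can be read directly as a signature's contribution. The function name, the three-channel toy signatures, and the exposure values are all hypothetical, and this is not the MUSE-XAE code.

```python
def linear_decode(exposures: list[float], signatures: list[list[float]]) -> list[float]:
    """Reconstruct a mutation profile as sum_k exposures[k] * signatures[k]."""
    n_channels = len(signatures[0])
    return [
        sum(weight * signature[channel] for weight, signature in zip(exposures, signatures))
        for channel in range(n_channels)
    ]


# Two toy signatures over three mutation channels (each row sums to 1).
signatures = [
    [0.5, 0.5, 0.0],
    [0.0, 0.25, 0.75],
]
# Hypothetical per-sample exposures, i.e. what an encoder would output.
exposures = [10.0, 4.0]

reconstructed = linear_decode(exposures, signatures)
```

Because the mapping is linear, doubling an exposure doubles that signature's share of the reconstruction, which is what makes the decoder weights straightforward to interpret.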
CopyVAE: a variational autoencoder-based approach for copy number variation inference using single-cell transcriptomics
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-27 DOI: 10.1093/bioinformatics/btae284
Semih Kurt, Mandi Chen, Hosein Toosi, Xinsong Chen, Camilla Engblom, Jeff Mold, Johan Hartman, Jens Lagergren
Copy number variations (CNVs) are common genetic alterations in tumour cells. The delineation of CNVs holds promise for enhancing our comprehension of cancer progression. Moreover, accurate inference of CNVs from single-cell sequencing data is essential for unravelling intratumoral heterogeneity. However, existing inference methods face limitations in resolution and sensitivity.
To address these challenges, we present CopyVAE, a deep learning framework based on a variational autoencoder architecture. Through experiments, we demonstrated that CopyVAE can accurately and reliably detect copy number variations (CNVs) from data obtained using single-cell RNA sequencing. CopyVAE surpasses existing methods in terms of sensitivity and specificity. We also discussed CopyVAE's potential to advance our understanding of genetic alterations and their impact on disease advancement.
CopyVAE is implemented and freely available under MIT license at https://github.com/kurtsemih/copyVAE
Supplementary data are available at Bioinformatics online.
Citations: 0
LMCrot: An enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-25 DOI: 10.1093/bioinformatics/btae290
Pawel Pratyush, Soufia Bahmani, Suresh Pokharel, Hamid D Ismail, Dukka B Kc
Motivation: Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from Protein Language Models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted.
Results: Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing per-residue embedding of the site-of-interest as well as embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed to process window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM using full-length sequences as input. This model, termed T5ResConvBiLSTM, surpasses existing state-of-the-art Kcr predictors in performance across three diverse datasets. To validate our approach of utilizing full sequence-based window-level embeddings, we also delved into the interpretability of ProtT5-derived embedding tensors in two ways: firstly, by scrutinizing the attention weights obtained from the transformer's encoder block; and secondly, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhance the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from a supervised embedding layer, through an intermediate-fusion stacked generalization approach, using an n-mer window sequence (or peptide fragment). The resultant stacked model, dubbed LMCrot, exhibits a more pronounced improvement in predictive performance across the tested datasets.
Availability and implementation: LMCrot is publicly available at https://github.com/KCLabMTU/LMCrot.
Citations: 0
CORDAX web server: An online platform for the prediction and 3D visualization of aggregation motifs in protein sequences.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-25 DOI: 10.1093/bioinformatics/btae279
Nikolaos N. Louros, F. Rousseau, J. Schymkowitz
Motivation: Proteins, the molecular workhorses of biological systems, execute a multitude of critical functions dictated by their precise three-dimensional structures. In a complex and dynamic cellular environment, proteins can undergo misfolding, leading to the formation of aggregates that take up various forms, including amorphous and ordered aggregation in the shape of amyloid fibrils. This phenomenon is closely linked to a spectrum of widespread debilitating pathologies, such as Alzheimer's disease, Parkinson's disease, type-II diabetes, and several other proteinopathies, but also hampers the engineering of soluble agents, as in the case of antibody development. As such, the accurate prediction of aggregation propensity within protein sequences has become pivotal due to profound implications in understanding disease mechanisms, as well as in improving biotechnological and therapeutic applications.
Results: We previously developed Cordax, a structure-based predictor that utilizes logistic regression to detect aggregation motifs in protein sequences based on their structural complementarity to the amyloid cross-beta architecture. Here, we present a dedicated web server interface for Cordax. This online platform combines several features including detailed scoring of sequence aggregation propensity, as well as 3D visualization with several customization options for topology models of the structural cores formed by predicted aggregation motifs. In addition, information is provided on experimentally determined aggregation-prone regions that exhibit sequence similarity to predicted motifs, scores, and links to other predictor outputs, as well as simultaneous predictions of relevant sequence propensities, such as solubility, hydrophobicity, and secondary structure propensity.
Availability: The Cordax webserver is freely accessible at https://cordax.switchlab.org/.
Citations: 0
CASCC: a co-expression assisted single-cell RNA-seq data clustering method.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-25 DOI: 10.1093/bioinformatics/btae283
Lingyi Cai, Dimitris Anastassiou
Summary: Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC, designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm. CASCC outperformed other methods as evidenced by multiple evaluation metrics, and our results suggest that CASCC can improve the analysis of single-cell transcriptomics, enabling potential new discoveries related to underlying biological mechanisms.
Availability and implementation: The CASCC R package is publicly available at https://github.com/LingyiC/CASCC and https://zenodo.org/doi/10.5281/zenodo.10648327.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
AmplificationTimeR: An R Package for Timing Sequential Amplification Events.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-24 DOI: 10.1093/bioinformatics/btae281
G. M. Jakobsdottir, Stefan C Dentro, Robert G Bristow, David C Wedge
Motivation: Few methods exist for timing individual amplification events in regions of focal amplification. Current methods are also limited in the copy number states that they are able to time. Here we introduce AmplificationTimeR, a method for timing higher level copy number gains and inferring the most parsimonious order of events for regions that have undergone both single gains and whole genome duplication. Our method is an extension of established approaches for timing genomic gains.
Results: We can time more copy number states, and in states covered by other methods our results are comparable to previously published methods.
Availability: AmplificationTimeR is freely available as an R package hosted at https://github.com/Wedge-lab/AmplificationTimeR.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
DIMet: An open-source tool for Differential analysis of targeted Isotope-labeled Metabolomics data.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-24 DOI: 10.1093/bioinformatics/btae282
Johanna Galvis, J. Guyon, Benjamin Dartigues, Helge Hecht, Björn Grüning, Florian Specque, Hayssam Soueidan, S. Karkar, Thomas Daubon, M. Nikolski
Motivation: Many diseases, such as cancer, are characterized by an alteration of cellular metabolism allowing cells to adapt to changes in the microenvironment. Stable isotope-resolved metabolomics (SIRM) and downstream data analyses are widely used techniques for unraveling cells' metabolic activity to understand the altered functioning of metabolic pathways in the diseased state. While a number of bioinformatic solutions exist for the differential analysis of SIRM data, there is currently no available resource providing a comprehensive toolbox.
Results: In this work, we present DIMet, a one-stop comprehensive tool for differential analysis of targeted tracer data. DIMet accepts metabolite total abundances, isotopologue contributions, and isotopic mean enrichment, and supports differential comparison (pairwise and multi-group), time-series analyses, and labeling profile comparison. Moreover, it integrates transcriptomics and targeted metabolomics data through network-based metabolograms. We illustrate the use of DIMet in real SIRM datasets obtained from Glioblastoma P3 cell-line samples. DIMet is open-source and readily available for routine downstream analysis of isotope-labeled targeted metabolomics data, as it can be used both in the command line interface and as a complete toolkit in the public Galaxy Europe and Workflow4Metabolomics web platforms.
Availability: DIMet is freely available at https://github.com/cbib/DIMet, and through https://usegalaxy.eu and https://workflow4metabolomics.usegalaxy.fr. All the datasets are available at Zenodo https://zenodo.org/records/10925786.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
MammalMethylClock R package: Software for DNA Methylation-Based epigenetic clocks in mammals.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-24 DOI: 10.1093/bioinformatics/btae280
J. Zoller, Steve Horvath
Motivation: Epigenetic clocks are prediction methods based on DNA methylation levels in a given species or set of species. Defined as multivariate regression models, these DNA methylation-based biomarkers of age or mortality risk are useful in species conservation efforts and in preclinical studies.
Results: We present an R package called MammalMethylClock for the construction, assessment, and application of epigenetic clocks in different mammalian species. The R package includes the utility for implementing pre-existing mammalian clocks from the Mammalian Methylation Consortium.
Availability: The source code and documentation manual for MammalMethylClock, and clock coefficient .csv files that are included within this software package, can be found on Zenodo at https://doi.org/10.5281/zenodo.10971037.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
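The MammalMethylClock abstract defines epigenetic clocks as multivariate regression models over DNA methylation levels. A minimal Python sketch of applying such a clock, assuming a purely linear form (intercept plus a weighted sum of per-CpG methylation values), is given below. The CpG identifiers, coefficients, and function name are hypothetical, and this is not the API of the MammalMethylClock R package; published clocks may also apply a transformation to age that is omitted here.

```python
def predict_age(
    coefficients: dict[str, float],
    intercept: float,
    methylation: dict[str, float],
) -> float:
    """Apply a linear clock: intercept + sum over CpGs of coefficient * methylation level."""
    return intercept + sum(
        weight * methylation[cpg] for cpg, weight in coefficients.items()
    )


# Hypothetical clock with two CpG sites; real clocks use many more.
coefficients = {"cg_a": 10.0, "cg_b": -4.0}
intercept = 2.0

# Hypothetical methylation levels (beta values between 0 and 1) for one sample.
sample = {"cg_a": 0.5, "cg_b": 0.25}

predicted = predict_age(coefficients, intercept, sample)
```

Storing the coefficients as a mapping from CpG identifier to weight mirrors the coefficient tables the abstract mentions, and it makes a missing CpG in a sample fail loudly with a `KeyError` rather than silently contributing zero.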
Large-scale Structure-Informed multiple sequence alignment of proteins with SIMSApiper.
IF 5.8 Zone 3 Biology
Bioinformatics Pub Date : 2024-04-22 DOI: 10.1093/bioinformatics/btae276
Charlotte Crauwels, Sophie-Luise Heidig, Adrián Díaz, Wim F Vranken
Summary: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences in time-frames faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements.
Availability and implementation: The pipeline is implemented using Nextflow, Python3 and Bash. It is publicly available on github.com/Bio2Byte/simsapiper.
Supplementary information: All data is available on GitHub.
Citations: 0