BMC Bioinformatics最新文献

筛选
英文 中文
Predicting viral proteins that evade the innate immune system: a machine learning-based immunoinformatics tool. 预测逃避先天性免疫系统的病毒蛋白:基于机器学习的免疫信息学工具。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-09 DOI: 10.1186/s12859-024-05972-7
Jorge F Beltrán, Lisandra Herrera Belén, Alejandro J Yáñez, Luis Jimenez
{"title":"Predicting viral proteins that evade the innate immune system: a machine learning-based immunoinformatics tool.","authors":"Jorge F Beltrán, Lisandra Herrera Belén, Alejandro J Yáñez, Luis Jimenez","doi":"10.1186/s12859-024-05972-7","DOIUrl":"10.1186/s12859-024-05972-7","url":null,"abstract":"<p><p>Viral proteins that evade the host's innate immune response play a crucial role in pathogenesis, significantly impacting viral infections and potential therapeutic strategies. Identifying these proteins through traditional methods is challenging and time-consuming due to the complexity of virus-host interactions. Leveraging advancements in computational biology, we present VirusHound-II, a novel tool that utilizes machine learning techniques to predict viral proteins evading the innate immune response with high accuracy. We evaluated a comprehensive range of machine learning models, including ensemble methods, neural networks, and support vector machines. Using a dataset of 1337 viral proteins known to evade the innate immune response (VPEINRs) and an equal number of non-VPEINRs, we employed pseudo amino acid composition as the molecular descriptor. Our methodology involved a tenfold cross-validation strategy on 80% of the data for training, followed by testing on an independent dataset comprising the remaining 20%. The random forest model demonstrated superior performance metrics, achieving 0.9290 accuracy, 0.9283 F1 score, 0.9354 precision, and 0.9213 sensitivity in the independent testing phase. These results establish VirusHound-II as an advancement in computational virology, accessible via a user-friendly web application. We anticipate that VirusHound-II will be a crucial resource for researchers, enabling the rapid and reliable prediction of viral proteins evading the innate immune response. This tool has the potential to accelerate the identification of therapeutic targets and enhance our understanding of viral evasion mechanisms, contributing to the development of more effective antiviral strategies and advancing our knowledge of virus-host interactions.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"351"},"PeriodicalIF":2.9,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550529/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Abstraction-based segmental simulation of reaction networks using adaptive memoization. 基于抽象的反应网络分段仿真,使用自适应内存化。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-08 DOI: 10.1186/s12859-024-05966-5
Martin Helfrich, Roman Andriushchenko, Milan Češka, Jan Křetínský, Štefan Martiček, David Šafránek
{"title":"Abstraction-based segmental simulation of reaction networks using adaptive memoization.","authors":"Martin Helfrich, Roman Andriushchenko, Milan Češka, Jan Křetínský, Štefan Martiček, David Šafránek","doi":"10.1186/s12859-024-05966-5","DOIUrl":"10.1186/s12859-024-05966-5","url":null,"abstract":"<p><strong>Background: </strong> Stochastic models are commonly employed in the system and synthetic biology to study the effects of stochastic fluctuations emanating from reactions involving species with low copy-numbers. Many important models feature complex dynamics, involving a state-space explosion, stiffness, and multimodality, that complicate the quantitative analysis needed to understand their stochastic behavior. Direct numerical analysis of such models is typically not feasible and generating many simulation runs that adequately approximate the model's dynamics may take a prohibitively long time.</p><p><strong>Results: </strong> We propose a new memoization technique that leverages a population-based abstraction and combines previously generated parts of simulations, called segments, to generate new simulations more efficiently while preserving the original system's dynamics and its diversity. Our algorithm adapts online to identify the most important abstract states and thus utilizes the available memory efficiently.</p><p><strong>Conclusion: </strong> We demonstrate that in combination with a novel fully automatic and adaptive hybrid simulation scheme, we can speed up the generation of trajectories significantly and correctly predict the transient behavior of complex stochastic systems.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"350"},"PeriodicalIF":2.9,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549863/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Graph-based machine learning model for weight prediction in protein-protein networks. 基于图的蛋白质-蛋白质网络权重预测机器学习模型。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-08 DOI: 10.1186/s12859-024-05973-6
Hajer Akid, Kirsley Chennen, Gabriel Frey, Julie Thompson, Mounir Ben Ayed, Nicolas Lachiche
{"title":"Graph-based machine learning model for weight prediction in protein-protein networks.","authors":"Hajer Akid, Kirsley Chennen, Gabriel Frey, Julie Thompson, Mounir Ben Ayed, Nicolas Lachiche","doi":"10.1186/s12859-024-05973-6","DOIUrl":"10.1186/s12859-024-05973-6","url":null,"abstract":"<p><p>Proteins interact with each other in complex ways to perform significant biological functions. These interactions, known as protein-protein interactions (PPIs), can be depicted as a graph where proteins are nodes and their interactions are edges. The development of high-throughput experimental technologies allows for the generation of numerous data which permits increasing the sophistication of PPI models. However, despite significant progress, current PPI networks remain incomplete. Discovering missing interactions through experimental techniques can be costly, time-consuming, and challenging. Therefore, computational approaches have emerged as valuable tools for predicting missing interactions. In PPI networks, a graph is usually used to model the interactions between proteins. An edge between two proteins indicates a known interaction, while the absence of an edge means the interaction is not known or missed. However, this binary representation overlooks the reliability of known interactions when predicting new ones. To address this challenge, we propose a novel approach for link prediction in weighted protein-protein networks, where interaction weights denote confidence scores. By leveraging data from the yeast Saccharomyces cerevisiae obtained from the STRING database, we introduce a new model that combines similarity-based algorithms and aggregated confidence score weights for accurate link prediction purposes. Our model significantly improves prediction accuracy, surpassing traditional approaches in terms of Mean Absolute Error, Mean Relative Absolute Error, and Root Mean Square Error. Our proposed approach holds the potential for improved accuracy in predicting PPIs, which is crucial for better understanding the underlying biological processes.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"349"},"PeriodicalIF":2.9,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11546293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142602864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Rapid bacterial identification through volatile organic compound analysis and deep learning. 通过挥发性有机化合物分析和深度学习快速识别细菌。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-06 DOI: 10.1186/s12859-024-05967-4
Bowen Yan, Lin Zeng, Yanyi Lu, Min Li, Weiping Lu, Bangfu Zhou, Qinghua He
{"title":"Rapid bacterial identification through volatile organic compound analysis and deep learning.","authors":"Bowen Yan, Lin Zeng, Yanyi Lu, Min Li, Weiping Lu, Bangfu Zhou, Qinghua He","doi":"10.1186/s12859-024-05967-4","DOIUrl":"10.1186/s12859-024-05967-4","url":null,"abstract":"<p><strong>Background: </strong>The increasing antimicrobial resistance caused by the improper use of antibiotics poses a significant challenge to humanity. Rapid and accurate identification of microbial species in clinical settings is crucial for precise medication and reducing the development of antimicrobial resistance. This study aimed to explore a method for automatic identification of bacteria using Volatile Organic Compounds (VOCs) analysis and deep learning algorithms.</p><p><strong>Results: </strong>AlexNet, where augmentation is applied, produces the best results. The average accuracy rate for single bacterial culture classification reached 99.24% using cross-validation, and the accuracy rates for identifying the three bacteria in randomly mixed cultures were SA:98.6%, EC:98.58% and PA:98.99%, respectively.</p><p><strong>Conclusion: </strong>This work provides a new approach to quickly identify bacterial microorganisms. Using this method can automatically identify bacteria in GC-IMS detection results, helping clinical doctors quickly detect bacterial species, accurately prescribe medication, thereby controlling epidemics, and minimizing the negative impact of bacterial resistance on society.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"347"},"PeriodicalIF":2.9,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539783/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Prediction of antibody-antigen interaction based on backbone aware with invariant point attention. 基于骨干意识和不变点注意力的抗体-抗原相互作用预测。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-06 DOI: 10.1186/s12859-024-05961-w
Miao Gu, Weiyang Yang, Min Liu
{"title":"Prediction of antibody-antigen interaction based on backbone aware with invariant point attention.","authors":"Miao Gu, Weiyang Yang, Min Liu","doi":"10.1186/s12859-024-05961-w","DOIUrl":"10.1186/s12859-024-05961-w","url":null,"abstract":"<p><strong>Background: </strong>Antibodies play a crucial role in disease treatment, leveraging their ability to selectively interact with the specific antigen. However, screening antibody gene sequences for target antigens via biological experiments is extremely time-consuming and labor-intensive. Several computational methods have been developed to predict antibody-antigen interaction while suffering from the lack of characterizing the underlying structure of the antibody.</p><p><strong>Results: </strong>Beneficial from the recent breakthroughs in deep learning for antibody structure prediction, we propose a novel neural network architecture to predict antibody-antigen interaction. We first introduce AbAgIPA: an antibody structure prediction network to obtain the antibody backbone structure, where the structural features of antibodies and antigens are encoded into representation vectors according to the amino acid physicochemical features and Invariant Point Attention (IPA) computation methods. Finally, the antibody-antigen interaction is predicted by global max pooling, feature concatenation, and a fully connected layer. We evaluated our method on antigen diversity and antigen-specific antibody-antigen interaction datasets. Additionally, our model exhibits a commendable level of interpretability, essential for understanding underlying interaction mechanisms.</p><p><strong>Conclusions: </strong>Quantitative experimental results demonstrate that the new neural network architecture significantly outperforms the best sequence-based methods as well as the methods based on residue contact maps and graph convolution networks (GCNs). The source code is freely available on GitHub at https://github.com/gmthu66/AbAgIPA .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"348"},"PeriodicalIF":2.9,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542381/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
REDalign: accurate RNA structural alignment using residual encoder-decoder network. REDalign:利用残差编码器-解码器网络进行精确的 RNA 结构配准。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-05 DOI: 10.1186/s12859-024-05956-7
Chun-Chi Chen, Yi-Ming Chan, Hyundoo Jeong
{"title":"REDalign: accurate RNA structural alignment using residual encoder-decoder network.","authors":"Chun-Chi Chen, Yi-Ming Chan, Hyundoo Jeong","doi":"10.1186/s12859-024-05956-7","DOIUrl":"10.1186/s12859-024-05956-7","url":null,"abstract":"<p><strong>Background: </strong>RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of <math><mrow><mi>O</mi> <mo>(</mo> <msup><mi>L</mi> <mn>6</mn></msup> <mo>)</mo></mrow> </math> for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities.</p><p><strong>Results: </strong>In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency.</p><p><strong>Conclusion: </strong>REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"346"},"PeriodicalIF":2.9,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142581001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PangeBlocks: customized construction of pangenome graphs via maximal blocks. PangeBlocks:通过最大块定制构建泛基因组图。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-04 DOI: 10.1186/s12859-024-05958-5
Jorge Avila Cartes, Paola Bonizzoni, Simone Ciccolella, Gianluca Della Vedova, Luca Denti
{"title":"PangeBlocks: customized construction of pangenome graphs via maximal blocks.","authors":"Jorge Avila Cartes, Paola Bonizzoni, Simone Ciccolella, Gianluca Della Vedova, Luca Denti","doi":"10.1186/s12859-024-05958-5","DOIUrl":"10.1186/s12859-024-05958-5","url":null,"abstract":"<p><strong>Background: </strong>The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build a pangenome graph with some heuristics, without assuming some explicit optimization criteria. Thus it is unclear how a specific optimization criterion affects the graph topology and downstream analysis, like read mapping and variant calling.</p><p><strong>Results: </strong>In this paper, by leveraging the notion of maximal block in a Multiple Sequence Alignment (MSA), we reframe the pangenome graph construction problem as an exact cover problem on blocks called Minimum Weighted Block Cover (MWBC). Then we propose an Integer Linear Programming (ILP) formulation for the MWBC problem that allows us to study the most natural objective functions for building a graph. We provide an implementation of the ILP approach for solving the MWBC and we evaluate it on SARS-CoV-2 complete genomes, showing how different objective functions lead to pangenome graphs that have different properties, hinting that the specific downstream task can drive the graph construction phase.</p><p><strong>Conclusion: </strong>We show that a customized construction of a pangenome graph based on selecting objective functions has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way to novel practical approaches to graph representations of an MSA where the user can guide the construction.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"344"},"PeriodicalIF":2.9,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11533710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142575328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
GPCR-BSD: a database of binding sites of human G-protein coupled receptors under diverse states. GPCR-BSD:人类 G 蛋白偶联受体在不同状态下的结合位点数据库。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-04 DOI: 10.1186/s12859-024-05962-9
Fan Liu, Han Zhou, Xiaonong Li, Liangliang Zhou, Chungong Yu, Haicang Zhang, Dongbo Bu, Xinmiao Liang
{"title":"GPCR-BSD: a database of binding sites of human G-protein coupled receptors under diverse states.","authors":"Fan Liu, Han Zhou, Xiaonong Li, Liangliang Zhou, Chungong Yu, Haicang Zhang, Dongbo Bu, Xinmiao Liang","doi":"10.1186/s12859-024-05962-9","DOIUrl":"10.1186/s12859-024-05962-9","url":null,"abstract":"<p><p>G-protein coupled receptors (GPCRs), the largest family of membrane proteins in human body, involve a great variety of biological processes and thus have become highly valuable drug targets. By binding with ligands (e.g., drugs), GPCRs switch between active and inactive conformational states, thereby performing functions such as signal transmission. The changes in binding pockets under different states are important for a better understanding of drug-target interactions. Therefore it is critical, as well as a practical need, to obtain binding sites in human GPCR structures. We report a database (called GPCR-BSD) that collects 127,990 predicted binding sites of 803 GPCRs under active and inactive states (thus 1,606 structures in total). The binding sites were identified from the predicted GPCR structures by executing three geometric-based pocket prediction methods, fpocket, CavityPlus and GHECOM. The server provides query, visualization, and comparison of the predicted binding sites for both GPCR predicted and experimentally determined structures recorded in PDB. We evaluated the identified pockets of 132 experimentally determined human GPCR structures in terms of pocket residue coverage, pocket center distance and redocking accuracy. The evaluation showed that fpocket and CavityPlus methods performed better and successfully predicted orthosteric binding sites in over 60% of the 132 experimentally determined structures. The GPCR Binding Site database is freely accessible at https://gpcrbs.bigdata.jcmsc.cn . This study not only provides a systematic evaluation of the commonly-used fpocket and CavityPlus methods for the first time but also meets the need for binding site information in GPCR studies.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"343"},"PeriodicalIF":2.9,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11533411/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142575228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
MIPPIS: protein-protein interaction site prediction network with multi-information fusion. MIPPIS:多信息融合的蛋白质-蛋白质相互作用位点预测网络。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-04 DOI: 10.1186/s12859-024-05964-7
Shuang Wang, Kaiyu Dong, Dingming Liang, Yunjing Zhang, Xue Li, Tao Song
{"title":"MIPPIS: protein-protein interaction site prediction network with multi-information fusion.","authors":"Shuang Wang, Kaiyu Dong, Dingming Liang, Yunjing Zhang, Xue Li, Tao Song","doi":"10.1186/s12859-024-05964-7","DOIUrl":"10.1186/s12859-024-05964-7","url":null,"abstract":"<p><strong>Background: </strong>The prediction of protein-protein interaction sites plays a crucial role in biochemical processes. Investigating the interaction between viruses and receptor proteins through biological techniques aids in understanding disease mechanisms and guides the development of corresponding drugs. While various methods have been proposed in the past, they often suffer from drawbacks such as long processing times, high costs, and low accuracy.</p><p><strong>Results: </strong>Addressing these challenges, we propose a novel protein-protein interaction site prediction network based on multi-information fusion. In our approach, the initial amino acid features are depicted by the position-specific scoring matrix, hidden Markov model, dictionary of protein secondary structure, and one-hot encoding. Simultaneously, we adopt a multi-channel approach to extract deep-level amino acids features from different perspectives. The graph convolutional network channel effectively extracts spatial structural information. The bidirectional long short-term memory channel treats the amino acid sequence as natural language, capturing the protein's primary structure information. The ProtT5 protein large language model channel outputs a more comprehensive amino acid embedding representation, providing a robust complement to the two aforementioned channels. Finally, the obtained amino acid features are fed into the prediction layer for the final prediction.</p><p><strong>Conclusion: </strong>Compared with six protein structure-based methods and six protein sequence-based methods, our model achieves optimal performance across evaluation metrics, including accuracy, precision, F<sub>1</sub>, Matthews correlation coefficient, and area under the precision recall curve, which demonstrates the superiority of our model.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"345"},"PeriodicalIF":2.9,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11536593/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142575246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search. CUDASW++4.0:基于 GPU 的超快速史密斯-沃特曼蛋白质序列数据库搜索。
IF 2.9 3区 生物学
BMC Bioinformatics Pub Date : 2024-11-02 DOI: 10.1186/s12859-024-05965-6
Bertil Schmidt, Felix Kallenborn, Alejandro Chacon, Christian Hundt
{"title":"CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search.","authors":"Bertil Schmidt, Felix Kallenborn, Alejandro Chacon, Christian Hundt","doi":"10.1186/s12859-024-05965-6","DOIUrl":"10.1186/s12859-024-05965-6","url":null,"abstract":"<p><strong>Background: </strong>The maximal sensitivity for local pairwise alignment makes the Smith-Waterman algorithm a popular choice for protein sequence database search. However, its quadratic time complexity makes it compute-intensive. Unfortunately, current state-of-the-art software tools are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. This motivates the need for more efficient implementations.</p><p><strong>Results: </strong>CUDASW++4.0 is a fast software tool for scanning protein sequence databases with the Smith-Waterman algorithm on CUDA-enabled GPUs. Our approach achieves high efficiency for dynamic programming-based alignment computation by minimizing memory accesses and instructions. We provide both efficient matrix tiling, and sequence database partitioning schemes, and exploit next generation floating point arithmetic and novel DPX instructions. This leads to close-to-peak performance on modern GPU generations (Ampere, Ada, Hopper) with throughput rates of up to 1.94 TCUPS, 5.01 TCUPS, 5.71 TCUPS on an A100, L40S, and H100, respectively. Evaluation on the Swiss-Prot, UniRef50, and TrEMBL databases shows that CUDASW++4.0 gains over an order-of-magnitude performance improvements over previous GPU-based approaches (CUDASW++3.0, ADEPT, SW#DB). In addition, our algorithm demonstrates significant speedups over top-performing CPU-based tools (BLASTP, SWIPE, SWIMM2.0), can exploit multi-GPU nodes with linear scaling, and features an impressive energy efficiency of up to 15.7 GCUPS/Watt.</p><p><strong>Conclusion: </strong>CUDASW++4.0 changes the standing of GPUs in protein sequence database search with Smith-Waterman alignment by providing close-to-peak performance on modern GPUs. It is freely available at https://github.com/asbschmidt/CUDASW4 .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"342"},"PeriodicalIF":2.9,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531700/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信