Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions.

IF 3.2 · Zone 4 (Biology) · Q2 BIOCHEMISTRY & MOLECULAR BIOLOGY

Proteins-Structure Function and Bioinformatics · Pub Date: 2025-03-01 · Epub Date: 2024-11-22 · DOI: 10.1002/prot.26767

Maryam Gillani, Gianluca Pollastri

Pages: 745-759 · Publication type: Journal Article · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809130/pdf/

Citations: 0

Abstract

Alignments in bioinformatics are arrangements of sequences that identify regions of similarity, which can indicate functional, structural, or evolutionary relationships. They are crucial for bioinformaticians because they enable accurate predictions and analyses in many applications, including protein subcellular localization. The predictive model used in this article is based on a deep convolutional architecture. We tested configurations of Deep N-to-1 convolutional neural networks of various depths and widths to identify the better-performing settings across a diverse set of eight classes. For the assessment without alignments, sequences are encoded using one-hot encoding, which converts each character into a numerical representation; this is straightforward for non-numerical data and useful for machine learning models. For the assessment with alignments, multiple sequence alignments (MSAs) are created using PSI-BLAST, capturing evolutionary information by calculating the frequencies of residues and gaps. The average difference in peak performance between models with and without alignments is approximately 15.82%. The average difference in the highest accuracy achieved with alignments compared with without alignments is approximately 15.16%. Thus, extensive experimentation indicates that higher alignment accuracy implies a more reliable model and improved prediction accuracy, which can be trusted to deliver consistent performance across different layers and classes of subcellular localization predictions. This research provides valuable insights into prediction accuracies with and without alignments, offering bioinformaticians an effective tool for better understanding while potentially reducing the need for extensive experimental validations. The source code and datasets are available at http://distilldeep.ucd.ie/SCL8/.
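
The two input encodings described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 20-letter alphabet, the treatment of unknown symbols, the extra gap channel, and the function names `one_hot_encode` and `msa_frequency_profile` are all assumptions made for illustration. The sketch also takes the alignment rows as ready-made strings rather than parsing PSI-BLAST output.

```python
from typing import List, Sequence

# Assumption: the 20 standard amino acids, plus "-" as a gap symbol that is
# only used as an extra channel in the alignment profile.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"


def one_hot_encode(sequence: str) -> List[List[float]]:
    """Encode a single sequence as L rows of 20 values, one 1.0 per known residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown symbols (e.g. X) stay all-zero
            row[index[residue]] = 1.0
        rows.append(row)
    return rows


def msa_frequency_profile(aligned_rows: Sequence[str]) -> List[List[float]]:
    """Turn equal-length alignment rows into L rows of 21 residue/gap frequencies."""
    if not aligned_rows:
        raise ValueError("at least one aligned row is required")
    length = len(aligned_rows[0])
    if any(len(row) != length for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")

    symbols = AMINO_ACIDS + GAP
    index = {s: i for i, s in enumerate(symbols)}
    profile = []
    for pos in range(length):
        counts = [0.0] * len(symbols)
        for row in aligned_rows:
            idx = index.get(row[pos].upper())
            if idx is not None:  # ignore symbols outside the assumed alphabet
                counts[idx] += 1.0
        total = sum(counts)
        # Columns with no recognised symbols are left as all zeros.
        profile.append([c / total if total else 0.0 for c in counts])
    return profile


# Toy example: one query sequence and a three-row alignment of it.
one_hot = one_hot_encode("MKV")
profile = msa_frequency_profile(["MKV", "MRV", "M-V"])
```

In this toy alignment the first and last columns are fully conserved, while the middle column splits evenly between K, R, and a gap. The one-hot matrix carries only the query residue at each position, whereas the profile carries the column-wise distribution, which is the evolutionary signal the abstract refers to.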

Source journal

Proteins-Structure Function and Bioinformatics (Biology: Biochemistry & Molecular Biology)

CiteScore: 5.90
Self-citation rate: 3.40%
Articles published: 172
Review time: 3 months

About the journal: PROTEINS : Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.