{"title":"mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations","authors":"","doi":"10.1016/j.jmb.2024.168687","DOIUrl":"10.1016/j.jmb.2024.168687","url":null,"abstract":"<div><p>Anticancer peptides (ACPs) are naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs based solely on their primary amino acid sequences remains a major hurdle in immunoinformatics. In the past, several web-based machine learning (ML) tools have been proposed to assist researchers in identifying potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced the field of ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need to create an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large set of feature descriptors, encompassing both conventional feature descriptors and advanced pre-trained natural language processing (NLP)-based embeddings. We evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach, incorporating 1D convolutional neural network (1D CNN) blocks and hybrid features. These features included the top seven performing NLP-based features and 90 probabilistic features, allowing us to identify hidden patterns within these diverse features and improve the accuracy of our ACP prediction model. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. 
Rigorous cross-validation and independent tests conclusively demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed the existing state-of-the-art predictors, highlighting the importance of advanced feature representation capabilities attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly web server for mACPpred 2.0, available at <span><span>https://balalab-skku.org/mACPpred2/</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168687"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624002894/pdfft?md5=ecdf80bb684910ec5433145962a8f247&pid=1-s2.0-S0022283624002894-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141511139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds","authors":"","doi":"10.1016/j.jmb.2024.168551","DOIUrl":"10.1016/j.jmb.2024.168551","url":null,"abstract":"<div><p>CATH (<span><span>https://www.cathdb.info</span><svg><path></path></svg></span>) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%.</p><p>Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. 
Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice).</p><p>CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168551"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624001463/pdfft?md5=7f042c9d519839cc743c6f8330403192&pid=1-s2.0-S0022283624001463-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140317526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enricherator: A Bayesian Method for Inferring Regularized Genome-wide Enrichments from Sequencing Count Data","authors":"","doi":"10.1016/j.jmb.2024.168567","DOIUrl":"10.1016/j.jmb.2024.168567","url":null,"abstract":"<div><p>A pervasive question in biological research studying gene regulation, chromatin structure, or genomics is where, and to what extent, does a signal of interest arise genome-wide? This question is addressed using a variety of methods relying on high-throughput sequencing data as their final output, including ChIP-seq for protein-DNA interactions,<span><span><sup>1</sup></span></span> GapR-seq for measuring supercoiling,<span><span><sup>2</sup></span></span> and HBD-seq or DRIP-seq for R-loop positioning.<span><span>3</span></span>, <span><span>4</span></span> Current computational methods to calculate genome-wide enrichment of the signal of interest usually do not properly handle the count-based nature of sequencing data, they often do not make use of the local correlation structure of sequencing data, and they do not apply any regularization of enrichment estimates. This can result in unrealistic estimates of the true underlying biological enrichment of interest, unrealistically low estimates of confidence in point estimates of enrichment (or no estimates of confidence at all), unrealistic gyrations in enrichment estimates at very close (<10 bp) genomic loci due to noise inherent in sequencing data, and in a multiple-hypothesis testing problem during interpretation of genome-wide enrichment estimates. We developed a tool called Enricherator to infer genome-wide enrichments from sequencing count data. Enricherator uses the variational Bayes algorithm to fit a generalized linear model to sequencing count data and to sample from the approximate posterior distribution of enrichment estimates (<span><span>https://github.com/jwschroeder3/enricherator</span><svg><path></path></svg></span>). 
Enrichments inferred by Enricherator more precisely identify known binding sites in cases where low coverage between binding sites leads to false-positive peak calls in these noisy regions of the genome; these benefits extend to published datasets.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168567"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624001621/pdfft?md5=12eadc9303ecf2b7325490d62b957d44&pid=1-s2.0-S0022283624001621-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140592026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GalaxySagittarius-AF: Predicting Targets for Drug-Like Compounds in the Extended Human 3D Proteome","authors":"","doi":"10.1016/j.jmb.2024.168617","DOIUrl":"10.1016/j.jmb.2024.168617","url":null,"abstract":"<div><p>In recent years, advancements in deep learning techniques have significantly expanded the structural coverage of the human proteome. GalaxySagittarius-AF translates these achievements in structure prediction into target prediction for druglike compounds by incorporating predicted structures. This web server searches the database of human protein structures using both similarity- and structure-based approaches, suggesting potential targets for a given druglike compound. In comparison to its predecessor, GalaxySagittarius, GalaxySagittarius-AF utilizes an enlarged structure database, incorporating curated AlphaFold model structures alongside their binding sites and ligands, predicted using an updated version of GalaxySite. GalaxySagittarius-AF covers a larger human protein space than many other available computational target screening methods. The structure-based prediction method enhances the use of expanded structural information, differentiating it from other target prediction servers that rely on ligand-based methods. Additionally, the web server has undergone enhancements, operating two to three times faster than its predecessor. The updated report page provides comprehensive information on the sequence and structure of the predicted protein targets. 
GalaxySagittarius-AF is accessible at <span><span>https://galaxy.seoklab.org/sagittarius_af</span><svg><path></path></svg></span> without the need for registration.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168617"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624002122/pdfft?md5=0e0c23dccda32199932ab93f923b91bb&pid=1-s2.0-S0022283624002122-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141051417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fungtion: A Server for Predicting and Visualizing Fungal Effector Proteins","authors":"","doi":"10.1016/j.jmb.2024.168613","DOIUrl":"10.1016/j.jmb.2024.168613","url":null,"abstract":"<div><p>Fungal pathogens pose significant threats to plant health by secreting effectors that manipulate plant-host defences. However, identifying effector proteins remains challenging, in part because they lack common sequence motifs. Here, we introduce Fungtion (<u>Fung</u>al effector predic<u>tion</u>), a toolkit leveraging a hybrid framework to accurately predict and visualize fungal effectors. By combining global patterns learned from pretrained protein language models with refined information from known effectors, Fungtion achieves state-of-the-art prediction performance. Additionally, the interactive visualizations we have developed enable researchers to explore both sequence- and high-level relationships between the predicted and known effectors, facilitating effector function discovery, annotation, and hypothesis formulation regarding plant-pathogen interactions. 
We anticipate Fungtion to be a valuable resource for biologists seeking deeper insights into fungal effector functions and for computational biologists aiming to develop future methodologies for fungal effector prediction: <span><span>https://step3.erc.monash.edu/Fungtion/</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168613"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624002080/pdfft?md5=36d94fbec14088b549acb51c24012b05&pid=1-s2.0-S0022283624002080-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141143680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NucMap 2.0: An Updated Database of Genome-wide Nucleosome Positioning Maps Across Species","authors":"","doi":"10.1016/j.jmb.2024.168655","DOIUrl":"10.1016/j.jmb.2024.168655","url":null,"abstract":"<div><p>Nucleosome dynamics plays important roles in many biological processes, such as DNA replication and gene expression. NucMap (<span><span>https://ngdc.cncb.ac.cn/nucmap</span><svg><path></path></svg></span>) is the first database of genome-wide nucleosome positioning maps across species. Here, we present an updated version, NucMap 2.0, by incorporating more species and MNase-seq samples. In addition, we integrate other related omics data for each MNase-seq sample to provide a comprehensive view of nucleosome positioning, such as gene expression, transcription factor binding sites, histone modifications and DNA methylation. In particular, NucMap 2.0 integrates and pre-analyzes RNA-seq data and ChIP-seq data of human-related samples, which facilitates the interpretation of nucleosome positioning in humans. All processed data are integrated into an in-built genome browser, and users can make comprehensive side-by-side analyses. In addition, more online analytical functions have been developed, which allow researchers to identify differential nucleosome regions and explore potential gene regulatory regions. 
All resources are open access with a user-friendly web interface.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168655"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S002228362400250X/pdfft?md5=05c0ca9f6c37361600fa1c82182f3970&pid=1-s2.0-S002228362400250X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141327075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PPInterface: A Comprehensive Dataset of 3D Protein-Protein Interface Structures","authors":"","doi":"10.1016/j.jmb.2024.168686","DOIUrl":"10.1016/j.jmb.2024.168686","url":null,"abstract":"<div><p>The PPInterface dataset contains 815,082 interface structures, providing the most comprehensive structural information on protein–protein interfaces. This resource is extracted from over 215,000 three-dimensional protein structures stored in the Protein Data Bank (PDB). The dataset contains a wide range of protein complexes, providing a wealth of information for researchers investigating the structural properties of protein–protein interactions. The accompanying web server has a user-friendly interface that allows for efficient search and download functions. Researchers can access detailed information on protein interface structures, visualize them, and explore a variety of features, increasing the dataset’s utility and accessibility.</p><p>The dataset and web server can be found at <span><span>https://3dpath.ku.edu.tr/PPInt/</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168686"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624002882/pdfft?md5=8c06cb4d0f228da90e95d1e5dc422504&pid=1-s2.0-S0022283624002882-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141465122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3dRNA/DNA: 3D Structure Prediction from RNA to DNA","authors":"Yi Zhang, Yiduo Xiong, Chenxi Yang, Yi Xiao","doi":"10.1016/j.jmb.2024.168742","DOIUrl":"10.1016/j.jmb.2024.168742","url":null,"abstract":"<div><p>There is an increasing need for determining 3D structures of DNAs, e.g., for increasing the efficiency of DNA aptamer selection. Recently, we have proposed a computational method of 3D structure prediction of DNAs, called 3dDNA, which has been integrated into our original web server 3dRNA, now renamed 3dRNA/DNA (<span><span>http://biophy.hust.edu.cn/new/3dRNA</span><svg><path></path></svg></span>). Currently, 3dDNA can only output the predicted DNA 3D structures for users but cannot rank them, because an energy function for assessing DNA 3D structures is still lacking. Here, we first provide a brief introduction to 3dDNA and then introduce a new energy function, 3dDNAscore, for the assessment of DNA 3D structures. 3dDNAscore is an all-atom knowledge-based potential that integrates 86 atomic types from nucleic acids. Benchmarks demonstrate that 3dDNAscore can effectively identify near-native structures from the decoys generated by 3dDNA, thus enhancing the completeness of 3dDNA.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168742"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142129516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AmyloComp: A Bioinformatic Tool for Prediction of Amyloid Co-aggregation","authors":"","doi":"10.1016/j.jmb.2024.168437","DOIUrl":"10.1016/j.jmb.2024.168437","url":null,"abstract":"<div><p>Typically, amyloid fibrils consist of multiple copies of the same protein. In these fibrils, each polypeptide chain adopts the same β-arc-containing conformation and these chains are stacked in a parallel and in-register manner. In the last few years, however, a considerable body of data has been accumulated about co-aggregation of different amyloid-forming proteins. Among known examples of co-aggregation are heteroaggregates of different yeast prions and of the human proteins Rip1 and Rip3. Since co-aggregation is linked to such important phenomena as infectivity of amyloids and molecular mechanisms of functional amyloids, we analyzed its structural aspects in more detail. An axial stacking of different proteins within the same amyloid fibril is one of the most common types of co-aggregation. By using an approach based on structural similarity of the growing tips of amyloids, we developed a computational method to predict amyloidogenic β-arch structures that are able to interact with each other by axial stacking. Furthermore, we compiled a dataset consisting of 26 experimentally known pairs of proteins capable or incapable of co-aggregating. We utilized this dataset to test and refine our algorithm. The developed method opens the way for a number of applications, including the identification of microbial proteins capable of triggering amyloidosis in humans. 
AmyloComp is available on the website: <span><span>https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=30</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168437"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624000032/pdfft?md5=7c4b0171bee8cb64ea160d5cea06ba57&pid=1-s2.0-S0022283624000032-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139104651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAPRI-Q: The CAPRI resource evaluating the quality of predicted structures of protein complexes","authors":"","doi":"10.1016/j.jmb.2024.168540","DOIUrl":"10.1016/j.jmb.2024.168540","url":null,"abstract":"<div><p>Protein interactions are essential for cellular processes. In recent years there has been significant progress in computational prediction of 3D structures of individual protein chains, with the best-performing algorithms reaching sub-Ångström accuracy. These techniques are now finding their way into the prediction of protein interactions, adding to the existing modeling approaches. The community-wide Critical Assessment of Predicted Interactions (CAPRI) has been a catalyst for the development of procedures for the structural modeling of protein assemblies by organizing blind prediction experiments. The predicted structures are assessed against unpublished experimentally determined structures using a set of metrics with proven robustness that have been established in the CAPRI community. In addition, several advanced benchmarking databases provide targets against which users can test docking and assembly modeling software. These include the Protein-Protein Docking Benchmark, the CAPRI Scoreset, and the <span>Dockground</span> database, all developed by members of the CAPRI community. Here we present CAPRI-Q, a stand-alone model quality assessment tool, which can be freely downloaded or used via a publicly available web server. This tool applies the CAPRI metrics to assess the quality of query structures against given target structures, along with other popular quality metrics such as DockQ, TM-score and <em>l</em>-DDT, and classifies the models according to the CAPRI model quality criteria. The tool can handle a variety of protein complex types including those involving peptides, nucleic acids, and oligosaccharides. 
The source code is freely available from <span><span>https://gitlab.in2p3.fr/cmsb-public/CAPRI-Q</span><svg><path></path></svg></span> and its web interface through the <span>Dockground</span> resource at <span><span>https://dockground.compbio.ku.edu/assessment/</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":"436 17","pages":"Article 168540"},"PeriodicalIF":4.7,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0022283624001359/pdfft?md5=4b997150389807ec96ba0668e678acea&pid=1-s2.0-S0022283624001359-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140156597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}