{"title":"Machine and deep learning to predict viral fusion peptides","authors":"A.M. Sequeira , M. Rocha , Diana Lousa","doi":"10.1016/j.csbj.2025.02.011","DOIUrl":"10.1016/j.csbj.2025.02.011","url":null,"abstract":"<div><div>Viral fusion proteins, located on the surface of enveloped viruses like SARS-CoV-2, Influenza, and HIV, play a vital role in fusing the virus envelope with the host cell membrane. Fusion peptides (FPs), conserved segments within these proteins, are crucial for the fusion process and are potential targets for therapy. Experimental identification of fusion peptides is time-consuming and costly, which creates the need for bioinformatics tools that can predict the segment within the fusion protein sequence that corresponds to the FP. Although homology-based methods have been used towards this end, they fail to identify fusion peptides lacking overall sequence similarity to known counterparts. Therefore, alternative methods are needed to discover new putative fusion peptides, namely those based on machine learning (ML). In this study, we explore various ML-based approaches to identify fusion peptides within a fusion protein sequence. We employ token classification methods and sliding window approaches coupled with machine and deep learning models. We evaluate different protein sequence representations, including one-hot encoding and physicochemical features, as well as representations from Natural Language Processing, such as word embeddings and transformers. Through the examination of over 50 combinations of models and features, we achieve promising results, particularly with models based on a state-of-the-art transformer for amino acid token classification. Furthermore, we utilize the best models to predict hypothetical fusion peptides for SARS-CoV-2, and critically analyse annotated peptides from existing research. 
Overall, our models effectively predict the location of fusion peptides, even in viruses for which limited experimental data is available.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 692-704"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143473603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
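The sliding-window approach described in this abstract can be sketched as follows: the fusion protein sequence is cut into fixed-length windows, each window is one-hot encoded into a flat feature vector for a classifier, and the highest-scoring window is reported as the candidate fusion peptide. This is a minimal sketch under stated assumptions, not the authors' pipeline: the window length, the 20-letter alphabet, and the `mean_hydrophobicity` scorer (Kyte-Doolittle values standing in for a trained model) are illustrative choices.

```python
import numpy as np

# The 20 standard amino acids; non-standard residues stay as all-zero rows.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Kyte-Doolittle hydrophobicity scale. Used here only as a placeholder scorer;
# a real predictor would replace it with a trained ML/DL model.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(seq):
    """Encode a sequence as a (len(seq), 20) one-hot matrix."""
    mat = np.zeros((len(seq), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq):
        if aa in AA_INDEX:
            mat[pos, AA_INDEX[aa]] = 1.0
    return mat


def sliding_windows(seq, window, step=1):
    """Return (start, subsequence) pairs for every full-length window."""
    return [(start, seq[start:start + window])
            for start in range(0, len(seq) - window + 1, step)]


def window_features(seq, window):
    """Flatten each window's one-hot matrix into one feature vector per window."""
    wins = sliding_windows(seq, window)
    starts = [start for start, _ in wins]
    features = np.stack([one_hot(sub).ravel() for _, sub in wins])
    return starts, features


def mean_hydrophobicity(sub):
    """Placeholder window score: mean Kyte-Doolittle value (unknown residues count as 0)."""
    return sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in sub) / len(sub)


def best_window(seq, window, score_fn=mean_hydrophobicity):
    """Return (start, end, score) of the highest-scoring window (first one on ties)."""
    start, sub = max(sliding_windows(seq, window), key=lambda sw: score_fn(sw[1]))
    return start, start + window, score_fn(sub)
```

For example, `best_window("DDDDDIIIIIDDDDD", 5)` returns `(5, 10, 4.5)`. The transformer token-classification models that performed best in the study label each residue directly instead of scoring whole windows, so this sketch only covers the window-based baseline.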
{"title":"m6A2Circ: A comprehensive database for decoding the regulatory relationship between m6A modification and circular RNA","authors":"Yongtian Li , Bianli Gu , Lixia Ma , Li-Na He , Xiaoqiong Bao , Yuantai Huang , Rui Yang , Li Wang , Qingtao Yang , Haibo Yang , Zhixiang Zuo , Shegan Gao , Xueya Zhao , Ke Chen","doi":"10.1016/j.csbj.2025.02.018","DOIUrl":"10.1016/j.csbj.2025.02.018","url":null,"abstract":"<div><div>Circular RNA (circRNA) is a class of noncoding RNAs derived from back-splicing of pre-mRNAs. Recent studies have increasingly highlighted the pivotal roles of N6-methyladenosine (m6A) in regulating various aspects of circRNA metabolism, including biogenesis, localization, stability, and translation. Despite the importance of m6A in circRNA metabolism, there remains a substantial gap in comprehensive resources dedicated to exploring m6A modification in circRNA. To bridge this significant gap, we present m6A2Circ (http://m6a2circ.canceromics.org/), a pioneering database designed to systematically explore the regulatory interactions between m6A modification and circRNA. The m6A2Circ database encompasses 198,804 m6A-circRNA associations derived from diverse human and mouse tissues. These associations are meticulously categorized into four levels of evidence supported either by experimental data or by high-throughput sequencing data. Moreover, the database offers extensive annotations, facilitating research into circRNA function and its potential disease implications. 
Overall, m6A2Circ aims to benefit the research community and bolster novel discoveries in terms of crosstalk between m6A and circRNA.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 813-820"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of predictions of disordered binding regions in the CAID2 experiment","authors":"Fuhao Zhang , Lukasz Kurgan","doi":"10.1016/j.csbj.2024.12.009","DOIUrl":"10.1016/j.csbj.2024.12.009","url":null,"abstract":"<div><div>A large portion of the Intrinsically Disordered Regions (IDRs) in protein sequences interact with proteins, nucleic acids, and other types of ligands. Correspondingly, dozens of sequence-based predictors of binding IDRs have been developed. The recently completed second community-based Critical Assessment of protein Intrinsic Disorder prediction (CAID2) evaluated 32 predictors of binding IDRs. However, CAID2 considered a rather narrow scenario, testing on 78 proteins with binding IDRs without differentiating between ligand types, even though virtually all predictors target IDRs that interact with specific types of ligands. In that scenario, several intrinsic disorder predictors predict binding IDRs with accuracy equivalent to the best predictors of binding IDRs, because the large majority of IDRs in the 78 test proteins are binding. We substantially extended CAID2’s evaluation by using the entire CAID2 dataset of 348 proteins and considering several arguably more practical scenarios. We assessed whether predictors accurately differentiate binding IDRs from other types of IDRs and how they perform when predicting IDRs that interact with different ligand types. We found that intrinsic disorder predictors cannot accurately identify binding IDRs among other disordered regions, that the majority of the predictors of binding IDRs are ligand-type agnostic (i.e., they cross-predict binding in IDRs that interact with ligands that they do not cover), and that only a handful of predictors of binding IDRs perform relatively well and generate reasonably low amounts of cross-predictions. 
We also suggest a number of future research directions that would move this active field of research forward.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 78-88"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11732247/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142982731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
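The ligand-type scenario in this abstract rests on simple per-residue bookkeeping: for a predictor that targets one ligand type, the fraction of residues it marks as binding inside IDRs that bind *other* ligands is a cross-prediction rate. The sketch below assumes binary per-residue predictions and per-residue ligand labels (the label names are hypothetical); it is not the CAID2 evaluation code, and the paper's actual metrics may be defined differently.

```python
from collections import defaultdict


def positive_rate_by_ligand(predicted, ligand_labels):
    """Fraction of residues predicted as binding, grouped by annotated ligand type.

    predicted     -- per-residue 0/1 predictions from one binding-IDR predictor
    ligand_labels -- per-residue ligand type (hypothetical names such as
                     "protein" or "nucleic_acid"), or None outside binding IDRs
    """
    if len(predicted) != len(ligand_labels):
        raise ValueError("predictions and labels must cover the same residues")
    counts = defaultdict(lambda: [0, 0])  # label -> [predicted positives, residues]
    for pred, label in zip(predicted, ligand_labels):
        key = label if label is not None else "non_binding"
        counts[key][0] += int(pred)
        counts[key][1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


def cross_prediction_rate(predicted, ligand_labels, target_ligand):
    """Positive rate on residues that bind ligand types the predictor does not target."""
    # Collapse every non-target ligand type into a single "off_target" group.
    relabelled = [lab if lab in (target_ligand, None) else "off_target"
                  for lab in ligand_labels]
    return positive_rate_by_ligand(predicted, relabelled).get("off_target", 0.0)
```

A predictor that is ligand-type agnostic in the abstract's sense would show an off-target rate close to its rate on its own target ligand type.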
{"title":"Detection of circular permutations by Protein Language Models","authors":"Yue Hu , Bin Huang , Chun Zi Zang , Jia Jie Xu","doi":"10.1016/j.csbj.2024.12.029","DOIUrl":"10.1016/j.csbj.2024.12.029","url":null,"abstract":"<div><div>Protein circular permutations are crucial for understanding protein evolution and functionality. Traditional detection methods face challenges: sequence-based approaches struggle with detecting distant homologs, while structure-based approaches are limited by the need for structure generation and often treat proteins as rigid bodies. Protein Language Model-based alignment tools have shown advantages in utilizing sequence information to overcome the challenges of detecting distant homologs without requiring structural input. However, many current Protein Language Model-based alignment methods, which rely on sequence alignment algorithms like the Smith-Waterman algorithm, face significant difficulties when dealing with circular permutation (CP) due to their dependency on linear sequence order. This sequence order dependency makes them unsuitable for accurately detecting CP. Our approach, named plmCP, combines classical genetic principles with modern alignment techniques leveraging Protein Language Models to address these limitations. 
By integrating genetic knowledge, the plmCP method avoids the sequence order dependency, allowing for effective detection of circular permutations and contributing significantly to protein research and engineering by embracing structural flexibility.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 214-220"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11757225/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143045039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
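The abstract does not spell out how plmCP removes the dependence on linear sequence order, so the sketch below shows a different, classical idea instead: a circular permutation of sequence A appears as a contiguous substring of A concatenated with itself, and the best cut point can be found by scoring every rotation. It assumes equal-length sequences and plain residue identity rather than protein language model embeddings, so it only illustrates why order-independent comparison is needed; it is not plmCP.

```python
def is_circular_permutation(a, b):
    """Exact test: b is a rotation of a iff lengths match and b occurs in a + a."""
    return len(a) == len(b) and b in a + a


def best_rotation(a, b):
    """Score every rotation of a against b (equal lengths assumed).

    Returns (offset, identity): rotating a left by `offset` positions gives the
    highest fraction of identical residues with b. A linear, order-dependent
    comparison only ever examines offset 0.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    best_offset, best_identity = 0, -1.0
    for offset in range(n):
        rotated = a[offset:] + a[:offset]
        identity = sum(x == y for x, y in zip(rotated, b)) / n
        if identity > best_identity:
            best_offset, best_identity = offset, identity
    return best_offset, best_identity
```

For example, `best_rotation("ABCDEFGH", "DEFGHABX")` finds the cut point at offset 3 with 0.875 identity, whereas a position-by-position comparison at offset 0 scores zero. Real homologs with insertions and deletions would need an alignment step rather than plain identity.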
{"title":"InTiCAR: Network-based identification of significant inter-tissue communicators for autoimmune diseases","authors":"Kwansoo Kim, Manyoung Han, Doheon Lee","doi":"10.1016/j.csbj.2025.01.003","DOIUrl":"10.1016/j.csbj.2025.01.003","url":null,"abstract":"<div><div>Inter-tissue communicators (ITCs) are intricate and essential aspects of our body, as they are the keepers of homeostatic equilibrium. It is no surprise that the dysregulation of the exchange between tissues is at the core of various disorders. Among such conditions, autoimmune diseases (AIDs) refer to a collection of pathological conditions where the miscommunication drives the immune system to mistakenly attack one's own body. Due to their myriad and diverse pathophysiologies, AIDs cannot be easily diagnosed or treated, and continuous efforts are required to seek potential diagnostic markers or therapeutic targets. The identification of ITCs with significant involvement in the disease states is therefore crucial. Here, we present InTiCAR, <u>In</u>ter-<u>Ti</u>ssue <u>C</u>ommunicators for <u>A</u>utoimmune diseases by <u>R</u>andom walk with restart, which is a network exploration-based analysis method that suggests disease-specific ITCs based on prior knowledge of disease genes, without the need for external expression data. We first show that distinct ITC profiles can be acquired for various diseases by InTiCAR. We further illustrate that, for AIDs specifically, the disease-specific ITCs outperform disease genes in diagnosing patients using the UK Biobank plasma proteome dataset. Also, using the CMap LINCS dataset, we find that high perturbation on AID genes can be observed by the disease-specific ITCs. 
Our results provide and highlight unique perspectives on biological network analysis by focusing on the entities of extracellular communications.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 333-345"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11782887/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143078815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
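InTiCAR is built on random walk with restart (RWR). The standard RWR iteration propagates probability mass from seed nodes over a network, p <- (1 - r) W p + r p0, where W is the column-normalized adjacency matrix, p0 spreads unit mass over the seeds, and r is the restart probability, repeating until the scores converge. The sketch below implements that generic iteration with NumPy; the toy network, the seed choice (standing in for disease genes), and r = 0.3 are assumptions, and the step that selects ITCs from the resulting scores is not shown because the abstract does not describe it.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic RWR on an undirected network given as an adjacency matrix.

    adj     -- (n, n) symmetric adjacency matrix (list of lists or array)
    seeds   -- indices of seed nodes (e.g. known disease genes)
    restart -- probability of jumping back to the seeds at each step (assumed)
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0  # isolated nodes: avoid division by zero
    W = adj / deg  # broadcasting divides column j by deg[j]: column-stochastic
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change small enough: converged
            return p_next
        p = p_next
    return p
```

On the path graph 0-1-2-3 with node 0 as the only seed, the scores decrease with distance from the seed (about 0.424, 0.354, 0.164, 0.057). In a network with isolated nodes, mass reaching them is not redistributed, so scores may then sum to less than 1.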
{"title":"Nanopore sequencing of protozoa: Decoding biological information on a string of biochemical molecules into human-readable signals","authors":"Branden Hunter , Timothy Cromwell , Hyunjin Shim","doi":"10.1016/j.csbj.2025.01.002","DOIUrl":"10.1016/j.csbj.2025.01.002","url":null,"abstract":"<div><div>Biological information is encoded in a sequence of biochemical molecules such as nucleic acids and amino acids, and nanopore sequencing is a long-read sequencing technology capable of directly decoding these molecules into human-readable signals. The long reads from nanopore sequencing offer the advantage of obtaining contiguous information, which is particularly beneficial for decoding complex or repetitive regions in a genome. In this study, we investigated the efficacy of nanopore sequencing in decoding biological information from distinctive genomes in metagenomic samples, which pose significant challenges for traditional short-read sequencing technologies. Specifically, we sequenced blood and fecal samples from mice infected with <em>Trypanosoma brucei</em>, a unicellular protozoan known for its hypervariable and dynamic regions that help it evade host immunity. Such characteristics are also prevalent in other host-dependent parasites, such as bacteriophages. The taxonomic classification results showed a high proportion of nanopore reads identified as <em>T. brucei</em> in the infected blood samples, with no significant identification in the control blood samples and fecal samples. Furthermore, metagenomic de novo assembly of these nanopore reads yielded contigs that mapped to the reference genome of <em>T. brucei</em> in the infected blood samples with over 96 % accuracy. 
This exploratory work demonstrates the potential of nanopore sequencing for the challenging task of classifying and assembling hypervariable and dynamic genomes from metagenomic samples.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 440-450"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143098821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIN-Seg: A joint spatial-spectral information fusion model for medical image segmentation","authors":"Siyuan Dai , Kai Ye , Charlie Zhan , Haoteng Tang , Liang Zhan","doi":"10.1016/j.csbj.2025.02.024","DOIUrl":"10.1016/j.csbj.2025.02.024","url":null,"abstract":"<div><div>In recent years, the application of deep convolutional neural networks (DCNNs) to medical image segmentation has shown significant promise in computer-aided detection and diagnosis (CAD). Leveraging features from different spaces (i.e., Euclidean, non-Euclidean, and spectrum spaces) and from multiple data modalities has the potential to improve the information available to the CAD system, enhancing both effectiveness and efficiency. However, directly acquiring data from different spaces across multiple modalities is often prohibitively expensive and time-consuming. Consequently, most current medical image segmentation techniques are confined to the spatial domain, which is limited to utilizing scanned images from MRI, CT, PET, etc. Here, we introduce an innovative Joint Spatial-Spectral Information Fusion method that requires no additional data collection for CAD. We translate existing single-modality data into a new domain to extract features from an alternative space. Specifically, we apply the Discrete Cosine Transform (DCT) to enter the spectrum domain, thereby accessing supplementary feature information from an alternate space. Recognizing that information from different spaces typically necessitates complex alignment modules, we introduce a contrastive loss function for achieving feature alignment before synchronizing information across different feature spaces. Our empirical results illustrate the greater effectiveness of our model in harnessing additional information from the spectrum-based space and affirm its superior performance against influential state-of-the-art segmentation baselines. 
The code is available at https://github.com/Auroradsy/SIN-Seg.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 744-752"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
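SIN-Seg moves single-modality images into the spectral domain with the Discrete Cosine Transform. The sketch below builds the orthonormal 2-D DCT-II from its basis matrix in NumPy (`scipy.fft.dctn(x, type=2, norm="ortho")` computes the same transform) and stacks the image with its spectrum as a two-channel array. The channel stacking is a hypothetical input layout for illustration only; the abstract does not say how the network consumes the spectral features, and the contrastive alignment loss is not shown.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector of length n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis


def dct2(image):
    """2-D DCT-II: transform columns and rows with the orthonormal basis."""
    rows, cols = image.shape
    return dct_matrix(rows) @ image @ dct_matrix(cols).T


def idct2(coeffs):
    """Inverse 2-D DCT; the basis is orthogonal, so its inverse is its transpose."""
    rows, cols = coeffs.shape
    return dct_matrix(rows).T @ coeffs @ dct_matrix(cols)


def spatial_spectral_stack(image):
    """Hypothetical two-channel input: the image plus its DCT spectrum."""
    image = np.asarray(image, dtype=float)
    return np.stack([image, dct2(image)], axis=0)
```

Because the transform is orthonormal, it is lossless (`idct2(dct2(x))` recovers `x`) and preserves total energy, so the spectral channel re-expresses the same scan rather than adding new data. This matches the abstract's point that no additional data collection is required.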
{"title":"Identifying associations between short tandem repeat sequences and gene expression in yeast reveals specific repeated motifs encoding transcriptional regulatory proteins","authors":"Zongyuan Yu , Yating Liang , Meida Xiang , Kang Xu , Xiang Xu , Dongyang Ran , Yawen Luo , Bijia Chen , Xiaochen Bo , Hebing Chen","doi":"10.1016/j.csbj.2025.02.003","DOIUrl":"10.1016/j.csbj.2025.02.003","url":null,"abstract":"<div><div>Tandem repeat sequences (TRs), a class of repetitive genomic elements, are broadly distributed in both coding and non-coding regions. Investigating the relationship between sequence and function is essential for understanding the genome. <em>Saccharomyces cerevisiae</em> serves as a vital model organism and is widely used as an engineered strain. Although the transcriptional regulatory functions of TRs in the promoters of <em>S. cerevisiae</em> have been elucidated, our understanding of their roles within coding sequences (CDS) remains limited. In this study, we integrate RNA-seq, ChIP-seq, ATAC-seq, Hi-C, and Micro-C data from <em>S. cerevisiae</em> to analyze the types and distribution of TRs, and their impact on gene expression. Our results indicate that genes containing short tandem repeats (STRs) in their CDS exhibit lower expression levels. Epigenetic analysis reveals that these regions are characterized by high levels of repressive histone modifications and low levels of activating marks, with reduced chromatin accessibility and fewer chromatin interactions. Furthermore, trinucleotide and hexanucleotide STR motifs are found to be enriched primarily in genes encoding transcriptional regulatory proteins. This study provides new insights into the functions and characteristics of STRs in the CDS of <em>S. cerevisiae</em>. 
The identification of key STR motifs offers potential targets for the design of transcriptional regulatory elements.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 705-716"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143478780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
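Finding STRs and their motifs, as in this study, starts with scanning a sequence for a short unit repeated back to back. The regex-based scanner below is a minimal sketch, assuming exact (uninterrupted) repeats, the ACGT alphabet, and a user-chosen minimum copy number; the study's actual TR detection tool and thresholds are not given in the abstract. It discards motifs whose true period is shorter than the requested length, so a hexanucleotide search does not double-count trinucleotide repeats.

```python
import re


def minimal_period(motif):
    """Smallest p such that motif is motif[:p] repeated (e.g. 'CAGCAG' -> 3)."""
    n = len(motif)
    for p in range(1, n + 1):
        if n % p == 0 and motif[:p] * (n // p) == motif:
            return p
    return n


def find_strs(seq, motif_len, min_copies):
    """Return (start, end, motif, copies) for exact tandem repeats in seq.

    motif_len  -- repeat unit length (3 = trinucleotide, 6 = hexanucleotide)
    min_copies -- minimum number of consecutive copies to report (assumed threshold)
    """
    seq = seq.upper()
    # One unit captured in group 1, then at least min_copies - 1 more copies of it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if minimal_period(motif) != motif_len:
            continue  # e.g. 'CAGCAG' is really a trinucleotide repeat
        copies = (match.end() - match.start()) // motif_len
        hits.append((match.start(), match.end(), motif, copies))
    return hits
```

For example, `find_strs("ATGA" + "CAG" * 5 + "TTA", 3, 4)` reports one CAG repeat of five copies spanning positions 4 to 19. A repeat can be read in more than one phase (CAG versus AGC), and the regex reports the leftmost one.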
{"title":"Evaluation of the impact of printing and embroidery parameters in the process of obtaining utility comfort sensors used in protective clothing dedicated to premature babies","authors":"Ewa Skrzetuska , Grzegorz Szparaga , Karolina Wilgocka","doi":"10.1016/j.csbj.2025.02.035","DOIUrl":"10.1016/j.csbj.2025.02.035","url":null,"abstract":"<div><div>Biophysical comfort is one of the most important criteria for evaluating children’s clothing products, as it contributes to maintaining the thermal balance between the human body and the surrounding environment in which the newborn resides. This article describes the influence of screen printing and machine embroidery on the development of sensors designed to measure skin parameters such as temperature and humidity, using a paste containing carbon nanotubes and four different electrically conductive yarns. An additional parameter examined was the embroidery density, with two variants: 80 % filling and 60 % filling. The experimental part of the research involved testing surface mass, material thickness, air permeability, heat resistance, and water vapor resistance, as well as assessing sensory and conductive properties. All prints and embroideries discussed in the study were applied to the authors' original three-layer system, which has thermal resistance and water vapor resistance properties at levels that ensure the safety of prematurely born children by protecting them from excessive moisture loss and maintaining thermal comfort when they are outside the incubator. 
The resistance of all electrodes was below 12.22 Ω, including samples subjected to the washing and sterilization processes.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"29 ","pages":"Pages 41-51"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}