Kai Shi, Qisheng He, Pengyang Zhao, Lin Li, Qiaohui Liu, Zhengxia Wu, Yanjun Wang, Huachen Dong, Juehua Yu
{"title":"BGMDB: A curated database linking gut microbiota dysbiosis to brain disorders","authors":"Kai Shi , Qisheng He , Pengyang Zhao , Lin Li , Qiaohui Liu , Zhengxia Wu , Yanjun Wang , Huachen Dong , Juehua Yu","doi":"10.1016/j.csbj.2025.02.034","DOIUrl":"10.1016/j.csbj.2025.02.034","url":null,"abstract":"<div><div>The gut microbiota is a fundamental component of human health and has been increasingly implicated in the etiology of neurological disorders. Neurotransmitters, acting as key mediators of gut-brain communication, are closely associated with both the progression and therapeutic modulation of brain diseases. Despite significant advancements in microbiome research, the complex interplay between gut microbiota and neurological disorders remains poorly understood, and a comprehensive resource integrating these associations is lacking. To bridge this gap, we developed the Brain Disease Gut Microbiota Database (BGMDB), a rigorously curated repository documenting experimentally validated relationships between gut microbiota and brain diseases. BGMDB encompasses 1419 associations involving 609 gut microbiota taxa and 43 brain disorders, along with 184 tripartite interactions linking brain diseases, neurotransmitters, and microbiota across six neurotransmitter systems. Additionally, BGMDB integrates genetic data from the gutMGene database, allowing users to explore microbiota-mediated genetic associations with brain disease pathology and neuroanatomical alterations. A user-friendly interface enables researchers to navigate relevant information through graphical query tools, comprehensive browsing functionalities, and data retrieval options. Our BGMDB provides an unparalleled resource for advancing mechanistic insights into gut-brain interactions, facilitating novel microbiota-targeted therapeutic strategies for neurological disorders. 
BGMDB is freely available at http://bgmdb.online/bgmdb.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 879-886"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oleksandr Cherednichenko, Alan Herbert, Maria Poptsova
{"title":"Benchmarking DNA large language models on quadruplexes","authors":"Oleksandr Cherednichenko , Alan Herbert , Maria Poptsova","doi":"10.1016/j.csbj.2025.03.007","DOIUrl":"10.1016/j.csbj.2025.03.007","url":null,"abstract":"<div><div>Large language models (LLMs) in genomics have successfully predicted various functional genomic elements. While their performance is typically evaluated using genomic benchmark datasets, it remains unclear which LLM is best suited for specific downstream tasks, particularly for generating whole-genome annotations. Current LLMs in genomics fall into three main categories: transformer-based models, long convolution-based models, and state-space models (SSMs). In this study, we benchmarked three different types of LLM architectures for generating whole-genome maps of G-quadruplexes (GQ), a type of flipon, or non-B DNA structure, characterized by distinctive patterns and functional roles in diverse regulatory contexts. Although a GQ forms by folding guanosine residues into tetrads, the computational task is challenging as the bases involved may be on different strands, separated by a large number of nucleotides, or made from RNA rather than DNA. All LLMs performed comparably well, with DNABERT-2 and HyenaDNA achieving superior results based on F1 and MCC. Analysis of whole-genome annotations revealed that HyenaDNA recovered more quadruplexes in distal enhancers and intronic regions. The models were better suited to detecting large GQ arrays that likely contribute to the nuclear condensates involved in gene transcription and chromosomal scaffolds. HyenaDNA and Caduceus formed a separate grouping in the generated de novo quadruplexes, while transformer-based models clustered together. Overall, our findings suggest that different types of LLMs complement each other. 
Genomic architectures with varying context lengths can detect distinct functional regulatory elements, underscoring the importance of selecting the appropriate model based on the specific genomic task.</div><div>The code and data underlying this article are available at https://github.com/powidla/G4s-FMs.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 992-1000"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143609596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SUMO-LMNet: Lossless mapping network for predicting SUMOylation sites in SUMO1 and SUMO2 using high-dimensional features","authors":"Cheng-Hsun Ho , Yen-Wei Chu , Lan-Ying Huang , Chi-Wei Chen","doi":"10.1016/j.csbj.2025.03.005","DOIUrl":"10.1016/j.csbj.2025.03.005","url":null,"abstract":"<div><div>Accurate SUMOylation site prediction is crucial for deciphering gene regulation and disease mechanisms. However, distinguishing SUMO1 and SUMO2 modifications remains a major challenge due to their structural similarities. Conventional prediction models often struggle to differentiate between these paralogues, limiting their applicability in biological research. To address this, we introduce SUMO-LMNet, a deep learning-based framework for the precise prediction of SUMO1 and SUMO2 sites. Unlike previous models, SUMO-LMNet integrates a lossless mapping strategy and deep learning architectures to enhance both prediction accuracy and interpretability. Our model extracts high-dimensional features from sequences and transforms them into two-dimensional feature maps, enabling convolutional neural networks (CNNs) to effectively capture both local and global dependencies within the data. By leveraging a Lossless Mapping Network (LM-Net), this approach preserves the original feature space, ensuring that feature integrity is retained without loss of spatial information. While Grad-CAM highlights key features in individual predictions, it lacks consistency across samples and does not provide a dataset-wide evaluation of feature importance. To address this, we introduce Combined Heatmap Feature Analysis (CHFA), which systematically aggregates feature importance across multiple samples, providing a more reliable and interpretable dataset-wide assessment. Experimental results reveal distinct feature dependencies between SUMO1 and SUMO2, underscoring the necessity of paralogue-specific predictive models. 
Through a systematic comparison of multiple neural network architectures, we demonstrate that our model achieves over 80% accuracy in distinguishing SUMO1 and SUMO2 modification sites. By prioritizing candidate sites for further study, our model aids experimental design and accelerates the discovery of biologically relevant SUMOylation targets. SUMO-LMNet is publicly available at https://predictor.isu.edu.tw/sumo-lmnet.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 1048-1059"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143628580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational analysis reveals temperature-induced stabilization of FAST-PETase","authors":"Peter Stockinger , Cornel Niederhauser , Sebastien Farnaud , Rebecca Buller","doi":"10.1016/j.csbj.2025.03.006","DOIUrl":"10.1016/j.csbj.2025.03.006","url":null,"abstract":"<div><div>More than 10 % of global solid waste consists of poly(ethyleneterephthalate) (PET). Among other techniques, PET hydrolases (PETases) can be used to depolymerize this plastic. However, wildtype PETases exhibit poor specific activities and insufficient thermostability, limiting their use in depolymerization processes which require high temperatures. In 2022, machine learning-aided enzyme engineering of a PETase stemming from the bacterium <em>Ideonella sakaiensis</em> (<em>Is</em>PETase) resulted in a more functional, active, stable, and tolerant variant (FAST-PETase). To rationalize the molecular basis of FAST-PETase’s improved thermal stability, we performed comparative Constraint Network Analysis (CNAnalysis) and Molecular Dynamics (MD) simulations of wildtype <em>Is</em>PETase (WT-PETase) and FAST-PETase at 30°C and 50°C identifying thermolabile sequence stretches in the wildtype enzyme. Further analysis of the backbone flexibility revealed that all mutations of FAST-PETase affected these critical regions. Counterintuitively, the <em>in-silico</em> analyses additionally highlighted that the flexibility of these regions decreased at 50°C in FAST-PETase, instead of exhibiting increased flexibility at higher temperature as would be expected from thermodynamic considerations. This effect was confirmed by physical energy calculations, which suggest that temperature-dependent conformational changes of FAST-PETase decrease the free energy of unfolding (ΔG(stability)) and rigidify the enzyme at elevated temperatures enhancing stability. 
Looking forward, these findings might help guide the rational engineering of protein thermostability and contribute to our understanding of the thermal adaptation of thermophilic enzymes.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 969-977"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143601491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kylie L. King, Hamed Abdollahi, Zoe Dinkel, Alannah Akins, Homayoun Valafar, Heather Dunn
{"title":"Pilot study: Initial investigation suggests differences in EMT-associated gene expression in breast tumor regions","authors":"Kylie L. King , Hamed Abdollahi , Zoe Dinkel , Alannah Akins , Homayoun Valafar , Heather Dunn","doi":"10.1016/j.csbj.2025.01.027","DOIUrl":"10.1016/j.csbj.2025.01.027","url":null,"abstract":"<div><div>Triple-negative breast cancer (TNBC) is the most aggressive subtype and disproportionately affects African American women. The development of breast cancer is highly associated with interactions between tumor cells and the extracellular matrix (ECM), and recent research suggests that cellular components of the ECM vary between racial groups. This pilot study aimed to evaluate gene expression in TNBC samples from patients who identified as African American and Caucasian using traditional statistical methods and emerging Machine Learning (ML) approaches. ML enables the analysis of complex datasets and the extraction of useful information from small datasets. We selected four regions of interest from tumor biopsy samples and used laser microdissection to extract tissue for gene expression characterization via RT-qPCR. Both parametric and non-parametric statistical analyses identified genes differentially expressed between the two ethnic groups. Out of 40 genes analyzed, 4 were differentially expressed in the edge of tumor (ET) region and 8 in the ECM adjacent to the tumor (ECMT) region. In addition to the statistical approach, ML was used to generate decision trees (DT) for a broader analysis of gene expression and ethnicity. Our DT models achieved 83.33% accuracy and identified the most significant genes, including <em>CD29</em> and <em>EGF</em> from the ET region and <em>SNAI1</em> and <em>CHD2</em> from the ECMT region. All significant genes were analyzed for pathway enrichment using the MSigDB and Gene Ontology databases; the most notable enriched pathways were the epithelial-to-mesenchymal transition and cell motility pathways. 
This pilot study highlights key genes of interest that are differentially expressed in African American and Caucasian TNBC samples.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 548-555"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143098048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher Vorreiter, Dina Robaa, Wolfgang Sippl
{"title":"Predicting fragment binding modes using customized Lennard-Jones potentials in short molecular dynamics simulations","authors":"Christopher Vorreiter , Dina Robaa, Wolfgang Sippl","doi":"10.1016/j.csbj.2024.12.017","DOIUrl":"10.1016/j.csbj.2024.12.017","url":null,"abstract":"<div><div>Reliable in silico prediction of fragment binding modes remains a challenge in current drug design research. Due to their small size and generally low binding affinity, fragments can potentially interact with their target proteins in different ways. In the current study, we propose a workflow aimed at predicting favorable fragment binding sites and binding poses through multiple short molecular dynamics simulations. Tailored Lennard-Jones potentials enable the simulation of systems with high concentrations of identical fragment molecules surrounding their respective target proteins. In the present study, descriptors and binding free energy calculations were implemented to filter out the desired fragment position. The proposed method was tested for its performance using four epigenetic target proteins and their respective fragment binders and showed high accuracy in identifying the binding sites as well as predicting the native binding modes. 
The approach presented here represents an alternative method for the prediction of fragment binding modes and may be useful in fragment-based drug discovery when the corresponding experimental structural data are limited.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 102-116"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11733276/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143001591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katarzyna Szleper, Mateusz Cebula, Oksana Kovalenko, Artur Góra, Agata Raczyńska
{"title":"PUR-GEN: A web server for automated generation of polyurethane fragment libraries","authors":"Katarzyna Szleper , Mateusz Cebula , Oksana Kovalenko , Artur Góra , Agata Raczyńska","doi":"10.1016/j.csbj.2024.12.004","DOIUrl":"10.1016/j.csbj.2024.12.004","url":null,"abstract":"<div><div>The biodegradation of synthetic polymers offers a promising solution for sustainable plastic recycling. Polyurethanes (PUR) stand out among these polymers due to their susceptibility to enzymatic hydrolysis. However, the intricate 3D structures formed by PUR chains present challenges for biodegradation studies, both computational and experimental. To facilitate <em>in silico</em> research, we introduce PUR-GEN, a web server tailored for the automated generation of PUR fragment libraries. PUR-GEN allows users to input isocyanate and alcohol structural units, facilitating the creation of combinatorial oligomer libraries enriched with conformers and compound property tables. PUR-GEN can serve as a valuable tool for designing PUR fragments to mimic PUR structure interactions with proteins, as well as characterising simplistic PUR models. To illustrate an application of the web server, we present a case study on four selected cutinases and three urethanases with experimentally confirmed PUR-degrading activity or ability to hydrolyse carbamates. 
The use of PUR-GEN in molecular docking of 414 generated oligomers provides an example pipeline for initiating the discovery of PUR-degrading enzymes.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 127-136"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11750484/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143022488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mary Hauser, William J. Dearnaley, A. Cameron Varano, Michael Casasanta, Sarah M. McDonald, Deborah F. Kelly
{"title":"Withdrawal notice to “Corrigendum to: 'Cryo-EM reveals architectural diversity in active Rotavirus particles'. Comput. Struct. Biotechnol. J. 2019 Jul 31; 17:1178-1183” [Comput. Struct. Biotechnol. J. 23 (2024) 3702]","authors":"Mary Hauser , William J. Dearnaley , A. Cameron Varano , Michael Casasanta , Sarah M. McDonald , Deborah F. Kelly","doi":"10.1016/j.csbj.2024.11.017","DOIUrl":"10.1016/j.csbj.2024.11.017","url":null,"abstract":"","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Page 57"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11719286/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142969839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nassim Nicholas Taleb, Pierre Zalloua, Khaled Elbassioni, Haralampos Hatzikirou, Andreas Henschel, Daniel E. Platt
{"title":"Informational rescaling of PCA maps with application to genetic distance","authors":"Nassim Nicholas Taleb , Pierre Zalloua , Khaled Elbassioni , Haralampos Hatzikirou , Andreas Henschel , Daniel E. Platt","doi":"10.1016/j.csbj.2024.11.042","DOIUrl":"10.1016/j.csbj.2024.11.042","url":null,"abstract":"<div><div>Principal Component Analysis (PCA) is a powerful multivariate tool allowing the projection of data in low-dimensional representations. Nevertheless, datapoint distances on these low-dimensional projections are challenging to interpret. Here, we propose a computationally simple heuristic to transform a map based on standard PCA (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based on mutual information (MI). Moreover, we show that in certain instances our proposed scaled PCA can improve cluster identification. Rescaling principal component-based distances using MI results in a representation of relative statistical associations when, as in genetics, it is applied on bit measurements between individuals' genomic mutual information. This entropy-rescaled PCA, while preserving order relationships (along a dimension), quantifies relative distances into information units, such as “bits”. 
We illustrate the effect of this rescaling using genomics data derived from world populations and describe how the interpretation of results is impacted.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27","pages":"Pages 48-56"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11719279/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142969931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tong Min Kim, Taehoon Ko, Byoung Woo Hwang, Hyung Goo Paek, Wan Yeon Lee
{"title":"Self-sovereign management scheme of personal health record with personal data store and decentralized identifier","authors":"Tong Min Kim , Taehoon Ko , Byoung Woo Hwang , Hyung Goo Paek , Wan Yeon Lee","doi":"10.1016/j.csbj.2024.11.036","DOIUrl":"10.1016/j.csbj.2024.11.036","url":null,"abstract":"<div><div>Conventional personal health record (PHR) management systems are centralized, making them vulnerable to privacy breaches and single points of failure. Despite progress in standardizing healthcare data with the FHIR format, hospitals often lack efficient platforms for transferring PHRs, leading to redundant tests and delayed treatments. To address these challenges, we propose a decentralized PHR management system leveraging Personal Data Stores (PDS) and Decentralized Identifiers (DIDs) in line with the Web 3.0 model. Our system features secure interoperability and personal identification masking. Interoperability is achieved through DID digital certificates for verifying PDS addresses and a dynamic access key (AK) system to minimize credential exposure. Data de-identification, including anonymization and encryption, ensures privacy and prevents unauthorized access. We developed a prototype using the Solid open-source library and Hyperledger Aries protocol. Testing showed efficient performance, with DID validations and AK generation under one second, and data operations for 500 MB-sized PHRs completing in two seconds. De-identification processes were both effective and timely. The system demonstrated the ability to manage PHRs securely, empower users with control over their healthcare data, facilitate seamless and secure data transfer between patients and medical entities, and prevent exposure of sensitive information. 
This approach advances decentralized PHR management, supporting improved healthcare outcomes and patient experiences in the digital era.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"28","pages":"Pages 16-28"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143045353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}