{"title":"Mapping Cell Identity from scRNA-seq: A primer on computational methods","authors":"Daniele Traversa, Matteo Chiara","doi":"10.1016/j.csbj.2025.03.051","DOIUrl":"10.1016/j.csbj.2025.03.051","url":null,"abstract":"<div><div>Single cell (sc) technologies mark a conceptual and methodological breakthrough in the way we study cells, the base units of life. Thanks to these technological developments, large-scale initiatives are currently under way to map all the cell types in the human body, with the ambitious aim of gaining cell-level resolution of physiological development and disease. Owing to its broad applicability and ease of interpretation, scRNA-seq is probably the most common sc-based application. This assay uses high-throughput RNA sequencing to capture gene expression profiles at the sc level. Subsequently, under the assumption that differences in transcriptional programs correspond to distinct cellular identities, <em>ad hoc</em> computational methods are used to infer cell types from gene expression patterns. A wide array of computational methods has been developed for this task. However, depending on the underlying algorithmic approach and associated computational requirements, each method may have a specific range of application, with implications that are not always clear to the end user. Here we provide a concise overview of state-of-the-art computational methods for cell identity annotation in scRNA-seq, tailored to new users and non-computational scientists. 
To this end, we classify existing tools into five main categories and discuss their key strengths, limitations and range of application.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 1559-1569"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143824087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IGN: Invariable gene set-based normalization for chromatin accessibility profile data analysis","authors":"Shengen Shawn Hu , Hai-Hui Xue , Chongzhi Zang","doi":"10.1016/j.csbj.2025.01.018","DOIUrl":"10.1016/j.csbj.2025.01.018","url":null,"abstract":"<div><div>Chromatin accessibility profiles generated using ATAC-seq or DNase-seq carry functional information about the regulatory genome that controls gene expression. Appropriate normalization of ATAC-seq and DNase-seq data is essential for accurate differential analysis when studying chromatin dynamics. Existing normalization methods usually assume the same distribution of genomic signals across samples; however, this assumption may not be appropriate when there are global changes in chromatin accessibility levels between experimental conditions/samples. We present IGN (Invariable Gene Normalization), a method for ATAC-seq and DNase-seq data normalization. IGN normalizes the promoter chromatin accessibility signals for a set of genes that are unchanged in expression, usually obtained from accompanying RNA-seq data, and extrapolates from these genes to normalize the genome-wide chromatin accessibility profile. We demonstrate the effectiveness of IGN in analyzing central memory CD8<sup>+</sup> T cell activation, a system with anticipated global reprogramming of chromatin and gene expression, and show that IGN outperforms existing methods. As the first chromatin accessibility normalization method that accounts for global differences, IGN can be widely applied to differential ATAC-seq and DNase-seq analysis. 
The package and source code are available on GitHub at <span><span>https://github.com/zang-lab/IGN</span></span>.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 501-507"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143098822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards generalizable Federated Learning in medical imaging: A real-world case study on mammography data","authors":"Ioannis N. Tzortzis , Alberto Gutierrez-Torre , Stavros Sykiotis , Ferran Agulló , Nikolaos Bakalos , Anastasios Doulamis , Nikolaos Doulamis , Josep Ll. Berral","doi":"10.1016/j.csbj.2025.03.031","DOIUrl":"10.1016/j.csbj.2025.03.031","url":null,"abstract":"<div><div>Federated Learning has rapidly gained popularity in medical applications because of the increased privacy it offers: medical data do not need to leave the hospitals' premises for AI model training. However, directly translating a classic experiment into a federated one is not always straightforward. In this work, we examine the intricacies of federated learning for a breast cancer classification tool. We compare classic model training with a federated variant and highlight the adaptations required to ensure equivalence between the two. Specifically, we introduce the Breast Area Detection tool as an essential component of the pre-processing pipeline, which enhances the robustness of Federated Learning by harmonizing the data. In addition, we present an end-to-end Federated Learning framework that is effective for real-world data and scenarios. Among the three real-world hospitals involved in the experiments, the proposed framework significantly improves performance at the first hospital, providing consistent results similar to those achieved with the classic approach. 
Experimental results demonstrate that the interventions introduced improved model performance by approximately 35%, aligning federated learning and centralized model performance.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"28 ","pages":"Pages 106-117"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143696893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamical features of smooth muscle actin pathological mutants: The arginine-257(258)-Cysteine cases","authors":"F. Chiappori , F. Di Palma , A. Cavalli , M. de Rosa , F. Viti","doi":"10.1016/j.csbj.2025.02.010","DOIUrl":"10.1016/j.csbj.2025.02.010","url":null,"abstract":"<div><div>The R257(8)C mutation in the smooth muscle actins ACTG2 and ACTA2 is the most frequent cause of severe genetic diseases, namely visceral myopathy and familial thoracic aortic aneurysms and dissections, which stem from impairment of visceral and vascular muscle, respectively. The molecular mechanisms underlying these pathologies are not fully elucidated. In the absence of experimental data on WT and mutated actins in their monomeric (g-) and filamentous (f-) forms, molecular dynamics can shed light on the role of R257(8)C in protein structure and dynamics. Analysis of g-actins shows no significant differences between WT and mutated proteins, suggesting that the monomers fold correctly. By contrast, mutated filaments are destabilized. Subunits of R257C f-ACTG2 adopt non-optimal angles, and in R258C f-ACTA2 we observe depolymerization already within the simulated time frame. Overall, our data point to a crucial role of residue R257(8) in actin structure and dynamics, in particular when the protein assembles into the filament.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 753-764"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Substrate binding of human and bacterial type IA topoisomerase: An experimentation with AlphaFold 3.0","authors":"Yasir Mamun , Ally Aguado , Ana Preza , Abhilasha Kadel , Anjani Mogallur , Briana Gonzalez , Jayleen De La Rosa , Daniel Diaz , Polina Evdokimova , Ukesh Karki , Yuk-Ching Tse-Dinh , Prem Chapagain","doi":"10.1016/j.csbj.2025.03.041","DOIUrl":"10.1016/j.csbj.2025.03.041","url":null,"abstract":"<div><div>Advancements in biophysical techniques such as X-ray crystallography and Cryo-EM have allowed the determination of three-dimensional structures of many proteins and nucleic acids. However, there is still a lack of 3D structures of proteins that are difficult to crystallize or of proteins in complex with other macromolecules. With the advent of deep learning applications such as AlphaFold and RoseTTAFold, it is becoming possible to obtain 3D structures of proteins from their 1D sequences while also generating models of protein-nucleic acid complexes that have been difficult to capture through traditional methods. In this project, we utilized AlphaFold3 (AF3) to create a large number of predicted complexes of two type IA topoisomerases: human topoisomerase 3 beta (hTOP3B) and <em>Mycobacterium tuberculosis</em> topoisomerase I bound to a single-stranded DNA (ssDNA). Topoisomerases are enzymes responsible for resolving topological barriers that arise during regular cellular activity. Obtaining structures of topoisomerase complexed with an ssDNA will allow us to discover possible sequence preferences of this enzyme and obtain structures that can be used to screen potential inhibitors. Our analysis showed that AF3 can predict the structure of the enzymes, especially the N-terminal domain, with high confidence. However, predicted protein-DNA complexes, especially with longer (> 25-mer) oligos, are unreliable. 
The models generated with shorter (9-mer) oligos are obtained with improved confidence and the substrates are placed similarly to crystal structures, but they do not reliably replicate the sequence specificity of the DNA binding of topoisomerase observed in biochemical assays and crystal structures.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 1342-1349"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143746840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Safe and sustainable by design of next generation chemicals and materials: SSbD4CheM project innovations in the textiles, cosmetic and automotive sectors","authors":"Mansoor Ahmad Bhat , Tanja Radu , Ignacio Martín-Fabiani , Panagiotis D. Kolokathis , Anastasios G. Papadiamantis , Stephan Wagner , Yvonne Kohl , Hilda Witters , Wouter A. Gebbink , Yentl Pareja Rodriguez , Giuseppe Cardelini , Roel Degens , Ivana Burzic , Beatriz Alfaro Serrano , Claudia Pretschuh , Eduardo Santamaría-Aranda , Elena Contreras-García , Judith Sinic , Christoph Jocham , Dror Cohen , Milica Velimirovic","doi":"10.1016/j.csbj.2025.03.022","DOIUrl":"10.1016/j.csbj.2025.03.022","url":null,"abstract":"<div><div>The strategic objective of the Safe and sustainable by design of next generation chemicals and materials (SSbD4CheM) project is to develop screening and testing strategies for a variety of substances and materials to ensure safer and more sustainable products in line with the Sustainable Products Initiative. SSbD4CheM focuses on chemical safety using new approach methods, including <em>in vitro</em> studies without animal models and <em>in silico</em> tools. Additionally, it integrates environmental sustainability into the implementation of the Safe and Sustainable by Design (SSbD) framework, including risk assessment and <em>ex-ante</em> life cycle assessment. New methods and models for safety and sustainability assessment along chemical, material and product life cycles will be developed, validated, and applied to three case studies: biobased self-cleaning, water-repellent, and antimicrobial treatments for textiles; nanocellulose as an additive in cosmetics; and microcellulose composites for the automotive industry. By employing a multidisciplinary strategy, SSbD4CheM addresses key challenges in material innovation, ensuring regulatory compliance while reducing hazards to environmental and human health. 
The project will accelerate the development of next-generation sustainable materials, promoting industry innovation, regulatory progress, and improved consumer safety. Ultimately, SSbD4CheM aims to establish a new benchmark for the development of chemicals and materials that conform to safety and sustainability goals.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"29 ","pages":"Pages 60-71"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143684555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast detection of unique genomic regions","authors":"Beatriz Vieira Mourato, Bernhard Haubold","doi":"10.1016/j.csbj.2025.02.025","DOIUrl":"10.1016/j.csbj.2025.02.025","url":null,"abstract":"<div><div>Unique genomic regions are of particular interest in two scenarios. When extracted from a single mammalian target genome, they are highly enriched for developmental genes. When extracted from target genomes compared with closely related neighbor genomes, they are highly enriched for diagnostic markers. Despite their biological importance and potential economic value, unique regions remain difficult to detect from whole-genome sequences. In this review we survey three efficient programs for detecting unique regions at scale: <span>genmap</span>, <span>macle</span>, and <span>fur</span>. We explain these programs and demonstrate their application by analyzing simulated and real data. Example scripts for searching for unique regions are available from the GitHub repository <span>evolbioinf/sure</span> as part of a detailed tutorial.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 843-850"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A benchmark analysis of feature selection and machine learning methods for environmental metabarcoding datasets","authors":"Erik Zschaubitz , Henning Schröder , Conor Christopher Glackin , Lukas Vogel , Matthias Labrenz , Theodor Sperlea","doi":"10.1016/j.csbj.2025.04.017","DOIUrl":"10.1016/j.csbj.2025.04.017","url":null,"abstract":"<div><div>Next-Generation Sequencing methods like DNA metabarcoding enable the generation of large community composition datasets and have become instrumental in many branches of ecology in recent years. However, the sparsity, compositionality, and high dimensionality of metabarcoding datasets pose challenges in data analysis. In theory, feature selection methods improve the analyzability of eDNA metabarcoding datasets by identifying a subset of informative taxa that are relevant for a certain task and discarding those that are redundant or irrelevant. However, general guidelines for choosing a feature selection method for a given setting are lacking. Here, we report a comparison of feature selection methods in a supervised machine learning setup across 13 environmental metabarcoding datasets with differing characteristics. We evaluate workflows that consist of data preprocessing, feature selection and a machine learning model by their ability to capture the ecological relationship between the microbial community composition and environmental parameters. Our results demonstrate that, while the optimal feature selection approach depends on dataset characteristics, feature selection is more likely to impair model performance than to improve it for tree ensemble models like Random Forests. 
Furthermore, our results show that calculating relative counts impairs model performance, which suggests that novel methods to combat the compositionality of metabarcoding data are required.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 1636-1647"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling omics dose-response at the pathway level with DoseRider","authors":"Pablo Monfort-Lanzas , Johanna M. Gostner , Hubert Hackl","doi":"10.1016/j.csbj.2025.04.004","DOIUrl":"10.1016/j.csbj.2025.04.004","url":null,"abstract":"<div><div>The generation of omics data sets has become an important approach in modern pharmacological and toxicological research, as it can provide mechanistic and quantitative information on a large scale. Analyses of these data frequently reveal non-linear dose-response relationships, underscoring the importance of the modeling process to infer biological exposure limits. A number of tools have been developed for dose-response modeling, and various thresholds have been defined as quantitative representations of the effect of a substance, such as effective concentrations or benchmark doses (BMD). Here we present DoseRider, an easy-to-use web application and companion R package for linear and non-linear dose-response modeling and assessment of BMD at the level of biological pathways or signatures using generalized mixed-effect models. This approach allows the analysis of custom or provided multi-omics data, such as RNA sequencing or metabolomics data, annotated with a collection of pathways and gene sets from various species. Moreover, we introduce the concept of trend change doses (TCDs) as numerical descriptors of effects derived from complex dose-response curves. The usability of DoseRider was demonstrated by analyzing RNA sequencing data from a human breast cancer cell line (MCF-7) treated with bisphenol AF (BPAF) at 8 different concentrations, using gene sets for chemical and genetic perturbations (MSigDB). The BMD for BPAF and a set of genes upregulated by estrogen in breast cancer was 0.2 µM (95% CI 0.1–0.5 µM), and the lowest TCD (TCD1) was 0.003 µM (95% CI 0.0006–0.01 µM). 
The comprehensive presentation of the results underlines the suitability of the system for pharmacogenomics, toxicogenomics, and applications beyond.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 1440-1448"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143783042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting pathogen evolution and immune evasion in the age of artificial intelligence","authors":"D.J. Hamelin , M. Scicluna , I. Saadie , F. Mostefai , J.C. Grenier , C. Baron , E. Caron , J.G. Hussin","doi":"10.1016/j.csbj.2025.03.044","DOIUrl":"10.1016/j.csbj.2025.03.044","url":null,"abstract":"<div><div>The genomic diversification of viral pathogens during viral epidemics and pandemics represents a major adaptive route for infectious agents to circumvent therapeutic and public health initiatives. Historically, strategies to address viral evolution have relied on responding to emerging variants after their detection, leading to delays in effective public health responses. Because of this, a long-standing yet challenging objective has been to forecast viral evolution by predicting potentially harmful viral mutations prior to their emergence. The promise of artificial intelligence (AI), coupled with the exponential growth of viral data collection infrastructures spurred by the COVID-19 pandemic, has resulted in a research ecosystem highly conducive to this objective. Because the COVID-19 pandemic accelerated the development of pandemic mitigation and preparedness strategies, many of the methods discussed here were designed in the context of SARS-CoV-2 evolution. However, most of these pipelines were intentionally designed to be adaptable across RNA viruses, with several strategies already applied to multiple viral species. In this review, we explore recent breakthroughs that have facilitated the forecasting of viral evolution in the context of an ongoing pandemic, with particular emphasis on deep learning architectures, including the promising potential of language models (LM). 
The approaches discussed here employ strategies that leverage genomic, epidemiologic, immunologic and biological information.</div></div>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"Pages 1370-1382"},"PeriodicalIF":4.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143746796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}