{"title":"Replacing normalizations with interval assumptions enhances differential expression and differential abundance analyses.","authors":"Kyle C McGovern, Justin D Silverman","doi":"10.1186/s12859-025-06177-2","DOIUrl":"10.1186/s12859-025-06177-2","url":null,"abstract":"<p><strong>Background: </strong>Methods for differential expression and differential abundance analysis often rely on normalization to address sample-to-sample variation in sequencing depth. However, normalizations imply strict, unrealistic assumptions about the unmeasured scale of biological systems (e.g., microbial load or total cellular transcription). Even slight errors in these assumptions introduce bias, leading to elevated false positive and negative rates.</p><p><strong>Results: </strong>We introduce interval assumptions as a generalization of normalizations. Unlike normalizations, our interval methods allow researchers to account for potential errors in assumptions about the system scale. Interval assumptions are also customizable and allow researchers to express more biologically plausible assumptions about scale. Interval assumptions even generalize Quantitative Microbiome Profiling (QMP), allowing researchers to account for errors in flow cytometry-based measurements of total cellular concentration. We develop a novel hypothesis testing framework that allows us to integrate interval assumptions into existing tools. We develop a modified version of the popular ALDEx2 method using interval assumptions rather than normalizations. Through real and simulated data analyses, we find that interval assumptions can dramatically decrease false positive rates (i.e., from 45% to 5%) while retaining or increasing statistical power. We also study interval assumptions under misspecification and show they still improve on normalizations.</p><p><strong>Conclusions: </strong>Interval assumptions enhance the rigor and reproducibility of differential expression and differential abundance analyses. 
Our results add to a growing body of literature arguing that normalizations should be replaced with alternative methods that allow researchers to account for scale uncertainty. However, compared to recent alternatives like scale models and sensitivity analyses, interval assumptions are easier to use, are more robust to misspecification, and have stronger and more interpretable inferential guarantees.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"164"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12218962/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144538044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated sparse feature selection in high-dimensional proteomics data via 1-bit compressed sensing and K-Medoids clustering.","authors":"FuDong Wen, Yue Su, Dan Liu, YuPeng Wang, MeiNa Liu","doi":"10.1186/s12859-025-06193-2","DOIUrl":"10.1186/s12859-025-06193-2","url":null,"abstract":"<p><strong>Background: </strong>High-dimensional proteomics data present significant challenges in biomarker discovery due to technical noise, feature redundancy, and multicollinearity. Current feature selection methods, including filter, wrapper, and embedded approaches, struggle with stability, sparsity, and computational efficiency. To address these limitations, we propose Soft-Thresholded Compressed Sensing (ST-CS), a hybrid framework integrating 1-bit compressed sensing with K-Medoids clustering. Unlike conventional methods relying on manual thresholds, ST-CS automates feature selection by dynamically partitioning coefficient magnitudes into discriminative biomarkers and noise.</p><p><strong>Results: </strong>Evaluations on simulated and real-world proteomic datasets demonstrated ST-CS's superiority in feature selection capability and classification performance. In simulations, ST-CS achieved feature selection robustness with balanced sensitivity (> 80%) and specificity (> 99.8%), reducing false discovery rates (FDR) by 20-50% compared to Hard-Thresholded Compressed Sensing (HT-CS). Additionally, it attained superior F1 scores and Matthews Correlation Coefficients (MCC), outperforming HT-CS, LASSO, and SPLSDA in identifying true biomarkers while suppressing noise. For classification performance, ST-CS surpassed all methods in the area under the receiver operating characteristic curve (AUC) across varying noise levels while maintaining sparsity. Applied to Clinical Proteomic Tumor Analysis Consortium (CPTAC) datasets, ST-CS matched HT-CS's classification accuracy (AUC = 97.47% for intrahepatic cholangiocarcinoma) but with 57% fewer selected features (37 vs. 
86), demonstrating its dual strength in precision biomarker discovery and predictive accuracy. For glioblastoma data, ST-CS achieved higher AUC (72.71%) than HT-CS (72.15%), LASSO (67.80%), and SPLSDA (71.38%) while retaining a parsimonious feature set (30 vs. 58 features for HT-CS). In ovarian serous cystadenocarcinoma, ST-CS further demonstrated its adaptability, attaining superior AUC (75.86%) over HT-CS (75.61%), LASSO (61.00%), and SPLSDA (70.75%) with only 24 ± 5 selected biomarkers. These results highlight ST-CS's ability to rigorously automate feature selection while balancing classification efficacy, interpretability, and scalability for translational proteomics.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"165"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220089/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144538041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coconut: covariate-assisted composite null hypothesis testing with applications to replicability analysis of high-throughput experimental data.","authors":"Yan Li, Yanmei Li, Han Ma, Zitong Yue, Xin Zhang","doi":"10.1186/s12859-025-06163-8","DOIUrl":"10.1186/s12859-025-06163-8","url":null,"abstract":"<p><strong>Background: </strong>Multiple testing of composite null hypotheses is critical for identifying simultaneous signals across studies. While it is common to incorporate external information in simple null hypotheses, exploiting such auxiliary covariates to provide prior structural relationships among composite null hypotheses and boost the statistical power remains challenging.</p><p><strong>Results: </strong>We propose a robust and powerful covariate-assisted composite null hypothesis testing (CoCoNuT) procedure based on a Bayesian framework to identify replicable signals in two studies while asymptotically controlling the false discovery rate. CoCoNuT innovatively adopts a three-dimensional mixture model to consider two primary studies and an integrative auxiliary covariate jointly. While accounting for heterogeneity across studies, the local false discovery rate optimally captures cross-study and cross-feature information, providing improved rankings of feature importance.</p><p><strong>Conclusions: </strong>Theoretical and empirical evaluations confirm the validity and efficiency of CoCoNuT. Extensive simulations demonstrate that CoCoNuT outperforms conventional methods that do not exploit auxiliary covariates while controlling the FDR. 
We apply CoCoNuT to schizophrenia genome-wide association studies, illustrating its higher power in identifying replicable genetic variants with the assistance of relevant auxiliary studies.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"163"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12210505/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144538042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble machine learning-based pre-trained annotation approach for scRNA-seq data using gradient boosting with genetic optimizer.","authors":"Osama Elnahas, Waleed M Ead, Yushan Qiu, Jian Lu","doi":"10.1186/s12859-025-06151-y","DOIUrl":"10.1186/s12859-025-06151-y","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of gene expression by allowing researchers to analyze the transcriptomes of individual cells. This technology provides unprecedented insights into cellular heterogeneity, cellular states, and biological processes at a single-cell resolution. The problem of single-cell RNA annotation involves assigning meaningful labels or annotations to each cell in the scRNA-seq dataset, indicating its corresponding cell type, state, or biological function. Current annotation methods are often challenged by limited source data quality, sensitivity to batch effects, and poor adaptability to uncharacterized cell types. We propose an ensemble machine learning-based pre-trained annotation framework that integrates gradient boosting and genetic optimization for robust feature selection. The proposed method uses ensemble learning to enhance annotation accuracy under data scarcity, addressing limitations in existing supervised methods by leveraging a combination of multiple annotated datasets and feature alignment strategies. Through comprehensive benchmarking across varied biological contexts, we demonstrate that the proposed approach significantly improves annotation accuracy and generalization across different scRNA-seq platforms, especially under conditions of reduced reference data. 
Results confirm its versatility and resilience in accurately annotating cell types, even under reduced data conditions, establishing it as a powerful tool for cell-type classification in scRNA-seq data.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"166"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220795/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144538043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2dSpAn-Auto: an automated tool for analysis of two-dimensional dendritic spine images.","authors":"Shauvik Paul, Rahul Pramanick, Nirmal Das, Ewa Baczynska, Zeinab Bedrood, Tapabrata Chakraborti, Subhadip Basu, Jakub Wlodarczyk","doi":"10.1186/s12859-025-06179-0","DOIUrl":"10.1186/s12859-025-06179-0","url":null,"abstract":"<p><strong>Background: </strong>Quantitative analysis of dendritic spine morphology and density is crucial for understanding synaptic plasticity and its role in neuropsychiatric disorders, including Alzheimer's disease and schizophrenia. While both 3D and 2D approaches exist for spine analysis, 2D methods offer advantages in computational efficiency and rapid assessment, and are better suited to images with limited z-resolution acquired through confocal and previous-generation super-resolution microscopy. In this work, we developed a modality-agnostic spine segmentation approach based on 2D skeletonization. Specifically, we implemented two analytical workflows: 2dSpAn-Auto.b, which implements a binary skeletonization algorithm, and 2dSpAn-Auto.f, which generates fuzzy skeletons directly from gray-scale images. Our developed method enables fast and automatic segmentation and morphological analysis of 2D maximum intensity projection images of dendritic spines. Expert users can fine-tune parameters when needed, though default settings prove robust across various imaging conditions. The developed 2dSpAn-Auto software tool is most suitable for automated batch processing while maintaining user flexibility through an intuitive graphical interface.</p><p><strong>Results: </strong>2dSpAn-Auto is validated across multiple imaging modalities (in vitro, ex vivo, and in vivo) for automatic assessment of dendritic spine parameters including spine density, morphometry (spine area, spine length, head width, minimum and average neck width), and total dendrite length. 
Validation studies demonstrate high accuracy and reproducibility across varying imaging protocols and experimental conditions. Multiple images from similar experimental setups can be processed seamlessly in the batch mode.</p><p><strong>Conclusions: </strong>2dSpAn-Auto provides a robust, interpretable solution for fast analysis of dendritic spines, a critical need in neurological research and clinical assessment. The combination of automated processing with optional expert oversight makes it suitable for both routine analysis and specialized research applications. The software, complete with the source code and comprehensive documentation, is available to the research community for non-commercial use under GNU General Public License (GPL) v3.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"162"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211165/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144538039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive gradient scaling: integrating Adam and landscape modification for protein structure prediction.","authors":"Vitalii Kapitan, Michael Choi","doi":"10.1186/s12859-025-06185-2","DOIUrl":"10.1186/s12859-025-06185-2","url":null,"abstract":"<p><strong>Background: </strong>Protein structure prediction is one of the most important scientific problems: on the one hand, it is NP-hard, and on the other hand, it has a wide range of applications, including drug discovery and biotechnology development. Since experimental methods for structure determination remain expensive and time-consuming, computational structure prediction offers a scalable and cost-effective alternative, and the application of machine learning in structural biology has revolutionized protein structure prediction. Despite their success, machine learning methods face fundamental limitations in optimizing complex high-dimensional energy landscapes, which motivates research into new methods to improve the robustness and performance of optimization algorithms.</p><p><strong>Results: </strong>This study presents a novel approach to protein structure prediction by integrating the Landscape Modification (LM) method with the Adam optimizer for OpenFold. The main idea is to change the optimization dynamics by introducing a gradient scaling mechanism based on energy landscape transformations. LM dynamically adjusts gradients using a threshold parameter and a transformation function, thereby improving the optimizer's ability to avoid local minima, more efficiently traverse flat or rough landscape regions, and potentially converge faster to global or high-quality local optima. 
By integrating simulated annealing into the LM approach, we propose LM SA, a variant designed to improve convergence stability while facilitating more efficient exploration of complex landscapes.</p><p><strong>Conclusion: </strong>We compare the performance of standard Adam, LM, and LM SA on different datasets and computational conditions. Performance was evaluated using Loss function values, predicted Local Distance Difference Test (pLDDT), distance-based Root Mean Square Deviation (dRMSD), and Template Modeling (TM) scores. Our results show that LM and LM SA outperform the standard Adam across all metrics, showing faster convergence and better generalization, particularly on proteins not included in the training set. These results demonstrate that integrating landscape-aware gradient scaling into first-order optimizers advances research in computational optimization and improves prediction performance for complex problems such as protein folding.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"161"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12210780/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144538040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differential expression analysis with inmoose, the integrated multi-omic open-source environment in Python.","authors":"Maximilien Colange, Guillaume Appé, Léa Meunier, Solène Weill, Akpéli Nordor, Abdelkader Behdenna","doi":"10.1186/s12859-025-06180-7","DOIUrl":"10.1186/s12859-025-06180-7","url":null,"abstract":"<p><strong>Background: </strong>Differential gene expression analysis is a prominent technique for the analysis of biomolecular data to identify genetic features associated with phenotypes. Limma (for microarray data) and edgeR and DESeq2 (for RNA-Seq data) are the most widely used tools for differential gene expression analysis of bulk transcriptomic data.</p><p><strong>Results: </strong>We present the differential expression features of InMoose, a Python implementation of the R tools limma, edgeR, and DESeq2. We experimentally show that InMoose stands as a drop-in replacement for those tools, with nearly identical results. This ensures reproducibility when interfacing both languages in bioinformatic pipelines. InMoose is open-source software released under the GPL3 license, available at www.github.com/epigenelabs/inmoose and https://inmoose.readthedocs.io.</p><p><strong>Conclusions: </strong>We present a new Python implementation of the state-of-the-art tools limma, edgeR, and DESeq2, to perform differential gene expression analysis of bulk transcriptomic data. 
This new implementation exhibits results nearly identical to the original tools, improving interoperability and reproducibility between Python and R bioinformatics pipelines.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"160"},"PeriodicalIF":2.9,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12183803/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PRCFX-DT: a new graph-based approach for feature selection and classification of genomic sequences.","authors":"Amin Khodaei, Sania Eskandari, Hadi Sharifi, Behzad Mozaffari-Tazehkand","doi":"10.1186/s12859-025-06183-4","DOIUrl":"10.1186/s12859-025-06183-4","url":null,"abstract":"<p><strong>Background: </strong>In recent years, viral diseases have exhibited a significant incidence of infections and fatalities. The analysis of viral genomic sequences can be efficacious in evaluating the present and potentially forthcoming condition of viruses. Considering the importance of the internal structure of the cell and the nucleotide sequences within it, analyzing nucleotide sequences can provide a range of discussable features. On the other hand, it has been demonstrated that the use of graph algorithms and machine learning in the analysis and examination of virus samples and even viral variants can yield beneficial results.</p><p><strong>Results: </strong>This study proposes a novel approach that utilizes complex networks and probabilistic graph modeling methods to analyze viral genomic sequences for feature extraction. The proposed approach, which relies on the PageRank centrality algorithm, operates on codons that are associated with the nucleotide sequences. Experiments with machine learning algorithms were conducted on multiple datasets of viruses and various variants of coronavirus and influenza viruses. The use of a decision tree classifier model on the extracted distinguishing features enabled the differentiation of coronavirus samples from other samples. The high discriminative capability of the graph node centrality feature played a significant role in these experiments, establishing a meaningful connection with genetic concepts as well. 
The decision tree classifier, applied to 173,228 genomic sequence samples originating from 30 distinct virus types, achieved an accuracy of 99.73%.</p><p><strong>Conclusion: </strong>The proposed algorithm was successfully tested on several types of viruses, and the interpretability of the extracted features also enabled its structural analysis. The use of a graph-based approach on genetic features containing information about the internal structure of nucleotides yielded results that could be significant for the identification of any type of virus or specific viral variant.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"159"},"PeriodicalIF":2.9,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12172359/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144315857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Start & Stop: a PhysiCell and PhysiBoSS 2.0 add-on for interactive simulation control.","authors":"Riccardo Smeriglio, Roberta Bardini, Alessandro Savino, Stefano Di Carlo","doi":"10.1186/s12859-025-06144-x","DOIUrl":"10.1186/s12859-025-06144-x","url":null,"abstract":"<p><p>In computational biology, in silico simulators are vital for exploring and understanding the behavior of complex biological systems. Hybrid multi-level simulators, such as PhysiCell and PhysiBoSS 2.0, integrate multiple layers of biological complexity, providing deeper insights into emergent patterns. However, one key limitation of these simulators is the inability to adjust simulation parameters once the simulation has started, which hinders the interactive exploration and adaptation of dynamic protocols ranging from biofabrication to in vitro pharmacological testing. To address this challenge, we introduce the Start & Stop add-on for PhysiCell and PhysiBoSS 2.0. This add-on offers multi-level state preservation and multi-modal stop control, triggered by simulation time or cell conditions, enabling users to pause a simulation, adjust parameters, and then resume from the exact halted state. We validate Start & Stop using two well-established PhysiBoSS 2.0 use cases, a tumor spheroid 3T3 mouse fibroblasts use case under tumor necrosis factor (TNF) stimulation, and a lung cancer cell line invasion simulation, demonstrating that it preserves the simulator's original behavior while enabling interactive configuration changes that facilitate the exploration of diverse and adaptive treatment strategies. 
By enhancing flexibility and user interaction, Start & Stop makes PhysiCell and PhysiBoSS 2.0 more akin to real in vitro scenarios, thus expanding the range of potential simulations and advancing more effective protocol development in a variety of applications.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"158"},"PeriodicalIF":2.9,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12160357/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144274182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCATrans: semantic cross-attention transformer for drug-drug interaction predication through multimodal biomedical data.","authors":"Shanwen Zhang, Changqing Yu, Chuanlei Zhang","doi":"10.1186/s12859-025-06165-6","DOIUrl":"10.1186/s12859-025-06165-6","url":null,"abstract":"<p><p>Predicting potential drug-drug interactions (DDIs) from biomedical data plays a critical role in drug therapy, drug development, drug regulation, and public health. However, it remains challenging due to the large number of possible drug combinations and the nature of multimodal biomedical data, which are disordered, imbalanced, prone to linguistic errors, and difficult to label. A Semantic Cross-Attention Transformer (SCAT) model is constructed to address these challenges. In the model, BioBERT, Doc2Vec, and a graph convolutional network are used to embed the multimodal biomedical data into vector representations, BiGRU is adopted to capture contextual dependencies in both forward and backward directions, Cross-Attention is employed to integrate the extracted features and explicitly model dependencies between them, and a feature-joint classifier is adopted to implement DDI predication (DDIP). Experimental results on the DDIExtraction-2013 dataset demonstrate that SCAT outperforms the state-of-the-art DDIP approaches. 
SCAT expands the application of multimodal deep learning in the field of multimodal DDIP, and can be applied to drug regulation systems to predict novel DDIs and DDI-related events.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"157"},"PeriodicalIF":2.9,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153160/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144265200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}