K Soni Sharmila, Thanga Revathi S, Pokkuluri Kiran Sree
{"title":"DDINet: Drug-drug interaction prediction network based on multi-molecular fingerprint features and multi-head attention centered weighted autoencoder.","authors":"K Soni Sharmila, Thanga Revathi S, Pokkuluri Kiran Sree","doi":"10.1142/S0219720025500039","DOIUrl":"10.1142/S0219720025500039","url":null,"abstract":"<p><p>Drug-drug interactions (DDIs) pose a major concern in polypharmacy due to their potential to cause unexpected side effects that can adversely affect a patient's health. Therefore, it is crucial to identify DDIs effectively during the early stages of drug discovery and development. In this paper, a novel DDI prediction network (DDINet) is proposed to enhance the predictive performance over conventional DDI methods. Leveraging the DrugBank dataset, drugs are represented using the Simplified Molecular Input Line-Entry System (SMILES), with the RDKit software pre-processing the SMILES strings into their canonical forms. Multiple molecular fingerprinting techniques such as Extended Connectivity Fingerprints (ECFPs), Molecular ACCess System keys (MACCSkeys), PubChem Fingerprints, 3D molecular fingerprints (3D-FP), and molecular dynamics fingerprints (MDFPs) are employed to encode drug chemical structures into feature vectors. Drug similarities are computed using the Tanimoto coefficient (TC), and the final Structural Similarity Profile (SSP) is obtained by averaging the five molecular fingerprint types. The novelty of the approach lies in the integration of a Multi-head Attention centered Weighted Autoencoder (Mul_WAE) as the interaction prediction module, which leverages the Multi-head Attention (MHA) layer to focus on the most significant input features. Furthermore, we introduce the Upgraded Bald Eagle Search Optimization (UBesO) algorithm, which optimally selects the learnable parameters of the Mul_WAE based on cross-entropy loss, improving the model's convergence and performance. 
The proposed DDINet model achieves an accuracy of 99.77%, an AUC of 99.66%, an average precision of 99.5%, a precision of 99.4%, and a recall of 99.49%, providing a comprehensive evaluation of the model's robustness. Beyond high accuracy, DDINet offers advantages in scalability, making it well suited for handling large datasets due to its efficient feature extraction and optimization processes. The unique combination of multiple molecular fingerprinting methods with the MHA layer and UBesO algorithm highlights the innovative aspects of our model and significantly improves prediction performance compared to existing approaches.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"23 1","pages":"2550003"},"PeriodicalIF":0.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143765530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene regulatory network inference based on modified adaptive lasso.","authors":"Chao Li, Xiaoran Huang, Xiao Luo, Xiaohui Lin","doi":"10.1142/S0219720024500264","DOIUrl":"10.1142/S0219720024500264","url":null,"abstract":"<p><p>Gene regulatory networks (GRNs) reveal the regulatory interactions among genes and provide a visual tool to explain biological processes. However, identifying direct relations among genes from gene expression data when the data are high-dimensional and the samples are few is a critical challenge. In this paper, we proposed a new GRN inference method based on a modified adaptive least absolute shrinkage and selection operator (MALasso). MALasso expands the number of samples based on the distance correlation and defines a new weighting scheme for adaptive lasso to remove false positive edges of the networks in the iterative process. Simulated data and gene expression data from the DREAM challenge were used to validate the performance of the proposed method MALasso. Comparisons among MALasso, adaptive lasso, and six other state-of-the-art methods show that MALasso outperformed the competing methods in AUROC and AUPRC in most cases and had a better ability to distinguish direct edges from indirect ones. 
Hence, by modifying the adaptive weighting manner of adaptive lasso, MALasso can detect linear and nonlinear relations, remove the false positive edges and identify direct relations among genes more accurately.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450026"},"PeriodicalIF":0.9,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dehua Chen, Yongsheng Yang, Dongdong Shi, Zhenhua Zhang, Mei Wang, Qiao Pan, Jianwen Su, Zhen Wang
{"title":"The use of 4D data-independent acquisition-based proteomic analysis and machine learning to reveal potential biomarkers for stress levels.","authors":"Dehua Chen, Yongsheng Yang, Dongdong Shi, Zhenhua Zhang, Mei Wang, Qiao Pan, Jianwen Su, Zhen Wang","doi":"10.1142/S0219720024500252","DOIUrl":"10.1142/S0219720024500252","url":null,"abstract":"<p><p>Research suggests that individuals who experience prolonged exposure to stress may be at higher risk for developing psychological stress disorders. Currently, psychological stress is primarily evaluated by professional physicians using rating scales, which may be prone to subjective biases and limitations of the scales. Therefore, it is imperative to explore more objective, accurate, and efficient biomarkers for evaluating the level of psychological stress in an individual. In this study, we utilized 4D data-independent acquisition (4D-DIA) proteomics for quantitative protein analysis, and then employed a support vector machine (SVM) combined with the SHAP interpretation algorithm to identify potential biomarkers for psychological stress levels. Biomarker validation was subsequently achieved through machine learning classification and a substantial amount of a priori knowledge derived from the knowledge graph. We performed cross-validation of the biomarkers using two batches of data, and the results showed that the combination of Glyceraldehyde-3-phosphate dehydrogenase and Fibronectin yielded an average area under the curve (AUC) of 92%, an average accuracy of 86%, an average F1 score of 79%, and an average sensitivity of 83%. 
Therefore, this combination may represent a potential approach for detecting stress levels to prevent psychological stress disorders.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450025"},"PeriodicalIF":0.9,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142639951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Author index Volume 22 (2024).","authors":"","doi":"10.1142/S0219720024990014","DOIUrl":"https://doi.org/10.1142/S0219720024990014","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 6","pages":"2499001"},"PeriodicalIF":0.9,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143442521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASAP-DTA: Predicting drug-target binding affinity with adaptive structure aware networks.","authors":"Weibin Ding, Shaohua Jiang, Ting Xu, Zhijian Lyu","doi":"10.1142/S0219720024500288","DOIUrl":"10.1142/S0219720024500288","url":null,"abstract":"<p><p>The prediction of drug-target affinity (DTA) is crucial for efficiently identifying potential targets for drug repurposing, thereby reducing resource wastage. In this paper, we propose a novel graph-based deep learning model for DTA that leverages adaptive structure-aware pooling for graph processing. Our approach integrates a self-attention mechanism with an enhanced graph neural network to capture the significance of each node in the graph, marking a significant advancement in graph feature extraction. Specifically, adjacent nodes in the 2D molecular graph are aggregated into clusters, with the features of these clusters weighted according to their attention scores to form the final molecular representation. In terms of model architecture, we utilize both global and hierarchical pooling, and assess the performance of the model on multiple benchmark datasets. The evaluation results on the KIBA dataset show that our model achieved the lowest mean squared error (MSE) of 0.126, which is a 0.5% reduction compared to the best-performing baseline method. Additionally, to validate the generalization capabilities of the model, we conduct comparative experiments on regression and binary classification tasks. 
The results demonstrate that our model outperforms previous models in both types of tasks.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 6","pages":"2450028"},"PeriodicalIF":0.9,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143442501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on similarity retrieval method based on mass spectral entropy.","authors":"Li-Ping Wu, Li Yong, Xiang Cheng, Yang Zhou","doi":"10.1142/S0219720024500276","DOIUrl":"10.1142/S0219720024500276","url":null,"abstract":"<p><p>Compound identification in small molecule research relies on comparing experimental mass spectra with mass spectral databases. However, unequal data lengths often lead to inefficient and inaccurate retrieval. Moreover, the similarity calculation methods used by commercial software have limitations. To address these issues, two mass spectrometry data processing methods namely the \"splicing-filling method\" and the \"matching-filling method\" have been proposed. In addition, an information entropy-based similarity calculation method for mass spectra is presented. The alignment method converts mass spectra of different lengths for unknown and known compounds into equal-length mass spectra, allowing more accurate calculation of similarities between mass spectra. Information entropy measurements are used to quantify the differences in intensity distributions in the aligned mass spectral data, which are then used to compare the degree of similarity between different mass spectra. The results of the example validation show that the two data alignment methods can effectively solve the problem of unequal lengths of mass spectral data in similarity calculation. 
The results of the mass spectral entropy method are reliable and suitable for the identification of mass spectra.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 6","pages":"2450027"},"PeriodicalIF":0.9,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143442527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mateusz Twardawa, Kaja Gutowska, Piotr Formanowicz
{"title":"Exploring relationship between hypercholesterolemia and instability of atherosclerotic plaque - An approach based on a matrix population model.","authors":"Mateusz Twardawa, Kaja Gutowska, Piotr Formanowicz","doi":"10.1142/S021972002450029X","DOIUrl":"10.1142/S021972002450029X","url":null,"abstract":"<p><p><b>Background:</b> Cardiovascular diseases have long been studied to identify their causal factors and counteract them effectively. Atherosclerosis, an inflammatory process of the blood vessel wall, is a common cardiovascular disease. Among the many well-known risk factors, hypercholesterolemia is undoubtedly a significant condition for atherosclerotic plaque formation and is linked to atherosclerosis on many levels, i.e. cell interactions, cytokine levels, diet, and lifestyle. Current studies suggest that controlling the balance between proinflammatory (<i>M</i>1) and anti-inflammatory (<i>M</i>2) types of macrophages may be used to improve patient condition and reduce the necrotic core. <b>Methods:</b> This study considered the effects of hypercholesterolemia on the population dynamics of macrophages (<i>M</i>0, <i>M</i>1, <i>M</i>2, foam cells) in atherosclerotic plaque. A mathematical model using a matrix approach to population dynamics was proposed and tested in various scenarios. To check model sensitivity and the variability associated with error propagation, an uncertainty analysis was performed based on the Monte Carlo approach. <b>Results:</b> Simulations of macrophage population dynamics provided an assessment of necrotic core development and plaque instability. Excess lipid levels emerged as the most critical factor for necrotic core development. However, plaque growth can be significantly slowed if macrophages and foam cells can maintain proper lipid levels. This balance may be disrupted by proinflammatory lipids that will eventually increase plaque size, which is also reflected by <i>M</i>1/<i>M</i>2 dynamics. 
<b>Conclusion:</b> Hypercholesterolemia accelerates atherosclerosis development, leading to earlier cardiovascular incidents. <i>In silico</i> results suggest that reducing lipid intake and the proportion of proinflammatory lipids is crucial to slowing plaque development and reducing rupture risk, all of which requires preserving the fragile <i>M</i>1/<i>M</i>2 balance. Targeting the inflammatory microenvironment and macrophage polarization represents a promising approach for atherosclerosis management.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 6","pages":"2450029"},"PeriodicalIF":0.9,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143442524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rahim Berahmand, Masoumeh Emadpour, Mokhtar Jalali Javaran, Kaveh Haji-Allahverdipoor, Ali Akbarabadi
{"title":"Molecular dynamics simulations of ribosome-binding sites in theophylline-responsive riboswitch associated with improving the gene expression regulation in chloroplasts.","authors":"Rahim Berahmand, Masoumeh Emadpour, Mokhtar Jalali Javaran, Kaveh Haji-Allahverdipoor, Ali Akbarabadi","doi":"10.1142/S0219720024500239","DOIUrl":"10.1142/S0219720024500239","url":null,"abstract":"<p><p>An efficient inducible transgene expression system is a valuable tool in recombinant protein production. The synthetic theophylline-responsive riboswitch (theo.RS) can be placed in the 5' untranslated region of an mRNA to control the translation of the downstream gene in chloroplasts in response to binding of a ligand molecule, theophylline. One drawback limiting the efficiency of the theo.RS is a leak in the RS structure that allows undesired background translation when the switch is expected to be off. The purpose of this study was to detect the factors causing the leak of the theo.RS in the off mode using molecular dynamics (MD) simulations. After appropriate balancing of the simulation system using the necessary commands, a 40 ns simulation was conducted. Analysis of the solvent-accessible surface area for both ribosome-binding site (RBS) regions indicated that nucleotide 79 of the theo.RS, a guanine, had the highest surface exposure to ribosome access. These results were verified by studying the hydrogen bonding of the RBS regions with the RNA structure. 
Therefore, redesigning the RBS regions and avoiding the unmasked nucleotide(s) in the structure may improve the tightness of the theo.RS in the off mode, resulting in efficient inhibition of translation.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 5","pages":"2450023"},"PeriodicalIF":0.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142688505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Li, Boran Wang, Zengding Wu, Shiliang Ji, Shi Xu, Caiyi Fei
{"title":"SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.","authors":"Yan Li, Boran Wang, Zengding Wu, Shiliang Ji, Shi Xu, Caiyi Fei","doi":"10.1142/S0219720024500227","DOIUrl":"10.1142/S0219720024500227","url":null,"abstract":"<p><p><i>Background:</i> Genetic mutations that cause the inactivation or aberrant activation of essential proteins may trigger alterations or even dysfunctions in cellular signaling pathways, culminating in the development of precancerous lesions and cancer. Mutations and such dysfunctions can result in the generation of \"novel proteins\" that are not part of the conventional human proteome. Identification of these proteins carries a profound potential for unraveling promising drug targets and designing innovative therapeutic models. Despite the emergence of diverse tools for detecting DNA or RNA variants, facilitated by the widespread adoption of nucleotide sequencing technology, these methods primarily target point mutations and exhibit suboptimal performance in detecting large-scale and combinatorial mutations. Additionally, the outcomes of these tools are confined to the genome and transcriptome levels, and do not provide the corresponding protein information resulting from genetic alterations. <i>Results:</i> We present the development of Sequencing Analysis Kit (SAKit), a bioinformatics pipeline for hybrid sequencing analysis integrating long-read and short-read RNA sequencing data. Long reads are utilized for detecting large-scale variations such as gene fusions, exon skipping, intron retention, and aberrant expression in non-coding regions, owing to their excellent coverage capabilities. Short reads serve to validate these findings at breakpoints and splice junctions. 
Conversely, short reads are employed for identifying small-scale variations, including single nucleotide variants, deletions, and insertions, due to their superior sequencing depth, with long reads providing additional validation. SAKit is designed to perform analyses using inter-species configuration files comprising genome references and annotation data, making it applicable to both human and mouse studies. Furthermore, SAKit implements a hierarchical filtering approach to eliminate low-confidence variants and employs open reading frame (ORF) analysis to translate identified variants into protein sequences. <i>Conclusion:</i> SAKit is a robust and versatile bioinformatics tool designed for the comprehensive identification of both large-scale and small-scale variants from RNA-seq data, facilitating the discovery of novel proteins. This pipeline integrates analysis of long-read and short-read sequencing data, offering a powerful solution for researchers in genomics and transcriptomics. SAKit is freely accessible and open-source, available through GitHub (https://github.com/therarna/SAKit) and as a Docker image (https://hub.docker.com/repository/docker/therarna). Implemented primarily within a Snakemake framework using Python, SAKit ensures reproducibility, scalability, and ease of use for the scientific community.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 5","pages":"2450022"},"PeriodicalIF":0.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142688766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving drug-target interaction prediction through dual-modality fusion with InteractNet.","authors":"Baozhong Zhu, Runhua Zhang, Tengsheng Jiang, Zhiming Cui, Jing Chen, Hongjie Wu","doi":"10.1142/S0219720024500240","DOIUrl":"10.1142/S0219720024500240","url":null,"abstract":"<p><p>In the drug discovery process, accurate prediction of drug-target interactions is crucial to accelerate the development of new drugs. However, existing methods still face many challenges in dealing with complex biomolecular interactions. To this end, we propose a new deep learning framework that combines the structural information and sequence features of proteins to provide comprehensive feature representation through bimodal fusion. This framework not only integrates the topological adaptive graph convolutional network and multi-head attention mechanism, but also introduces a self-masked attention mechanism to ensure that each protein binding site can focus on its own unique features and its interaction with the ligand. Experimental results on multiple public datasets show that our method significantly outperforms traditional machine learning and graph neural network methods in predictive performance. 
In addition, our method can effectively identify and explain key molecular interactions, providing new insights into understanding the complex relationship between drugs and targets.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 5","pages":"2450024"},"PeriodicalIF":0.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142689319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}