Irfan Khan, Muhammad Arif, Ali Ghulam, Somayah Albaradei, Maha A. Thafar, Apilak Worachartcheewan
Improved in Silico Identification of Protein-Protein Interactions Using Deep Learning Approach
IET Systems Biology, 19 (1). Published 2025-04-24. DOI: 10.1049/syb2.70008
https://onlinelibrary.wiley.com/doi/10.1049/syb2.70008
Citation count: 0
Abstract
Protein–protein interactions (PPIs) play significant roles in many biological activities, such as gene regulation, metabolic pathways and signal transduction. The deregulation of PPIs may cause deadly diseases, such as cancer, autoimmune disorders and pernicious anaemia. Detecting PPIs can help elucidate the molecular mechanisms underlying cellular processes and facilitate the discovery of new proteins for the development of novel drugs. Although high-throughput wet-lab technologies have matured for large-scale PPI identification, traditional experimental methods are costly, slow and resource intensive. To support experimental techniques, numerous computational approaches have emerged for identifying PPIs solely from protein sequences. However, the performance of available PPI tools is unsatisfactory and there remains room for improvement. In this study, a novel deep learning-based model, Deep_PPI, was developed for predicting PPIs across multiple species. To extract the biological features, the authors used a 21D vector representing the 20 native amino acid residues and one special residue, and implemented the Keras binary profile encoding technique to represent each residue in a protein. The binary profile uses the PaddVal strategy to equalise the sequence lengths of positive and negative PPIs. After extracting the features, the authors fed them into a one-dimensional convolutional neural network to build the final prediction model. The proposed Deep_PPI model processes the two proteins of each pair in two separate convolutional heads. Finally, the outputs of the two branches are concatenated and passed to a fully connected layer. The efficiency of the proposed predictor was demonstrated both by cross-validation and by testing on datasets from various species (Human, C. elegans, E. coli, and H. sapiens). The proposed model surpassed both the machine-learning models and existing state-of-the-art PPI methods. The proposed Deep_PPI will serve as a valuable tool in the discovery of large-scale PPIs in particular and provide insights for drug development in general.
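The pipeline described in the abstract (21D binary profile, length equalisation, two convolutional heads, concatenation, fully connected layer) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' Keras implementation. Everything specific here is an assumption made for the example: the use of "X" as the special residue, zero rows as padding (the PaddVal strategy may pad differently), the filter count, kernel width, global max pooling, the single sigmoid output, and all function names. The weights are random and untrained, so the output only demonstrates the data flow and shapes.

```python
import numpy as np

# Assumed alphabet: 20 standard residues plus "X" as the one special symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def binary_profile(seq, max_len):
    """Encode a sequence as a (max_len, 21) 0/1 matrix, padded or truncated."""
    out = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, aa in enumerate(seq[:max_len].upper()):
        # Letters outside the alphabet are mapped to "X" (assumption).
        out[pos, INDEX.get(aa, INDEX["X"])] = 1.0
    return out  # rows beyond the end of seq stay all-zero (assumed padding)


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution along the sequence, followed by ReLU.

    x: (length, channels); kernels: (width, channels, filters); bias: (filters,)
    """
    width = kernels.shape[0]
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    return np.maximum(np.einsum("lwc,wcf->lf", windows, kernels) + bias, 0.0)


def head(x, kernels, bias):
    """One convolutional head: convolution + ReLU, then global max pooling."""
    return conv1d_relu(x, kernels, bias).max(axis=0)


def predict_pair(seq_a, seq_b, params, max_len=50):
    """Forward pass: one head per protein, concatenate, one sigmoid unit."""
    feat_a = head(binary_profile(seq_a, max_len), *params["head_a"])
    feat_b = head(binary_profile(seq_b, max_len), *params["head_b"])
    merged = np.concatenate([feat_a, feat_b])
    logit = merged @ params["dense_w"] + params["dense_b"]
    return float(1.0 / (1.0 + np.exp(-logit)))


def random_params(filters=8, width=3, seed=0):
    """Untrained random weights, useful only for checking shapes."""
    rng = np.random.default_rng(seed)

    def make_head():
        kernels = rng.normal(0.0, 0.1, (width, len(ALPHABET), filters))
        return kernels, np.zeros(filters)

    return {
        "head_a": make_head(),
        "head_b": make_head(),
        "dense_w": rng.normal(0.0, 0.1, 2 * filters),
        "dense_b": 0.0,
    }


if __name__ == "__main__":
    score = predict_pair("MKTAYIAKQR", "GAVLIPFMW", random_params())
    print(f"untrained interaction score: {score:.3f}")
```

Giving each protein its own head lets the two branches learn separate filters before the pair-level decision is made on the concatenated features. In a trained model the parameters would come from fitting on labelled interacting and non-interacting pairs rather than from `random_params`.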
About the journal:
IET Systems Biology covers intra- and inter-cellular dynamics, using systems- and signal-oriented approaches. Papers that analyse genomic data in order to identify variables and basic relationships between them are considered if the results provide a basis for mathematical modelling and simulation of cellular dynamics. Manuscripts on molecular and cell biological studies are encouraged if the aim is a systems approach to dynamic interactions within and between cells.
The scope includes the following topics:
Genomics, transcriptomics, proteomics, metabolomics, cells, tissue and the physiome; molecular and cellular interaction, gene, cell and protein function; networks and pathways; metabolism and cell signalling; dynamics, regulation and control; systems, signals, and information; experimental data analysis; mathematical modelling, simulation and theoretical analysis; biological modelling, simulation, prediction and control; methodologies, databases, tools and algorithms for modelling and simulation; modelling, analysis and control of biological networks; synthetic biology and bioengineering based on systems biology.