Huixian Chen, Yun Zuo, Xiangrong Liu, Xiangxiang Zeng, Zhaohong Deng, Jiasong Wu
Analytical Biochemistry, volume 707, Article 115968. Published 2025-09-04. DOI: 10.1016/j.ab.2025.115968. Impact factor: 2.5. JCR: Q2 (Biochemical Research Methods). https://www.sciencedirect.com/science/article/pii/S0003269725002076
PreRBP: Interpretable deep learning for RNA-protein binding site prediction with attention mechanism
In the complex process of gene expression and regulation, RNA-binding proteins play a pivotal role for RNA. Accurate prediction of RNA-protein binding sites can help researchers better understand RNA-binding proteins and their related mechanisms. Prediction techniques based on machine learning algorithms are both cost-effective and efficient in identifying these binding sites. However, the currently available machine learning methods have shortcomings: the model input features consider only RNA sequence features, and most of the datasets suffer from class imbalance. To address these issues, this study first uses 27 publicly available RNA-protein binding site datasets to construct a benchmark dataset. RNAshapes and EDeN are then used to obtain the secondary structure of RNA, and a higher-order encoding method is used to extract the key information hidden in the RNA sequences and structures. To address the class imbalance in the dataset, the study applies four undersampling algorithms (random undersampling, NearMiss, ENN, and one-sided selection) to remove redundant negative samples. Finally, the study constructs the PreRBP model, based on a convolutional neural network and a bidirectional long short-term memory network, to predict RNA-protein binding sites.
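Two of the preprocessing steps named above can be illustrated with a minimal sketch. It assumes that "higher-order encoding" means one-hot encoding of overlapping k-mers, which the abstract does not spell out, and it shows only the simplest of the four undersampling strategies (random undersampling). The function names `kmer_one_hot` and `random_undersample`, the parameter `k`, and the toy inputs are hypothetical and are not taken from the PreRBP code.

```python
import random
from itertools import product


def kmer_one_hot(seq, k=2, alphabet="ACGU"):
    """One-hot encode every overlapping k-mer of seq (assumed encoding scheme)."""
    # Vocabulary of all len(alphabet)**k possible k-mers, in lexicographic order.
    vocab = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    rows = []
    for i in range(len(seq) - k + 1):
        row = [0] * len(vocab)
        idx = vocab.get(seq[i:i + k])  # None for k-mers with unknown symbols, e.g. N
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def random_undersample(samples, labels, seed=0):
    """Randomly drop negatives (label 0) until they match the positives (label 1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    keep = sorted(pos + kept_neg)  # preserve the original sample order
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy inputs, invented for illustration only.
encoded = kmer_one_hot("ACGUA", k=2)  # 4 dinucleotides, 16 dimensions each
balanced_x, balanced_y = random_undersample(
    ["s1", "s2", "s3", "s4", "s5", "s6"], [1, 0, 0, 0, 1, 0]
)
```

With `k=2` each position is described by its dinucleotide rather than a single base, so the encoding carries some local context. The same idea could be applied to a structure-annotation string by passing a different `alphabet`.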
The experimental results show that the model achieves an average AUC of 0.88, which is higher than that of other existing RNA-protein binding site prediction methods. For convenience of prediction, an online predictor was also developed in this study. The predictor and experimental code are available at https://github.com/B12-Comet/RBPPrediction.
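For readers unfamiliar with the metric, the following sketch shows one way an average AUC over several datasets could be computed. It uses the rank-based definition of AUC: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function names and the toy labels and scores are invented for illustration; they are not results from the paper, and the authors' evaluation code may differ.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc(per_dataset):
    """Unweighted mean of the per-dataset AUC values."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_dataset]
    return sum(aucs) / len(aucs)


# Two toy "datasets" with made-up labels and predicted scores.
toy = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),  # AUC 0.75
    ([1, 0, 0], [0.8, 0.3, 0.2]),          # AUC 1.0
]
average = mean_auc(toy)  # 0.875
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch. A sorting-based implementation would be the usual choice for large datasets.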
About the journal:
The journal's title Analytical Biochemistry: Methods in the Biological Sciences declares its broad scope: methods for the basic biological sciences that include biochemistry, molecular genetics, cell biology, proteomics, immunology, bioinformatics and wherever the frontiers of research take the field.
The emphasis is on methods from the strictly analytical to the more preparative that would include novel approaches to protein purification as well as improvements in cell and organ culture. The actual techniques are equally inclusive ranging from aptamers to zymology.
The journal has been particularly active in:
- Analytical techniques for biological molecules
- Aptamer selection and utilization
- Biosensors
- Chromatography
- Cloning, sequencing and mutagenesis
- Electrochemical methods
- Electrophoresis
- Enzyme characterization methods
- Immunological approaches
- Mass spectrometry of proteins and nucleic acids
- Metabolomics
- Nano level techniques
- Optical spectroscopy in all its forms
The journal is reluctant to include most drug and strictly clinical studies as there are more suitable publication platforms for these types of papers.