Tasmin Karim, Md Shazzad Hossain Shaon, Md Mamun Ali, Sobhy M. Ibrahim, Mst Shapna Akter, Kawsar Ahmed, Francis M. Bui, Li Chen, Mohammad Ali Moni
K-SNOpred: Identification of protein S-nitrosylation sites through word embedding features and machine learning
Analytical Biochemistry, volume 707, Article 115952. Published 2025-08-06. DOI: 10.1016/j.ab.2025.115952
https://www.sciencedirect.com/science/article/pii/S0003269725001915
Abstract
Protein S-nitrosylation (SNO) is the covalent modification of cysteine residues by nitric oxide (NO) and its derivatives. Numerous studies have demonstrated that SNO is significantly involved in cell function and pathophysiology. Identifying SNO sites helps clarify their role in cellular physiology, disease processes, and potential treatment strategies, which makes it important to medical science. This study developed a machine learning (ML) model named “K-SNOpred”, which performed notably well in identifying SNO sites using Latent Semantic Analysis (LSA) feature embeddings. After collecting the dbSNO and RecSNO datasets through a literature search, we applied three feature embedding systems (Doc2vec, FastText, and LSA) to each dataset. We trained various ML models and assessed their performance with multiple evaluation metrics through independent testing and 10-fold cross-validation. The proposed model achieved an accuracy of 87.56% and an AUC of 95.06%, outperforming existing state-of-the-art (SOTA) models by nearly 10% in accuracy and 6% in AUC. Furthermore, the model showed balanced sensitivity and specificity, indicating that it classifies both SNO and non-SNO sites accurately. This performance suggests that K-SNOpred has high potential for clinical use and for applications in biotechnology.
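The evaluation metrics named in the abstract (accuracy, sensitivity, specificity, and AUC) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, with label 1 standing for an SNO site and 0 for a non-SNO site.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks an SNO site."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives).

    Assumes both classes are present; otherwise a division by zero occurs.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented purely for illustration (not data from the study)
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

Sensitivity and specificity are computed per class, so a model can only score well on both if it handles positive and negative sites comparably, which is the sense of "balanced" used in the abstract. AUC, by contrast, does not depend on the chosen threshold.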
About the journal:
The journal's title Analytical Biochemistry: Methods in the Biological Sciences declares its broad scope: methods for the basic biological sciences that include biochemistry, molecular genetics, cell biology, proteomics, immunology, bioinformatics and wherever the frontiers of research take the field.
The emphasis is on methods from the strictly analytical to the more preparative that would include novel approaches to protein purification as well as improvements in cell and organ culture. The actual techniques are equally inclusive ranging from aptamers to zymology.
The journal has been particularly active in:
- Analytical techniques for biological molecules
- Aptamer selection and utilization
- Biosensors
- Chromatography
- Cloning, sequencing and mutagenesis
- Electrochemical methods
- Electrophoresis
- Enzyme characterization methods
- Immunological approaches
- Mass spectrometry of proteins and nucleic acids
- Metabolomics
- Nano level techniques
- Optical spectroscopy in all its forms
The journal is reluctant to include most drug and strictly clinical studies as there are more suitable publication platforms for these types of papers.