Current Bioinformatics: Latest Articles

A Comparative Review and Analysis of Computational Predictors for Identification of Enhancer and their Strength
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-06-04 DOI: 10.2174/0115748936285942240513064919
Mehwish Gill, Muhammad Kabir, Saeed Ahmed, Muhammad Asif Subhani, Maqsood Hayat
Enhancers are short functional regions (50–1500 bp) in the genome that play an effective role in activating gene transcription in the presence of transcription factors (TFs). Many human diseases, such as cancer and inflammatory bowel disease, are correlated with genetic variations in enhancers. The precise recognition of enhancers provides useful insights for understanding the pathogenesis of human diseases and their treatments. High-throughput experiments are considered essential tools for characterizing enhancers; however, these methods are laborious, costly and time-consuming. Computational methods are considered alternative solutions for accurate and rapid identification of enhancers. Over the past years, numerous computational predictors have been devised for predicting enhancers and their strength. A comprehensive review and thorough assessment are indispensable to systematically compare sequence-based enhancer bioinformatics tools on their performance. Given the increasing interest in this domain, we conducted a large-scale analysis and assessment of the state-of-the-art enhancer predictors to evaluate their scalability and generalization power. Additionally, we classified the existing approaches into three main groups: conventional machine-learning, ensemble and deep learning-based approaches. Furthermore, the study has focused on exploring the important factors that are crucial for developing precise and reliable predictors, such as designing trusted benchmark/independent datasets, feature representation schemes, feature selection methods, classification strategies, evaluation metrics and webservers. Finally, the insights from this review are expected to provide important guidelines to the research community and pharmaceutical companies in general, and to the development of high-throughput tools for the detection and characterization of enhancers in particular.
Citations: 0
Multimodal Deep Learning for Cancer Survival Prediction: A Review
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-05-31 DOI: 10.2174/0115748936289033240424071522
Ge Zhang, Chenwei Ma, Chaokun Yan, Huimin Luo, Jianlin Wang, Wenjuan Liang, Junwei Luo
Background: Cancer has emerged as the "leading killer" of human health. Survival prediction is a crucial branch of cancer prognosis. It aims to estimate patients' survival risk based on their disease conditions. Accurate and efficient survival prediction is vital in cancer patients' treatment and clinical management, preventing unnecessary suffering and conserving precious medical resources. Deep learning has been extensively applied in cancer diagnosis, prognosis, and treatment management. The decreasing cost of next-generation sequencing, continuous development of related databases, and in-depth research on multimodal deep learning have provided opportunities for establishing more functionally rich and accurate survival prediction models. Objective: The current area of cancer survival prediction still lacks a review of multimodal deep learning methods. Methods: We conducted a statistical analysis of the relevant research on multimodal deep learning for cancer survival prediction. We first filtered keywords from 6 known relevant papers. Then, we searched PubMed and Google Scholar for relevant publications from 2018 to 2022 using "Multimodal", "Deep Learning" and "Cancer Survival Prediction" as keywords. We then searched the related publications further through backward and forward citation search. Subsequently, we conducted a detailed analysis and review of these studies based on their datasets and methods. Results: We present a comprehensive systematic review of the multimodal deep learning research on cancer survival prediction from 2018 to 2022. Conclusion: Multimodal deep learning has demonstrated powerful data aggregation capabilities and excellent performance, greatly improving cancer survival prediction. It has made a significant positive impact on facilitating the advancement of automated cancer diagnosis and precision oncology.
Citations: 0
Enhancing Drug Peptide Sequence Prediction Using Multi-view Feature Fusion Learning
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-05-27 DOI: 10.2174/0115748936294345240510112941
Junyu Zhang, Ronglin Lu, Hongmei Zhou, Xinbo Jiang
Background: Currently, various types of peptides have broad implications for human health and disease. Some drug peptides play significant roles in sensory science, drug research, and cancer biology. The prediction and classification of peptide sequences are of significant importance to various industries. However, predicting peptide sequences through biological experiments is a time-consuming and expensive process. Moreover, the task of protein sequence classification and prediction faces challenges due to the high dimensionality, nonlinearity, and irregularity of protein sequence data, along with the presence of numerous unknown or unlabeled protein sequences. Therefore, an accurate and efficient method for predicting peptide classification is necessary. Methods: In our work, we used two pre-trained models to extract sequence features, TextCNN (Convolutional Neural Networks for Text Classification) and Transformer. We extracted the overall semantic information of the sequences using the Transformer Encoder, extracted the local semantic information between sequences using TextCNN, and concatenated them into a new feature. Finally, we used the concatenated feature for classification prediction. To validate this approach, we conducted experiments on the BP dataset, THP dataset and DPP-IV dataset and compared the results with those of some pre-trained models. Results: Since TextCNN and the Transformer Encoder extract features from different perspectives, the concatenated feature contains multi-view information, which improves the accuracy of the peptide predictor. Conclusion: Ultimately, our model demonstrated superior metrics, highlighting its efficacy in peptide sequence prediction and classification.
Citations: 0
An Exploratory Review on Recent Computational Approaches Devised for MiRNA Disease Association Prediction
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-05-20 DOI: 10.2174/0115748936293219240426051148
S Sujamol, E R Vimina, U. Krishnakumar
Recent evidence has demonstrated the fundamental role of miRNAs as disease biomarkers and their role in disease progression and pathology. Identifying disease-related miRNAs using computational approaches has become one of the trending topics in health informatics. Many biological databases and online tools have been developed for uncovering novel disease-related miRNAs. Hence, a brief overview of disease biomarkers, miRNAs as disease biomarkers and their role in complex disorders is given here. Various methods for calculating miRNA and disease similarities are included, and the existing machine learning and network-based computational approaches for detecting disease-associated miRNAs are reviewed along with the benchmark datasets used. Finally, the performance metrics, validation measures and online tools used for miRNA Disease Association (MDA) predictions are also outlined.
Citations: 0
Validating the Distinctiveness of the Omicron Lineage within the SARSCov-2 based on Protein Language Models
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-04-30 DOI: 10.2174/0115748936291075240409080924
Ke Dong, Jingyang Gao
Introduction: Variants of concern (VOCs) were identified in severe acute respiratory syndrome coronavirus 2, namely Alpha, Beta, Gamma, Delta, and Omicron. This study explores the mutations of the Omicron lineage and its differences from other lineages through a protein language model. Objective: To analyze the differences in the number of Omicron amino acid mutations compared to the other four VOCs using statistical methods, and to use the protein language model ESM-1v to analyze the specificity of Omicron amino acid mutations. Methods: By inputting the severe acute respiratory syndrome coronavirus 2 wild-type sequence into the protein language model ESM-1v, this study obtained the score for each position mutating to other amino acids and calculated the overall trend of mutation scores for each new variant of concern. Results: It is found that when the proportion of unobserved mutations to observed mutations is 4:15, Omicron still generates a large number of newly emerging mutations. The overall score and the overall ranking for the Omicron family were both found to be low. Conclusion: Mutations in the Omicron lineage are different from amino acid mutations in other lineages. The findings of this paper deepen the understanding of the spatial distribution of spike protein amino acid mutations and overall trends of newly emerging mutations corresponding to different variants of concern. This also provides insights into simulating the evolution of the Omicron lineage.
Citations: 0
Comparative Analysis of Deep Generative Model for Industrial Enzyme Design
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-04-16 DOI: 10.2174/0115748936303223240404043202
Beibei Zhang, Qiaozhen Meng, Chengwei Ai, Guihua Duan, Ercheng Wang, Fei Guo
Although enzymes have the advantage of efficient catalysis, natural enzymes lack stability in industrial environments and may not even catalyze the required reactions. This creates an urgent need for de novo design of new enzymes. Computational design is a powerful tool, allowing rapid and efficient exploration of sequence space and facilitating the design of novel enzymes tailored to specific conditions and requirements. It is therefore beneficial to design industrial enzymes de novo using computational methods. Currently, the only tool explicitly designed for enzyme-only generation performs unsatisfactorily. We have selected several general protein sequence design tools and systematically evaluated their effectiveness when applied to specific industrial enzymes. We investigated the literature related to protein generation and summarized the computational methods used for sequence generation into three categories: structure-conditional sequence generation, sequence generation without structural constraints, and co-generation of sequence and structure. To effectively evaluate the ability of six computational tools to generate enzyme sequences, we first constructed a luciferase dataset named Luc_64. Then we assessed the quality of enzyme sequences generated by these methods on this dataset, including amino acid distribution, EC number validation, etc. We also assessed sequences generated by structure-based methods on existing public datasets using sequence recovery rates and root-mean-square deviation (RMSD), covering both a sequence and a structure perspective. On the functionality dataset Luc_64, ABACUS-R and ProteinMPNN stood out for producing sequences with amino acid distributions and functionalities closely matching those of naturally occurring luciferase enzymes, suggesting their effectiveness in preserving essential enzymatic characteristics. Across both benchmark datasets, ABACUS-R and ProteinMPNN also exhibited the highest sequence recovery rates, indicating their superior ability to generate sequences closely resembling the original enzyme structures. Our study provides a crucial reference for researchers selecting appropriate enzyme sequence design tools, highlighting the strengths and limitations of each tool in generating accurate and functional enzyme sequences. ProteinMPNN and ABACUS-R emerged as the most effective tools in our evaluation, offering high accuracy in sequence recovery and RMSD and maintaining the functional integrity of enzymes through accurate amino acid distribution. Meanwhile, the performance of general protein design tools when transferred to specific industrial enzymes was fairly evaluated on our industrial enzyme benchmark.
Citations: 0
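The two structure-based metrics named in the abstract above, sequence recovery rate and RMSD, can be illustrated with a minimal sketch. The function names `sequence_recovery` and `rmsd` are hypothetical and not taken from the paper; the sketch assumes the two sequences are already aligned to equal length and that the two coordinate sets are already superimposed, which the abstract does not specify.

```python
import math


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences are pre-aligned and of equal, non-zero length.
    """
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between paired (x, y, z) points.

    Assumes the two structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

For example, `sequence_recovery("MKTAYI", "MKTVYI")` compares six positions of which five match. A real evaluation would normally superimpose the structures before computing RMSD, which this sketch leaves out.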
An Effective Method to Identify Cooperation Driver Gene Sets
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-04-15 DOI: 10.2174/0115748936293238240313081211
Wei Zhang, Yifu Zeng, Bihai Zhao, Jie Xiong, Tuanfei Zhu, Jingjing Wang, Guiji Li, Lei Wang
Background: In cancer genomics research, identifying driver genes is a challenging task. Detecting cancer-driver genes can further our understanding of cancer risk factors and promote the development of personalized treatments. Gene mutations show both mutual exclusivity and co-occurrence, and most of the existing methods focus on identifying driver pathways or driver gene sets through the study of mutual exclusivity, that is, functionally redundant gene sets. Less research has been conducted on cooperation genes with co-occurring mutations. Objective: We propose an effective method that combines two characteristics of genes, co-occurring mutations and the coordinated regulation of proliferation genes, to explore cooperation driver genes. Methods: This study is divided into three stages: (1) constructing a binary gene mutation matrix; (2) combining mutation co-occurrence characteristics to identify the candidate cooperation gene sets; and (3) constructing a gene regulation network to screen the cooperation gene sets that synergistically regulate proliferation. Results: The performance of the method was evaluated on three TCGA cancer datasets, and the experiments showed that it can detect effective cooperation driver gene sets. Further investigation showed that the discovered sets of co-driver genes could be used to generate prognostic classifications, which could be biologically significant and provide complementary information to the cancer genome. Conclusion: Our approach is effective in identifying sets of cancer cooperation driver genes, and the results can be used as clinical markers to stratify patients.
Citations: 0
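Stages (1) and (2) of the method described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the input format (a list of `(sample, gene)` records), the function names, and the use of a raw co-occurrence count as the pairwise statistic are all hypothetical, since the abstract does not say how co-occurrence is scored.

```python
from itertools import combinations


def build_mutation_matrix(records):
    """Stage 1: binary sample-by-gene matrix.

    matrix[i][j] is 1 if sample i carries at least one mutation in gene j.
    """
    samples = sorted({sample for sample, _ in records})
    genes = sorted({gene for _, gene in records})
    sample_index = {s: i for i, s in enumerate(samples)}
    gene_index = {g: j for j, g in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in samples]
    for sample, gene in records:
        matrix[sample_index[sample]][gene_index[gene]] = 1
    return samples, genes, matrix


def cooccurrence_counts(genes, matrix):
    """Stage 2 (assumed statistic): number of samples mutated in both genes."""
    counts = {}
    for a, b in combinations(range(len(genes)), 2):
        counts[(genes[a], genes[b])] = sum(row[a] * row[b] for row in matrix)
    return counts


# Toy mutation records: (sample id, mutated gene).
records = [
    ("P1", "TP53"), ("P1", "KRAS"),
    ("P2", "TP53"), ("P2", "KRAS"),
    ("P3", "TP53"), ("P3", "PIK3CA"),
]
samples, genes, matrix = build_mutation_matrix(records)
pair_counts = cooccurrence_counts(genes, matrix)
```

Gene pairs with high counts would be candidates for the network-based screening in stage (3). A real analysis would more likely use a significance test than a raw count.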
Integrated Somatic Mutation Network Diffusion Model for Stratification of Breast Cancer into Different Metabolic Mutation Subtypes
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-04-15 DOI: 10.2174/0115748936298012240322091111
Dongqing Su, Honghao Li, Tao Wang, Min Zou, Haodong Wei, Yuqiang Xiong, Hongmei Sun, Shiyuan Wang, Qilemuge Xi, Yongchun Zuo, Lei Yang
Background: Mutations in metabolism-related genes in somatic cells potentially lead to disruption of metabolic pathways, which results in patients exhibiting different molecular and pathological features. Objective: In this study, we focused on somatic mutation data to investigate the significance of metabolic mutation typing in guiding the prognosis and treatment of breast cancer patients. Methods: The somatic mutation profile of breast cancer patients was analyzed and smoothed by utilizing a network diffusion model within the protein-protein interaction network to construct a comprehensive somatic mutation network diffusion profile. Subsequently, a deep clustering approach was employed to explore metabolic mutation typing in breast cancer based on integrated metabolic pathway information and the somatic mutation network diffusion profile. In addition, we employed deep neural networks and machine learning prediction models to assess the feasibility of predicting drug responses through somatic mutation network diffusion profiles. Results: Significant differences in prognosis and metabolic heterogeneity were observed among the different metabolic mutation subtypes, characterized by distinct alterations in metabolic pathways and genetic mutations, and these mutational features offered potential targets for subtype-specific therapies. Furthermore, there was a strong consistency between the results of the drug response prediction model constructed on the somatic mutation network diffusion profile and the actual observed drug responses. Conclusion: Metabolic mutation typing of cancer assists in guiding patient prognosis and treatment.
Citations: 0
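The idea of smoothing a mutation profile over an interaction network, mentioned in the Methods of the abstract above, can be illustrated with a small sketch. The abstract does not give the diffusion formula, so this example assumes a commonly used random-walk-with-restart update, F(t+1) = alpha * F(t) * W + (1 - alpha) * F(0), where W is the degree-normalized adjacency matrix. The function name `diffuse`, the parameter `alpha = 0.7`, and the toy three-gene network are assumptions for illustration only.

```python
def diffuse(profile, adjacency, alpha=0.7, tol=1e-8, max_iter=1000):
    """Smooth a mutation profile over a network by iterative propagation.

    profile   -- list of initial scores per gene (e.g. 1 = mutated, 0 = not)
    adjacency -- symmetric 0/1 matrix of gene-gene interactions
    alpha     -- share of the score that spreads to neighbours at each step
    """
    n = len(profile)
    degrees = [sum(row) for row in adjacency]
    # Row-normalize so each gene splits its score evenly among its neighbours.
    weights = [
        [adjacency[i][j] / degrees[i] if degrees[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    current = [float(x) for x in profile]
    for _ in range(max_iter):
        updated = [
            alpha * sum(current[i] * weights[i][j] for i in range(n))
            + (1 - alpha) * profile[j]
            for j in range(n)
        ]
        if max(abs(u - c) for u, c in zip(updated, current)) < tol:
            return updated
        current = updated
    return current


# Toy path network A - B - C, with a mutation observed only in gene A.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
smoothed = diffuse([1, 0, 0], adjacency)
```

After diffusion, genes B and C receive non-zero scores that decay with network distance from A, so patients with mutations in neighbouring genes end up with similar profiles.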
GB5mCPred: Cross-species 5mc Site Predictor Based on Bootstrap-based Stochastic Gradient Boosting Method for Poaceae
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-04-15 DOI: 10.2174/0115748936285544231221113226
Dipro Sinha, Tanwy Dasmandal, Md Yeasin, D.C Mishra, Anil Rai, Sunil Archak
Background: One of the most prevalent epigenetic alterations in all three kingdoms of life is 5mC, which plays a part in a wide range of biological functions. Although in-vitro techniques are more effective in detecting epigenetic alterations, they are time- and money-intensive. Artificial intelligence-based in silico approaches have been used to overcome these obstacles. Aim: This study aimed to develop an ML-based predictor for the detection of 5mC sites in Poaceae. Objective: The objective of this study was the evaluation of machine learning and deep learning models for the prediction of 5mC sites in rice. Method: In this study, the vectorization of DNA sequences has been performed using three distinct feature sets: Oligo Nucleotide Frequencies (k = 2), Mono-nucleotide Binary Encoding, and Chemical Properties of Nucleotides. Two deep learning models, long short-term memory (LSTM) and Bidirectional LSTM (Bi-LSTM), as well as nine machine learning models, including random forest, gradient boosting, naïve Bayes, regression tree, k-nearest neighbour, support vector machine, AdaBoost, multiple logistic regression, and artificial neural network, were investigated. Also, bootstrap resampling was used to build more efficient models, along with a hybrid feature selection module for dimensionality reduction and removal of irrelevant features of the vector space. Result: Random Forest achieved the maximum accuracy, specificity and MCC (92.6%, 86.41% and 0.84, respectively). Gradient Boosting obtained the maximum sensitivity (96.85%). The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) showed that the best three models were Random Forest, Gradient Boosting, and Support Vector Machine in terms of accurate prediction of 5mC sites in rice. We developed an R package, ‘GB5mCPred’, which is available on CRAN (https://cran.r-project.org/web/packages/GB5mcPred/index.html). Also, a user-friendly prediction server was made based on this algorithm (http://cabgrid.res.in:5474/). Conclusion: With nearly equal TOPSIS scores, Random Forest, Gradient Boosting, and Support Vector Machine ended up being the best three models. The major rationale may be found in their architectural design, since they are gradual learning models that can capture the 5mC sites more correctly than other learning models.
Citations: 0
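Two of the three feature sets named in the abstract above, oligonucleotide frequencies with k = 2 and mononucleotide binary encoding, can be sketched as follows. This is an illustrative Python sketch, not the GB5mCPred R package's code: the function names, the A/C/G/T ordering, and the handling of ambiguous bases are assumptions.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_frequencies(seq, k=2):
    """Relative frequency of every possible k-mer, in lexicographic order.

    For k = 2 this yields a 16-dimensional vector (AA, AC, ..., TT).
    Windows containing characters other than A/C/G/T are not counted.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    denominator = max(windows, 1)
    return [counts[kmer] / denominator for kmer in kmers]


def binary_encoding(seq):
    """One-hot encode each base as 4 bits; unknown bases become all zeros."""
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    vector = []
    for base in seq.upper():
        vector.extend(table.get(base, [0, 0, 0, 0]))
    return vector
```

Concatenating `kmer_frequencies(seq)` and `binary_encoding(seq)` for fixed-length windows around a candidate cytosine would give one numeric row per site, suitable as input to the classifiers listed in the abstract.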
DiffSeqMol: A Non-Autoregressive Diffusion-Based Approach for Molecular Sequence Generation and Optimization
IF 4, Zone 3, Biology
Current Bioinformatics Pub Date : 2024-04-03 DOI: 10.2174/0115748936285493240307071916
Zixu Wang, Yangyang Chen, Xiulan Guo, Yayang Li, Pengyong Li, Chunyan Li, Xiucai Ye, Tetsuya Sakurai
Background: The application of deep generative models for molecular discovery has witnessed a significant surge in recent years. Currently, the field of molecular generation and molecular optimization is predominantly governed by autoregressive models regardless of how molecular data is represented. However, an emerging paradigm in the generation domain is diffusion models, which treat data non-autoregressively and have achieved significant breakthroughs in areas such as image generation. Methods: The potential and capability of diffusion models in molecular generation and optimization tasks remain largely unexplored. To investigate the applicability of diffusion models in the domain of molecular exploration, we proposed DiffSeqMol, a molecular sequence generation model underpinned by a diffusion process. Results & Discussion: DiffSeqMol distinguishes itself from traditional autoregressive methods by its capacity to draw samples from random noise and directly generate the entire molecule. Through experimental evaluations, we demonstrated that DiffSeqMol can match, and even surpass, the performance of established state-of-the-art models on unconditional generation tasks and molecular optimization tasks. Conclusion: Taken together, our results show that DiffSeqMol can be considered a promising molecular generation method. It opens new pathways to traverse the expansive chemical space and to discover novel molecules.
Citations: 0