{"title":"Optimized Hybrid Deep Learning for Real-Time Pandemic Data Forecasting: Long and Short-Term Perspectives","authors":"Sujata Dash, Sourav Kumar Giri, Subhendu Kumar Pani, Saurav Mallik, Mingqiang Wang, Hong Qin","doi":"10.2174/0115748936257412231120113648","DOIUrl":"https://doi.org/10.2174/0115748936257412231120113648","url":null,"abstract":"Background:: With new variants of COVID-19 causing challenges, we need to focus on integrating multiple deep-learning frameworks to develop intelligent healthcare systems for early detection and diagnosis. Objective:: This article suggests three hybrid deep learning models, namely CNN-LSTM, CNN-Bi-LSTM, and CNN-GRU, to address the pressing need for an intelligent healthcare system. These models are designed to capture spatial and temporal patterns in COVID-19 data, thereby improving the accuracy and timeliness of predictions. An output forecasting framework integrates these models, and an optimization algorithm automatically selects the hyperparameters for the 13 baselines and the three proposed hybrid models. Methods:: Real-time time series data from the five most affected countries were used to test the effectiveness of the proposed models. Baseline models were compared, and optimization algorithms were employed to improve forecasting capabilities. Results:: CNN-GRU and CNN-LSTM are the top short- and long-term forecasting models. CNN-GRU had the best performance with the lowest SMAPE and MAPE values for long-term forecasting in India at 3.07% and 3.17%, respectively, and impressive results for short-term forecasting with SMAPE and MAPE values of 1.46% and 1.47%. Conclusion:: Hybrid deep learning models, like CNN-GRU, can aid in early COVID-19 assessment and diagnosis. They detect patterns in data for effective governmental strategies and forecasting. 
This helps manage and mitigate the pandemic faster and more accurately.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138556751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Pathological Myopia Associated Genes with A Random Walk-Based Method in Protein-Protein Interaction Network","authors":"Jiyu Zhang, Tao Huang, Qiao Sun, Jian Zhang","doi":"10.2174/0115748936268218231114070754","DOIUrl":"https://doi.org/10.2174/0115748936268218231114070754","url":null,"abstract":"Background:: Pathological myopia, a severe variant of myopia, extends beyond the typical refractive error associated with nearsightedness. While the condition has a strong genetic component, the intricate mechanisms of inheritance remain elusive. Some genes have been associated with the development of pathological myopia, but their exact roles are not fully understood. Objective:: This study aimed to identify novel genes associated with pathological myopia. Methods:: Our study leveraged DisGeNET to identify 184 genes linked with high myopia and 39 genes related to degenerative myopia. To uncover additional pathological myopia-associated genes, we employed the random walk with restart algorithm to investigate the protein-protein interaction network. We used the previously identified 184 high myopia and 39 degenerative myopia genes as seed nodes. Results:: Through subsequent screening tests, we discarded genes with weak associations, yielding 103 new genes for high myopia and 33 for degenerative myopia. Conclusion:: We confirmed the association of certain genes, including six genes that were confirmed to be associated with both high and degenerative myopia. 
The newly discovered genes are helpful to uncover and understand the pathogenesis of myopia.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138555910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Microbe-disease Associations with Weighted Graph Convolution Networks and Taxonomy Common Tree","authors":"Jieqi Xing, Yu Shi, Xiaoquan Su, Shunyao Wu","doi":"10.2174/0115748936270441231116093650","DOIUrl":"https://doi.org/10.2174/0115748936270441231116093650","url":null,"abstract":"Background:: Microbe-disease associations are integral to understanding complex diseases and their screening procedures. Objective:: While numerous computational methods have been developed to detect these associations, their performance remains limited due to inadequate utilization of weighted inherent similarities and microbial taxonomy hierarchy. To address this limitation, we have introduced WTHMDA (weighted taxonomic heterogeneous network-based microbe-disease association), a novel deep learning framework. Methods:: WTHMDA combines a weighted graph convolution network and the microbial taxonomy common tree to predict microbe-disease associations effectively. The framework extracts multiple microbe similarities from the taxonomy common tree, facilitating the construction of a microbe-disease heterogeneous interaction network. Utilizing a weighted DeepWalk algorithm, node embeddings in the network incorporate weight information from the similarities. Subsequently, a deep neural network (DNN) model accurately predicts microbe-disease associations based on this interaction network. Results:: Extensive experiments on multiple datasets and case studies demonstrate WTHMDA's superiority over existing approaches, particularly in predicting unknown associations. 
Conclusion:: Our proposed method offers a new strategy for discovering microbe-disease linkages, showcasing remarkable performance and enhancing the feasibility of identifying disease risk.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138514996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metabolomics: Recent Advances and Future Prospects Unveiled","authors":"Shweta Sharma, Garima Singh, Mymoona Akhter","doi":"10.2174/0115748936270744231115110329","DOIUrl":"https://doi.org/10.2174/0115748936270744231115110329","url":null,"abstract":"In the era of genomics, fueled by advanced technologies and analytical tools, metabolomics has become a vital component in biomedical research. Its significance spans various domains, encompassing biomarker identification, uncovering underlying mechanisms and pathways, as well as the exploration of new drug targets and precision medicine. This article presents a comprehensive overview of the latest developments in metabolomics techniques, emphasizing their wide-ranging applications across diverse research fields and underscoring their immense potential for future advancements.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138514997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stacking-Kcr: A Stacking Model for Predicting the Crotonylation Sites of Lysine by Fusing Serial and Automatic Encoder","authors":"Ying Liang, Suhui Li, Xiya You, You Guo, Jianjun Tang","doi":"10.2174/0115748936272040231117114252","DOIUrl":"https://doi.org/10.2174/0115748936272040231117114252","url":null,"abstract":"Background:: Protein lysine crotonylation (Kcr), a newly discovered important post-translational modification (PTM), is typically localized at the transcription start site and regulates gene expression, which is associated with a variety of pathological conditions such as developmental defects and malignant transformation. Objective:: Identifying Kcr sites is advantageous for the discovery of its biological mechanism and the development of new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, necessitating the development of new computational techniques. Methods:: In this work, to accurately identify Kcr sites, we propose an ensemble learning model called Stacking-Kcr. First, features are extracted from sequence information, physicochemical properties, and sequence fragment similarity. Then, the sequence information and physicochemical property features are fused by automatic encoder and by serial fusion, respectively. 
Finally, the two fused features and the sequence fragment similarity features are each input into four base classifiers, a meta classifier is built on the first-level prediction results, and the final predictions are obtained. Results:: In five-fold cross-validation, the model achieved an accuracy of 0.828 and an AUC of 0.910. This shows that the Stacking-Kcr method has obvious advantages over traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which were 1.7% and 0.8% higher than those of other state-of-the-art tools. Additionally, we trained Stacking-Kcr on phosphorylation sites, and the result is superior to the current model. Conclusion:: These outcomes are additional evidence that Stacking-Kcr has strong application potential and generalization performance.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138515001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking","authors":"Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong","doi":"10.2174/0115748936256869231019113616","DOIUrl":"https://doi.org/10.2174/0115748936256869231019113616","url":null,"abstract":"Background and Objective:: Gene promoters play a crucial role in regulating gene transcription by serving as DNA regulatory elements near transcription start sites. Despite numerous approaches, including alignment signal and content-based methods for promoter prediction, accurately identifying promoters remains challenging due to the lack of explicit features in their sequences. Consequently, many machine learning and deep learning models for promoter identification have been presented, but the performance of these tools is not precise. Most recent investigations have concentrated on identifying sigma or plant promoters, while the accurate identification of Saccharomyces cerevisiae promoters remains an underexplored area. In this study, we introduced “iProm-Yeast”, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by employing promoter sequences rather than non-promoter regions of the genome. The newly developed negative reconstruction approach improves classification and minimizes the number of false positive predictions. Methods:: To overcome the problems associated with promoter prediction, we investigate alternate vector encoding and feature extraction methodologies. Following that, these strategies are coupled with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that the pseudo-dinucleotide composition is preferable for feature encoding and that the machine-learning stacking approach is excellent for accurate promoter categorization. 
Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions. Results:: Based on the results of 5-fold cross-validation, the proposed predictor, iProm-Yeast, has a good potential for detecting Saccharomyces cerevisiae promoters. The accuracy (Acc) was 86.27%, the sensitivity (Sn) was 82.29%, the specificity (Sp) was 89.47%, the Matthews correlation coefficient (MCC) was 0.72, and the area under the receiver operating characteristic curve (AUROC) was 0.98. We also performed a cross-species analysis to determine the generalizability of iProm-Yeast across other species. Conclusion:: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise webserver to study gene regulation in diverse organisms.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138515003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of Super-enhancers Based on Mean-shift Undersampling","authors":"Han Cheng, Shumei Ding, Cangzhi Jia","doi":"10.2174/0115748936268302231110111456","DOIUrl":"https://doi.org/10.2174/0115748936268302231110111456","url":null,"abstract":"Background:: Super-enhancers are clusters of enhancers defined based on the binding occupancy of master transcription factors, chromatin regulators, or chromatin marks. It has been reported that super-enhancers are transcriptionally more active and cell-type-specific than regular enhancers. Therefore, it is necessary to identify super-enhancers from regular enhancers. A variety of computational methods have been proposed to identify super-enhancers as auxiliary tools. However, most methods use ChIP-seq data, and the lack of this part of the data will make the predictor unable to execute or fail to achieve satisfactory performance. Objective:: The aim of this study is to propose a stacking computational model based on the fusion of multiple features to identify super-enhancers in both human and mouse species. Methods:: This work adopted mean-shift to cluster majority class samples and selected five sets of balanced datasets for mouse and three sets of balanced datasets for humans to train the stacking model. Five types of sequence information are used as input to the XGBoost classifier, and the average value of the probability outputs from each classifier is designed as the final classification result. Results:: The results of 10-fold cross-validation and cross-cell-line validation prove that our method has superior performance compared to other existing methods. The source code and datasets are available at https://github.com/Cheng-Han-max/SE_voting. 
Conclusion:: The analysis of feature importance indicates that Mismatch accounts for the highest proportion among the top 20 important features.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138515002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCV Filter: A Hybrid Deep Learning Model for SARS-CoV-2 Variants Classification","authors":"Han Wang, Jingyang Gao","doi":"10.2174/1574893618666230809121509","DOIUrl":"https://doi.org/10.2174/1574893618666230809121509","url":null,"abstract":"Background: The high mutability of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) makes it easy for mutations to occur during transmission. As the epidemic continues to develop, several mutated strains have been produced. Researchers worldwide are working on the effective identification of SARS-CoV-2. Objective: In this paper, we propose a new deep learning method that can effectively identify SARS-CoV-2 variant sequences, called SCVfilter, which is a deep hybrid model with embedding, attention residual network, and long short-term memory as components. Methods: Deep learning is effective in extracting rich features from sequence data, which has significant implications for the study of Coronavirus Disease 2019 (COVID-19), which has become prevalent in recent years. In this paper, we propose a new deep learning method that can effectively identify SARS-CoV-2 variant sequences, called SCVfilter, which is a deep hybrid model with embedding, attention residual network, and long short-term memory as components. Results: The accuracy of the SCVfilter is 93.833% on Dataset-I consisting of different variant strains; 90.367% on Dataset-II consisting of data collected from China, Taiwan, and Hong Kong; and 79.701% on Dataset-III consisting of data collected from six continents (Africa, Asia, Europe, North America, Oceania, and South America). Conclusion: When using the SCV filter to process lengthy and high-homology SARS-CoV-2 data, it can automatically select features and accurately detect different variant strains of SARS-CoV-2. In addition, the SCV filter is sufficiently robust to handle the problems caused by sample imbalance and sequence incompleteness. 
Other: The SCVfilter is an open-source method available at https://github.com/deconvolutionw/SCVfilter.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2023-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138514985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of DNA-binding Sites in Transcriptions Factor in Fur-like Proteins Using Machine Learning and Molecular Descriptors","authors":"Mauricio Arenas-Salinas, Jessica Lara Muñoz, José Antonio Reyes, Felipe Besoain","doi":"10.2174/0115748936264122231016094702","DOIUrl":"https://doi.org/10.2174/0115748936264122231016094702","url":null,"abstract":"Introduction: Transcription factors are of great interest in biotechnology due to their key role in the regulation of gene expression. One of the most important transcription factors in gram-negative bacteria is Fur, a global regulator studied as a therapeutic target for the design of antibacterial agents. Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant features. Methods: In this study, we evaluated several machine learning algorithms for the prediction of DNA-binding sites based on proteins from the Fur superfamily and other helix-turn-helix transcription factors, including Support-Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB). We also tested the efficacy of using several molecular descriptors derived from the amino acid sequence and the structure of the protein fragments that bind the DNA. A feature selection procedure was employed to select fewer descriptors in each case while maintaining a good classification performance. Results: The best results were obtained with the SVM model using twelve sequence-derived attributes and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively. 
Conclusion: The performance obtained indicates that the descriptors we used are relevant for predicting DNA-binding sites since they can discriminate between binding and non-binding regions of a protein.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136318847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NaProGraph: Network Analyzer for Interactions between Nucleic Acids and Proteins","authors":"Sajjad Nematzadeh, Nizamettin Aydin, Zeyneb Kurt, Mahsa Torkamanian-Afshar","doi":"10.2174/0115748936266189231004110412","DOIUrl":"https://doi.org/10.2174/0115748936266189231004110412","url":null,"abstract":"Background: Interactions of RNA and DNA with proteins are crucial for elucidating intracellular processes in living organisms, diagnosing disorders, designing aptamer drugs, and other applications. Therefore, investigating the relationships between these macromolecules is essential to life science research. 
Methods: This study proposes an online network provider tool (NaProGraph) that offers an intuitive and user-friendly interface for studying interactions between nucleic acids (NA) and proteins. NaProGraph utilizes a comprehensive and curated dataset encompassing nearly all interacting macromolecules in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). Researchers can employ this online tool to focus on a specific portion of the PDB, investigate its associated relationships, and visualize and extract pertinent information. This tool provides insights into the frequency of atoms and residues between proteins and NAs and the similarity of the macromolecules' primary structures. Furthermore, the functional similarity of proteins can be inferred using protein families and clans from Pfam. The tool we have developed is publicly available at https://naprolink.com/NaProGraph/. Conclusion: The NaProGraph tool serves as an effective online resource for researchers interested in studying interactions between nucleic acids and proteins. By leveraging a comprehensive dataset and providing various visualization and extraction capabilities, NaProGraph facilitates the exploration of macromolecular relationships and aids in understanding intracellular processes in living organisms.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135666197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}