{"title":"A pharmacokinetic model based on the SSA-1DCNN-Attention method.","authors":"Zi-Yi He, Jie-Yu Yang, Yong Li","doi":"10.1142/S021972002350004X","DOIUrl":"https://doi.org/10.1142/S021972002350004X","url":null,"abstract":"<p><p>To solve the problem of the lack of representativeness of the training set and the poor prediction accuracy due to the limited number of training samples when the machine learning method is used for the classification and prediction of pharmacokinetic indicators, this paper proposes a 1DCNN-Attention concentration prediction model optimized by the sparrow search algorithm (SSA). First, the SMOTE method is used to expand the small sample experimental data to make the data diverse and representative. Then a one-dimensional convolutional neural network (1DCNN) model is established, and the attention mechanism is introduced to calculate the weight of each variable for dividing the importance of each pharmacokinetic indicator by the output drug concentration. The SSA algorithm was used to optimize the parameters in the model to improve the prediction accuracy after data expansion. Taking the pharmacokinetic model of phenobarbital (PHB) combined with <i>Cynanchum otophyllum saponins</i> to treat epilepsy as an example, the concentration changes of PHB were predicted and the effectiveness of the method was verified. 
The results show that the proposed model has a better prediction effect than other methods.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350004"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9473265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PTGAC Model: A machine learning approach for constructing phylogenetic tree to compare protein sequences.","authors":"Jayanta Pal, Sourav Saha, Bansibadan Maji, Dilip Kumar Bhattacharya","doi":"10.1142/S0219720022500287","DOIUrl":"https://doi.org/10.1142/S0219720022500287","url":null,"abstract":"<p><p>This work proposes a machine learning-based phylogenetic tree generation model based on agglomerative clustering (PTGAC) that compares protein sequences considering all known chemical properties of amino acids. The proposed model can serve as a suitable alternative to the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), which is inherently time-consuming in nature. Initially, principal component analysis (PCA) is used in the proposed scheme to reduce the dimensions of 20 amino acids using seven known chemical characteristics, yielding 20 TP (Total Points) values for each amino acid. The approach of cumulative summing is then used to give a non-degenerate numeric representation of the sequences based on these 20 TP values. A special kind of three-component vector is proposed as a descriptor, which consists of a new type of non-central moment of orders one, two, and three. Subsequently, the proposed model uses Euclidean Distance measures among the descriptors to create a distance matrix. Finally, a phylogenetic tree is constructed using hierarchical agglomerative clustering based on the distance matrix. The results are compared with the UPGMA and other existing methods in terms of the quality and time of constructing the phylogenetic tree. Both qualitative and quantitative analysis are performed as key assessment criteria for analyzing the performance of the proposed model. The qualitative analysis of the phylogenetic tree is performed by considering rationalized perception, while the quantitative analysis is performed based on symmetric distance (SD). 
On both criteria, the results obtained by the proposed model are more satisfactory than those produced earlier on the same species by other methods. Notably, this method is found to be efficient in terms of both time and space requirements and is capable of dealing with protein sequences of varying lengths.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2250028"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9472273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel method for predicting DNA N<sup>4</sup>-methylcytosine sites based on deep forest algorithm.","authors":"Yonglin Zhang, Mei Hu, Qi Mo, Wenli Gan, Jiesi Luo","doi":"10.1142/S0219720023500038","DOIUrl":"https://doi.org/10.1142/S0219720023500038","url":null,"abstract":"<p><p>N<sup>4</sup>-methylcytosine (4mC) methylation is an essential epigenetic modification of deoxyribonucleic acid (DNA) that plays a key role in many biological processes such as gene expression, gene replication and transcriptional regulation. Genome-wide identification and analysis of the 4mC sites can better reveal the epigenetic mechanisms that regulate various biological processes. Although some high-throughput genomic experimental methods can effectively facilitate the identification on a genome-wide scale, they are still too expensive and laborious for routine use. Computational methods can compensate for these disadvantages, but they still leave much room for performance improvement. In this study, we develop a non-NN-style deep learning-based approach for accurately predicting 4mC sites from genomic DNA sequence. We generate various informative features representing sequence fragments around 4mC sites, and subsequently feed them into a deep forest (DF) model. After training the deep model using 10-fold cross-validation, overall accuracies of 85.0%, 90.0%, and 87.8% were achieved for three representative model organisms, <i>A. thaliana, C. elegans</i>, and <i>D. melanogaster</i>, respectively. In addition, extensive experimental results show that our proposed approach outperforms other existing state-of-the-art predictors in 4mC identification. 
Our approach is the first DF-based algorithm for the prediction of 4mC sites, providing a novel idea in this field.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350003"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9474484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NuKit: A deep learning platform for fast nucleus segmentation of histopathological images.","authors":"Ching-Nung Lin, Christine H Chung, Aik Choon Tan","doi":"10.1142/S0219720023500026","DOIUrl":"https://doi.org/10.1142/S0219720023500026","url":null,"abstract":"<p><p>Nucleus segmentation represents the initial step for histopathological image analysis pipelines, and it remains a challenge in many quantitative analysis methods in terms of accuracy and speed. Recently, deep learning nucleus segmentation methods have been shown to outperform previous intensity- or pattern-based methods. However, the heavy computation of deep learning gives the impression of a lagging response in real time and has hampered the adoption of these models in routine research. We developed and implemented NuKit, a deep learning platform that accelerates nucleus segmentation and provides prompt results to users. The NuKit platform consists of two deep learning models coupled with an interactive graphical user interface (GUI) to provide fast and automatic nucleus segmentation \"on the fly\". The two deep learning models perform complementary tasks in nucleus segmentation. The whole image segmentation model performs whole image nucleus segmentation, whereas the click segmentation model supplements the nucleus segmentation with user-driven input to edit the segmented nuclei. We trained the NuKit whole image segmentation model on a large public training data set and tested its performance on seven independent public image data sets. The whole image segmentation model achieves average [Formula: see text] and [Formula: see text]. The outputs can be exported into different file formats, and NuKit integrates seamlessly with other image analysis tools such as QuPath. 
NuKit can be executed on Windows, Mac, and Linux using personal computers.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350002"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/68/f9/nihms-1915365.PMC10362904.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9852066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RiRPSSP: A unified deep learning method for prediction of regular and irregular protein secondary structures.","authors":"Mukhtar Ahmad Sofi, M Arif Wani","doi":"10.1142/S0219720023500014","DOIUrl":"https://doi.org/10.1142/S0219720023500014","url":null,"abstract":"<p><p>Protein secondary structure prediction (PSSP) is an important and challenging task in protein bioinformatics. Protein secondary structures (SSs) are categorized into regular and irregular structure classes. Regular SSs, representing nearly 50% of amino acids, consist of helices and sheets, whereas the remaining amino acids represent irregular SSs. [Formula: see text]-turns and [Formula: see text]-turns are the most abundant irregular SSs present in proteins. Existing methods are well developed for separate prediction of regular and irregular SSs. However, for more comprehensive PSSP, it is essential to develop a uniform model to predict all types of SSs simultaneously. In this work, using a novel dataset comprising dictionary of secondary structure of protein (DSSP)-based SSs and PROMOTIF-based [Formula: see text]-turns and [Formula: see text]-turns, we propose a unified deep learning model consisting of convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) for simultaneous prediction of regular and irregular SSs. To the best of our knowledge, this is the first study in PSSP covering both regular and irregular structures. The protein sequences in our constructed datasets, RiR6069 and RiR513, are taken from the benchmark CB6133 and CB513 datasets, respectively. 
The results are indicative of increased PSSP accuracy.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350001"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9474486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>In silico de novo</i> drug design of a therapeutic peptide inhibitor against UBE2C in breast cancer.","authors":"Andrea Mae Añonuevo, Marineil Gomez, Lemmuel L Tayo","doi":"10.1142/S0219720022500299","DOIUrl":"https://doi.org/10.1142/S0219720022500299","url":null,"abstract":"<p><p>The World Health Organization (WHO) declared breast cancer (BC) the most prevalent cancer in the world. With its prevalence and severity, there have been several breakthroughs in developing treatments for the disease. Targeted therapy treatments limit the damage done to healthy tissues. These targeted therapies are especially potent for luminal and HER-2 positive type breast cancer. However, for triple negative breast cancer (TNBC), the lack of defining biomarkers makes it hard to approach with targeted therapy methods. Protein-protein interactions (PPIs) have been studied as possible targets for drug action. However, small molecule drugs are not able to cover the entirety of the PPI binding interface. Peptides were found to be more suited to the large or flat PPI surfaces, in addition to their better pharmacokinetic properties. In this study, computational methods were used to verify whether peptide drug inhibitors are good drug candidates against the ubiquitin protein, UBE2C, by conducting docking, MD and MMPBSA analyses. Results show that while the lead peptide, T20-M, shows good potential as a peptide drug, its binding affinity towards UBE2C is not enough to overcome the natural UBE2C-ANAPC2 interaction. 
Further studies on modification of T20-M and the analysis of other peptide leads are recommended.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2250029"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9465490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ThermalProGAN: A sequence-based thermally stable protein generator trained using unpaired data.","authors":"Hui-Ling Huang, Chong-Heng Weng, Torbjörn E M Nordling, Yi-Fan Liou","doi":"10.1142/S0219720023500087","DOIUrl":"https://doi.org/10.1142/S0219720023500087","url":null,"abstract":"<p><strong>Motivation: </strong>The synthesis of proteins with novel desired properties is challenging but sought after by industry and academia. The dominating approach is based on trial-and-error inducing point mutations, assisted by structural information or predictive models built with paired data that are difficult to collect. This study proposes a sequence-based unpaired-sample of novel protein inventor (SUNI) to build ThermalProGAN for generating thermally stable proteins based on sequence information.</p><p><strong>Results: </strong>The ThermalProGAN can strongly mutate the input sequence with a median number of 32 residues. A known normal protein, 1RG0, was used to generate a thermally stable form by mutating 51 residues. After superimposing the two structures, high similarity is shown, indicating that the basic function would be conserved. Eighty-four molecular dynamics simulation results of 1RG0 and the COVID-19 vaccine candidates with a total simulation time of 840[Formula: see text]ns indicate that the thermal stability increased.</p><p><strong>Conclusion: </strong>This proof of concept demonstrated that transfer of a desired protein property from one set of proteins is feasible. <b>Availability and implementation:</b> The source code of ThermalProGAN can be freely accessed at https://github.com/markliou/ThermalProGAN/ with an MIT license. The website is https://thermalprogan.markliou.tw:433. 
<b>Supplementary information:</b> Supplementary data are available on Github.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350008"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9466541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating network-based missing protein prediction using <i>p</i>-values, Bayes Factors, and probabilities.","authors":"Wilson Wen Bin Goh, Weijia Kong, Limsoon Wong","doi":"10.1142/S0219720023500051","DOIUrl":"https://doi.org/10.1142/S0219720023500051","url":null,"abstract":"<p><p>Some prediction methods use probability to rank their predictions, while some other prediction methods do not rank their predictions and instead use [Formula: see text]-values to support their predictions. This disparity renders direct cross-comparison of these two kinds of methods difficult. In particular, approaches such as the Bayes Factor upper Bound (BFB) for [Formula: see text]-value conversion may not make correct assumptions for this kind of cross-comparison. Here, using a well-established case study on renal cancer proteomics and in the context of missing protein prediction, we demonstrate how to compare these two kinds of prediction methods using two different strategies. The first strategy is based on false discovery rate (FDR) estimation, which does not make the same naïve assumptions as BFB conversions. The second strategy is a powerful approach which we colloquially call \"home ground testing\". Both strategies perform better than BFB conversions. Thus, we recommend comparing prediction methods by standardization to a common performance benchmark such as a global FDR. 
And where this is not possible, we recommend reciprocal \"home ground testing\".</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350005"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9474482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A network-based dynamic criterion for identifying prediction and early diagnosis biomarkers of complex diseases.","authors":"Xin Huang, Benzhe Su, Xingyu Wang, Yang Zhou, Xinyu He, Bing Liu","doi":"10.1142/S0219720022500275","DOIUrl":"https://doi.org/10.1142/S0219720022500275","url":null,"abstract":"<p><p>Lung adenocarcinoma (LUAD) seriously threatens human health and generally results from dysfunction of relevant module molecules, which dynamically change with time and conditions, rather than that of an individual molecule. In this study, a novel network construction algorithm for identifying early warning network signals (IEWNS) is proposed for improving the performance of LUAD early diagnosis. To this end, we theoretically derived a dynamic criterion, namely, the relationship of variation (RV), to construct dynamic networks. RV infers correlation [Formula: see text] statistics to measure dynamic changes in molecular relationships during the process of disease development. Based on the dynamic networks constructed by IEWNS, network warning signals used to represent the occurrence of LUAD deterioration can be defined without human intervention. IEWNS was employed to perform a comprehensive analysis of gene expression profiles of LUAD from The Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO) database. The experimental results suggest that the potential biomarkers selected by IEWNS can facilitate a better understanding of pathogenetic mechanisms and help to achieve effective early diagnosis of LUAD. 
In conclusion, IEWNS provides novel insight into the initiation and progression of LUAD and helps to define prospective biomarkers for assessing disease deterioration.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6","pages":"2250027"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9471022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Author Index Volume 20 (2022).","authors":"","doi":"10.1142/S0219720022990013","DOIUrl":"https://doi.org/10.1142/S0219720022990013","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6","pages":"2299001"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10505287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}