Hui-Ling Huang, Chong-Heng Weng, Torbjörn E M Nordling, Yi-Fan Liou
{"title":"ThermalProGAN: A sequence-based thermally stable protein generator trained using unpaired data.","authors":"Hui-Ling Huang, Chong-Heng Weng, Torbjörn E M Nordling, Yi-Fan Liou","doi":"10.1142/S0219720023500087","DOIUrl":"https://doi.org/10.1142/S0219720023500087","url":null,"abstract":"<p><strong>Motivation: </strong>The synthesis of proteins with novel desired properties is challenging but sought after by the industry and academia. The dominating approach is based on trial-and-error inducing point mutations, assisted by structural information or predictive models built with paired data that are difficult to collect. This study proposes a sequence-based unpaired-sample of novel protein inventor (SUNI) to build ThermalProGAN for generating thermally stable proteins based on sequence information.</p><p><strong>Results: </strong>The ThermalProGAN can strongly mutate the input sequence with a median number of 32 residues. A known normal protein, 1RG0, was used to generate a thermally stable form by mutating 51 residues. After superimposing the two structures, high similarity is shown, indicating that the basic function would be conserved. Eighty four molecular dynamics simulation results of 1RG0 and the COVID-19 vaccine candidates with a total simulation time of 840[Formula: see text]ns indicate that the thermal stability increased.</p><p><strong>Conclusion: </strong>This proof of concept demonstrated that transfer of a desired protein property from one set of proteins is feasible. <b>Availability and implementation:</b> The source code of ThermalProGAN can be freely accessed at https://github.com/markliou/ThermalProGAN/ with an MIT license. The website is https://thermalprogan.markliou.tw:433. <b>Supplementary information:</b> Supplementary data are available on Github.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350008"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9466541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating network-based missing protein prediction using <i>p</i>-values, Bayes Factors, and probabilities.","authors":"Wilson Wen Bin Goh, Weijia Kong, Limsoon Wong","doi":"10.1142/S0219720023500051","DOIUrl":"https://doi.org/10.1142/S0219720023500051","url":null,"abstract":"<p><p>Some prediction methods use probability to rank their predictions, while some other prediction methods do not rank their predictions and instead use [Formula: see text]-values to support their predictions. This disparity renders direct cross-comparison of these two kinds of methods difficult. In particular, approaches such as the Bayes Factor upper Bound (BFB) for [Formula: see text]-value conversion may not make correct assumptions for this kind of cross-comparisons. Here, using a well-established case study on renal cancer proteomics and in the context of missing protein prediction, we demonstrate how to compare these two kinds of prediction methods using two different strategies. The first strategy is based on false discovery rate (FDR) estimation, which does not make the same naïve assumptions as BFB conversions. The second strategy is a powerful approach which we colloquially call \"home ground testing\". Both strategies perform better than BFB conversions. Thus, we recommend comparing prediction methods by standardization to a common performance benchmark such as a global FDR. And where this is not possible, we recommend reciprocal \"home ground testing\".</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"21 1","pages":"2350005"},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9474482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xin Huang, Benzhe Su, Xingyu Wang, Yang Zhou, Xinyu He, Bing Liu
{"title":"A network-based dynamic criterion for identifying prediction and early diagnosis biomarkers of complex diseases.","authors":"Xin Huang, Benzhe Su, Xingyu Wang, Yang Zhou, Xinyu He, Bing Liu","doi":"10.1142/S0219720022500275","DOIUrl":"https://doi.org/10.1142/S0219720022500275","url":null,"abstract":"<p><p>Lung adenocarcinoma (LUAD) seriously threatens human health and generally results from dysfunction of relevant module molecules, which dynamically change with time and conditions, rather than that of an individual molecule. In this study, a novel network construction algorithm for identifying early warning network signals (IEWNS) is proposed for improving the performance of LUAD early diagnosis. To this end, we theoretically derived a dynamic criterion, namely, the relationship of variation (RV), to construct dynamic networks. RV infers correlation [Formula: see text] statistics to measure dynamic changes in molecular relationships during the process of disease development. Based on the dynamic networks constructed by IEWNS, network warning signals used to represent the occurrence of LUAD deterioration can be defined without human intervention. IEWNS was employed to perform a comprehensive analysis of gene expression profiles of LUAD from The Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO) database. The experimental results suggest that the potential biomarkers selected by IEWNS can facilitate a better understanding of pathogenetic mechanisms and help to achieve effective early diagnosis of LUAD. In conclusion, IEWNS provides novel insight into the initiation and progression of LUAD and helps to define prospective biomarkers for assessing disease deterioration.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6","pages":"2250027"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9471022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Author Index Volume 20 (2022).","authors":"","doi":"10.1142/S0219720022990013","DOIUrl":"https://doi.org/10.1142/S0219720022990013","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6","pages":"2299001"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10505287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adriana Laura López Lobato, Martha Lorena Avendaño Garrido, Héctor Gabriel Acosta Mesa, Clara Luz Sampieri, Víctor Hugo Sandoval Lozano
{"title":"Quantification of the presence of enzymes in gelatin zymography using the Gini index.","authors":"Adriana Laura López Lobato, Martha Lorena Avendaño Garrido, Héctor Gabriel Acosta Mesa, Clara Luz Sampieri, Víctor Hugo Sandoval Lozano","doi":"10.1142/S0219720022500251","DOIUrl":"https://doi.org/10.1142/S0219720022500251","url":null,"abstract":"<p><p>Gel zymography quantifies the activity of certain enzymes in tumor processes. These enzymes are widely used in medical diagnosis. In order to analyze them, experts classify the zymography spots into various classes according to their tonalities. This classification is done by visual analysis, which is what makes it a subjective process. This work proposes a methodology to carry out this classifications with a process that involves an unsupervised learning algorithm in the images, denoted as the GI algorithm. With the experiments shown in this paper, this methodology could constitute a tool that bioinformatics scientists can trust to perform the desired classification since it is a quantitative indicator to order the enzymatic activity of the spots in a zymography.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6","pages":"2250025"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9118622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Author Index Volume 20 (2022).","authors":"","doi":"10.1142/s0219749922990015","DOIUrl":"https://doi.org/10.1142/s0219749922990015","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6 1","pages":"2299001"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"63928439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kano Hasegawa, Yoshitaka Moriwaki, Tohru Terada, Cao Wei, Kentaro Shimizu
{"title":"Feedback-AVPGAN: Feedback-guided generative adversarial network for generating antiviral peptides.","authors":"Kano Hasegawa, Yoshitaka Moriwaki, Tohru Terada, Cao Wei, Kentaro Shimizu","doi":"10.1142/S0219720022500263","DOIUrl":"https://doi.org/10.1142/S0219720022500263","url":null,"abstract":"<p><p>In this study, we propose <i>Feedback-AVPGAN</i>, a system that aims to computationally generate novel antiviral peptides (AVPs). This system relies on the key premise of the Generative Adversarial Network (GAN) model and the Feedback method. GAN, a generative modeling approach that uses deep learning methods, comprises a generator and a discriminator. The generator is used to generate peptides; the generated proteins are fed to the discriminator to distinguish between the AVPs and non-AVPs. The original GAN design uses actual data to train the discriminator. However, not many AVPs have been experimentally obtained. To solve this problem, we used the Feedback method to allow the discriminator to learn from the existing as well as generated synthetic data. We implemented this method using a classifier module that classifies each peptide sequence generated by the GAN generator as AVP or non-AVP. The classifier uses the transformer network and achieves high classification accuracy. This mechanism enables the efficient generation of peptides with a high probability of exhibiting antiviral activity. Using the Feedback method, we evaluated various algorithms and their performance. Moreover, we modeled the structure of the generated peptides using AlphaFold2 and determined the peptides having similar physicochemical properties and structures to those of known AVPs, although with different sequences.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6","pages":"2250026"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9118189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accounting for treatment during the development or validation of prediction models.","authors":"Wei Xin Chan, Limsoon Wong","doi":"10.1142/S0219720022710019","DOIUrl":"https://doi.org/10.1142/S0219720022710019","url":null,"abstract":"Clinical prediction models are widely used to predict adverse outcomes in patients, and are often employed to guide clinical decision-making. Clinical data typically consist of patients who received different treatments. Many prediction modeling studies fail to account for differences in patient treatment appropriately, which results in the development of prediction models that show poor accuracy and generalizability. In this paper, we list the most common methods used to handle patient treatments and discuss certain caveats associated with each method. We believe that proper handling of differences in patient treatment is crucial for the development of accurate and generalizable models. As different treatment strategies are employed for different diseases, the best approach to properly handle differences in patient treatment is specific to each individual situation. We use the Ma-Spore acute lymphoblastic leukemia data set as a case study to demonstrate the complexities associated with differences in patient treatment, and offer suggestions on incorporating treatment information during evaluation of prediction models. In clinical data, patients are typically treated on a case by case basis, with unique cases occurring more frequently than expected. Hence, there are many subtleties to consider during the analysis and evaluation of clinical prediction models.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 6","pages":"2271001"},"PeriodicalIF":1.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10523629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wajid Arshad Abbasi, Asma Anjam, Sadia Khalil, Saiqa Andleeb, Maryum Bibi, Syed Ali Abbas
{"title":"COYOTE: Sequence-derived structural descriptors-based computational identification of glycoproteins.","authors":"Wajid Arshad Abbasi, Asma Anjam, Sadia Khalil, Saiqa Andleeb, Maryum Bibi, Syed Ali Abbas","doi":"10.1142/S0219720022500196","DOIUrl":"https://doi.org/10.1142/S0219720022500196","url":null,"abstract":"<p><p>Glycoproteins play an important and ubiquitous role in many biological processes such as protein folding, cell-to-cell signaling, invading microorganism infection, tumor metastasis, and leukocyte trafficking. The key mechanism of glycoproteins must be revealed to model and refine glycosylated protein recognition, which will eventually assist in the design and discovery of carbohydrate-derived therapeutics. Experimental procedures involving wet-lab experiments to reveal glycoproteins are very time-consuming, laborious, and highly costly. However, costly and tedious experimental procedures can be assisted by ranking the most probable glycoproteins through computational methods with improved accuracy. In this study, we have proposed a novel machine learning-based predictive model for glycoproteins identification. Our proposed model is based on sequence-derived structural descriptors (SDSD) that fill the gap of unavailability of protein 3D structures and lack of accuracy in sequence information alone. Through a series of simulation studies, we have shown that our proposed model gives state-of-the-art generalization performance verified through various machine learning-centric and biologically relevant techniques and metrics. Through data mining in this study, we have also identified the role of descriptors in determining glycoproteins. Python-based standalone code together with a webserver implementation of our proposed model (COYOTE: identifiCation Of glYcoprOteins Through sEquences) is available at the URL: https://sites.google.com/view/wajidarshad/software.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250019"},"PeriodicalIF":1.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33464194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GCMCDTI: Graph convolutional autoencoder framework for predicting drug-target interactions based on matrix completion.","authors":"Jing Li, Chen Zhang, Zhengwei Li, Ru Nie, Pengyong Han, Wenjia Yang, Hongmei Liao","doi":"10.1142/S0219720022500238","DOIUrl":"https://doi.org/10.1142/S0219720022500238","url":null,"abstract":"<p><p>Identification of potential drug-target interactions (DTIs) plays a pivotal role in the development of drug and target discovery in the public healthcare sector. However, biological experiments for predicting interactions between drugs and targets are still expensive, complicated, and time-consuming. Thus, computational methods are widely applied for aiding drug-target interaction prediction. In this paper, we propose a novel model, named GCMCDTI, for DTIs prediction which adopts a graph convolutional network based on matrix completion. We regard the association prediction between drugs and targets as link prediction and treat the process as matrix completion, and then a graph convolutional auto-encoder framework is employed to construct the drug and target embeddings. Then, a bilinear decoder is applied to reconstruct the DTI matrix. We conduct our experiments on four benchmark datasets consisting of enzymes, G protein-coupled receptors (GPCRs), ion channels, and nuclear receptors. The five-fold cross-validation results achieve the high average AUC values of 95.78%, 95.31%, 93.90%, and 91.77%, respectively. To further evaluate our method, we compare our proposed method with other state-of-the-art approaches. The comparison results illustrate that our proposed method obtains improvement in performance on DTI prediction. The proposed method will be a good choice in the field of DTI prediction.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250023"},"PeriodicalIF":1.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40673948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}