{"title":"Analysis to determine the effect of mutations on binding to small chemical molecules","authors":"T. Koshlan, K. Kulikov","doi":"10.1142/S0219720022400030","DOIUrl":"https://doi.org/10.1142/S0219720022400030","url":null,"abstract":"In this paper, the authors present and describe, in detail, an original software-implemented numerical methodology used to determine the effect of mutations on binding to small chemical molecules, using the examples of gefitinib, AMPPNP, CO-1686, ASP8273, and erlotinib binding with EGFR protein, and imatinib binding with PPARgamma. Furthermore, the developed numerical approach makes it possible to determine the stability of a molecular complex, which consists of a protein and a small chemical molecule. The description of the software package that implements the presented algorithm is given on the website: https://binomlabs.com/.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2240003"},"PeriodicalIF":1.0,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47358607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical drug response prediction from preclinical cancer cell lines by logistic matrix factorization approach.","authors":"Akram Emdadi, Changiz Eslahchi","doi":"10.1142/S0219720021500359","DOIUrl":"https://doi.org/10.1142/S0219720021500359","url":null,"abstract":"<p><p>Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of different knowledge on cell lines and drugs. This paper proposes the TCLMF method, a predictive model for predicting drug response in tumor samples that was trained on preclinical samples and is based on the logistic matrix factorization approach. The TCLMF model is designed based on gene expression profiles, tissue type information, the chemical structure of drugs and drug sensitivity (<i>IC</i> 50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer dataset (GDSC) to train the proposed drug response model, which we then use to predict drug sensitivity of samples from the Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying successful features of cell lines and drugs in order to calculate the probability of the tumor samples being sensitive to drugs. The closest cell line neighbours for each tumor sample are calculated using a description of similarity between tumor samples and cell lines in this study. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighboring cell lines. We compare the results of the TCLMF model with the results of the previously proposed methods using two databases and two approaches to test the model's performance. In the first approach, 12 drugs with enough known clinical drug response, considered in previous methods, are studied. 
For 7 drugs out of 12, the TCLMF can significantly distinguish between patients who are resistant to these drugs and patients who are sensitive to them. These approaches are converted to classification models using a threshold in the second approach, and the results are compared. The results demonstrate that the TCLMF method provides accurate predictions compared with the results of the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF's logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2150035"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39614910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized splitting of mixed-species RNA sequencing data.","authors":"Xuan Song, Hai Yun Gao, Karl Herrup, Ronald P Hart","doi":"10.1142/S0219720022500019","DOIUrl":"10.1142/S0219720022500019","url":null,"abstract":"<p><p>Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven to be valuable to uncover cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are re-aligned with individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. 
While non-alignment strategies successfully partitioned reads by species, a more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2250001"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9081140/pdf/nihms-1770823.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39792860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A protein succinylation sites prediction method based on the hybrid architecture of LSTM network and CNN.","authors":"Die Zhang, Shunfang Wang","doi":"10.1142/S0219720022500032","DOIUrl":"https://doi.org/10.1142/S0219720022500032","url":null,"abstract":"<p><p>The succinylation modification of protein participates in the regulation of a variety of cellular processes. Identification of modified substrates with precise sites is the basis for understanding the molecular mechanism and regulation of succinylation. In this work, we selected five superior feature codes: CKSAAP, ACF, BLOSUM62, AAindex, and one-hot, according to their performance on the problem of succinylation site prediction. Then, an LSTM network and a CNN were used to construct four models: LSTM-CNN, CNN-LSTM, LSTM, and CNN. Each of the five selected features was input into each of these four models for training to compare the four models. Based on the performance of each model, the optimal model among them was chosen to construct a hybrid model DeepSucc that was composed of five sub-modules for integrating heterogeneous information. Under 10-fold cross-validation, the hybrid model DeepSucc achieves 86.26% accuracy, 84.94% specificity, 87.57% sensitivity, 0.9406 AUC, and 0.7254 MCC. When compared with other prediction tools using an independent test set, DeepSucc outperformed them in sensitivity and MCC. 
The datasets and source codes can be accessed at https://github.com/1835174863zd/DeepSucc.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2250003"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39942705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor decomposition based on the potential low-rank and <i>p</i>-shrinkage generalized threshold algorithm for analyzing cancer multiomics data.","authors":"Hang-Jin Yang, Yu-Xia Lei, Juan Wang, Xiang-Zhen Kong, Jin-Xing Liu, Ying-Lian Gao","doi":"10.1142/S0219720022500020","DOIUrl":"https://doi.org/10.1142/S0219720022500020","url":null,"abstract":"<p><p>Tensor Robust Principal Component Analysis (TRPCA) has achieved promising results in the analysis of genomics data. However, the TRPCA model under the existing tensor singular value decomposition ([Formula: see text]-SVD) framework insufficiently extracts the potential low-rank structure of the data, resulting in suboptimal restored components. Simultaneously, the tensor nuclear norm (TNN) defined based on [Formula: see text]-SVD uses the same standard to handle various singular values. TNN ignores the differences among singular values, so the main information that needs to be preserved is not well preserved. To preserve the heterogeneous structure in the low-rank information, we propose a novel TNN and extend it to the TRPCA model. Potential low-rank space may contain important information. We learn the low-rank structural information from the core tensor. The singular value space contains the association information between genes and cancers. The [Formula: see text]-shrinkage generalized threshold function is utilized to preserve the low-rank properties of larger singular values. The optimization problem is solved by the alternating direction method of multipliers (ADMM) algorithm. Clustering and feature selection experiments are performed on the TCGA data set. 
The experimental results show that the proposed model is more promising than other state-of-the-art tensor decomposition methods.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2250002"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39942706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RNA modification writers influence tumor microenvironment in gastric cancer and prospects of targeted drug therapy","authors":"P. Song, Sheng Zhou, Xiaoyang Qi, Y. Jiao, Y. Gong, Jie Zhao, Haojun Yang, Z. Qian, J. Qian, Liming Tang","doi":"10.1142/S0219720022500044","DOIUrl":"https://doi.org/10.1142/S0219720022500044","url":null,"abstract":"Background: RNA adenosine modifications are crucial for regulating RNA levels. N6-methyladenosine (m6A), N1-methyladenosine (m1A), adenosine-to-inosine RNA editing, and alternative polyadenylation (APA) are four major RNA modification types. Methods: We evaluated the altered mRNA expression profiles of 27 RNA modification enzymes and compared the differences in tumor microenvironment (TME) and clinical prognosis between two RNA modification patterns using unsupervised clustering. Then, we constructed a scoring system, WM_score, and quantified the RNA modifications in patients with gastric cancer (GC), associating WM_score with TME, clinical outcomes, and effectiveness of targeted therapies. Results: RNA adenosine modifications strongly correlated with TME and could predict the degree of TME cell infiltration, genetic variation, and clinical prognosis. Two modification patterns were identified according to high and low WM_scores. Tumors in the WM_score-high subgroup were closely linked with survival advantage, CD4[Formula: see text] T-cell infiltration, high tumor mutation burden, and cell cycle signaling pathways, whereas those in the WM_score-low subgroup showed strong infiltration of inflammatory cells and poor survival. Regarding the immunotherapy response, a high WM_score showed a significant correlation with PD-L1 expression, predicting the effect of PD-L1 blockade therapy. 
Conclusion: The WM_scoring system could facilitate scoring and prediction of GC prognosis.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250004"},"PeriodicalIF":1.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48168063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction","authors":"Santhosh Amilpur, Raju Bhukya","doi":"10.1142/S0219720022500056","DOIUrl":"https://doi.org/10.1142/S0219720022500056","url":null,"abstract":"Enhancers are short regulatory DNA fragments that are bound with proteins called activators. They are free-bound and distant elements, which play a vital role in controlling gene expression. It is challenging to identify enhancers and their strength due to their dynamic nature. Although some machine learning methods exist to accelerate the identification process, their prediction accuracy and efficiency need further improvement. In this regard, we propose a two-layer prediction model with an enhanced feature extraction strategy that combines features from an improved position-specific amino acid propensity (PSTKNC) method with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches were concatenated and then sent through a simple artificial neural network (ANN) to accurately identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on a benchmark chromatin dataset of nine cell lines. A 10-fold cross-validation method is employed to evaluate the model's performance. 
The results show that the proposed model gives an outstanding performance, with an accuracy of 94.50% and a Matthews correlation coefficient (MCC) of 0.8903 in predicting enhancers, and also does fairly well on an independent test when compared with all other existing methods.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250005"},"PeriodicalIF":1.0,"publicationDate":"2022-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41464283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of cancer-related module in protein-protein interaction network based on gene prioritization.","authors":"Jingli Wu, Qi Zhang, Gaoshi Li","doi":"10.1142/S0219720021500311","DOIUrl":"https://doi.org/10.1142/S0219720021500311","url":null,"abstract":"<p><p>With the rapid development of deep sequencing technologies, a large amount of high-throughput data has been available for studying the carcinogenic mechanism at the molecular level. It has been widely accepted that the development and progression of cancer are regulated by modules/pathways rather than individual genes. The identification of cancer-related active modules has received extensive attention. In this paper, we put forward an identification method ModFinder by integrating both biological networks and gene expression profiles. More concretely, a gene scoring function is devised by using the regression model with [Formula: see text]-step random walk kernel, and the genes are ranked according to both of their active scores and degrees in the PPI network. Then a greedy algorithm NSEA is introduced to find an active module with high score and strong connectivity. Experiments were performed on both simulated data and real biological data, i.e., breast cancer and cervical cancer. Compared with the previous methods SigMod, LEAN and RegMod, ModFinder shows competitive performance. 
It can successfully identify a well-connected module that contains a large proportion of cancer-related genes, including some well-known oncogenes or tumor suppressors enriched in cancer-related pathways.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 1","pages":"2150031"},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39956506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new Bayesian approach for QTL mapping of family data.","authors":"Daiane Aparecida Zuanetti, Luis Aparecido Milan","doi":"10.1142/S021972002150030X","DOIUrl":"https://doi.org/10.1142/S021972002150030X","url":null,"abstract":"<p><p>In this paper, we propose a new Bayesian approach for QTL mapping of family data. The main purpose is to model a phenotype as a function of QTLs' effects. The model considers the detailed familial dependence and it does not rely on random effects. It combines the probability for Mendelian inheritance of parents' genotype and the correlation between flanking markers and QTLs. This is an advance when compared with models which use only Mendelian segregation or only the correlation between markers and QTLs to estimate transmission probabilities. We use the Bayesian approach to estimate the number of QTLs, their location and the additive and dominance effects. We compare the performance of the proposed method with variance component and LASSO models using simulated and GAW17 data sets. Under tested conditions, the proposed method outperforms other methods in aspects such as estimating the number of QTLs, the accuracy of the QTLs' position and the estimate of their effects. The results of the application of the proposed method to data sets exceeded all of our expectations.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 1","pages":"2150030"},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39645904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying duplications and lateral gene transfers simultaneously and rapidly.","authors":"Zhi-Zhong Chen, Fei Deng, Lusheng Wang","doi":"10.1142/S0219720021500335","DOIUrl":"https://doi.org/10.1142/S0219720021500335","url":null,"abstract":"<p><p>This paper deals with the problem of enumerating all minimum-cost LCA-reconciliations involving gene duplications and lateral gene transfers (LGTs) for a given species tree [Formula: see text] and a given gene tree [Formula: see text]. Previously, [Tofigh A, Hallett M, Lagergren J, Simultaneous identification of duplications and lateral gene transfers, <i>IEEE/ACM Trans Comput Biol Bioinf</i> 517-535, 2011.] gave a fixed-parameter algorithm for this problem that runs in [Formula: see text] time, where [Formula: see text] is the number of vertices in [Formula: see text], [Formula: see text] is the number of vertices in [Formula: see text], and [Formula: see text] is the minimum cost of an LCA-reconciliation between [Formula: see text] and [Formula: see text]. In this paper, by refining their algorithm, we obtain a new one for the same problem that finds and outputs the solutions in a compact form within [Formula: see text] time. In the most interesting case where [Formula: see text], our algorithm is [Formula: see text] times faster.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 1","pages":"2150033"},"PeriodicalIF":1.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39805627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}