Sheng Chang, Lijun Shen, Linlin Li, Xi Chen, Hua Han
{"title":"Denoising of scanning electron microscope images for biological ultrastructure enhancement","authors":"Sheng Chang, Lijun Shen, Linlin Li, Xi Chen, Hua Han","doi":"10.1142/S021972002250007X","DOIUrl":"https://doi.org/10.1142/S021972002250007X","url":null,"abstract":"Scanning electron microscopy (SEM) is of great significance for analyzing ultrastructure. However, due to the requirements of data throughput and electron dose of biological samples in the imaging process, SEM images of biological samples are often corrupted by noise, which severely affects the observation of ultrastructure. Therefore, it is necessary to analyze and establish a noise model of SEM and propose an effective denoising algorithm that can preserve the ultrastructure. We first investigated the noise source of SEM images and introduced a signal-related SEM noise model. Then, we validated the effectiveness of the noise model through experiments, which were designed with standard samples to reflect the relation between real signal intensity and noise. Based on the SEM noise model and the traditional variance stabilization denoising strategy, we proposed a novel, two-stage denoising method. In the first stage, variance stabilization, our VS-Net separates the signal-dependent noise from the signal in the SEM image. In the second stage, denoising, our D-Net employs the structure of U-Net and combines it with an attention mechanism to achieve efficient noise removal. Compared with other existing denoising methods for SEM images, our proposed method is more competitive in objective evaluation and visual effects.
Source code is available on GitHub (https://github.com/VictorCSheng/VSID-Net).","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250007"},"PeriodicalIF":1.0,"publicationDate":"2022-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46938369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative structure-activity relationship modeling reveals the minimal sequence requirement and amino acid preference of sirtuin-1's deacetylation substrates in diabetes mellitus","authors":"X. Shao, W. Kong, Y. Li, S. Zhang","doi":"10.1142/S0219720022500081","DOIUrl":"https://doi.org/10.1142/S0219720022500081","url":null,"abstract":"Sirtuin 1 (SIRT1) is a nicotinamide adenine dinucleotide (NAD[Formula: see text])-dependent deacetylase that is involved in multiple glucose metabolism pathways and plays an important role in the pathogenesis of diabetes mellitus (DM). The enzyme specifically recognizes its deacetylation substrates' peptide segments containing a central acetyl-lysine residue as well as a number of amino acids flanking the central residue. In this study, we attempted to ascertain the minimal sequence requirement (MSR) around the central acetyl-lysine residue of SIRT1 substrate-recognition sites as well as the amino acid preference (AAP) at different residues of the MSR window through a quantitative structure-activity relationship (QSAR) strategy, which would benefit our understanding of SIRT1 substrate specificity at the molecular level and also helps to rationally design substrate-mimicking peptidic agents against DM by competitively targeting the SIRT1 active site. In this procedure, a large-scale dataset containing 6801 13-mer acetyl-lysine peptides (and their SIRT1-catalyzed deacetylation activities) was compiled to train 10 QSAR regression models developed by systematic combination of two machine learning methods (PLS and SVM) and five amino acid descriptors (DPPS, T-scale, MolSurf, [Formula: see text]-score, and FASGAI).
The two best QSAR models (PLS+FASGAI and SVM+DPPS) were then employed to statistically examine the contribution of residue positions to the deacetylation activity of acetyl-lysine peptide substrates, revealing that the MSR can be represented by 5-mer acetyl-lysine peptides that meet a consensus motif X[Formula: see text]X[Formula: see text]X[Formula: see text](AcK)0X[Formula: see text]. Structural analysis found that the X[Formula: see text] and (AcK)0 residues are tightly packed against the enzyme active site and confer both stability and specificity for the enzyme-substrate complex, whereas the X[Formula: see text], X[Formula: see text] and X[Formula: see text] residues are partially exposed to solvent but can also effectively stabilize the complex system. Subsequently, a systematic deacetylation activity change profile (SDACP) was created based on QSAR modeling, from which the AAP for each residue position of the MSR was depicted. With the profile, we were able to rationally design an SDACP combinatorial library with promising deacetylation activity, from which nine MSR acetyl-lysine peptides as well as two known SIRT1 acetyl-lysine peptide substrates were tested using a SIRT1 deacetylation assay. The results reveal that the designed peptides exhibit comparable or even higher activity than the controls, although the former are considerably shorter than the latter.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250008"},"PeriodicalIF":1.0,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45781245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning","authors":"Xiuquan Du, Xiu-juan Zhao, Yanping Zhang","doi":"10.1142/S0219720022500068","DOIUrl":"https://doi.org/10.1142/S0219720022500068","url":null,"abstract":"RNA-binding proteins (RBPs) have crucial roles in various cellular processes such as alternative splicing and gene regulation. Therefore, the analysis and identification of RBPs is an essential issue. However, although many computational methods have been developed for predicting RBPs, few studies simultaneously consider local and global information from the perspective of the RNA sequence. To address this challenge, we present a novel method called DeepBtoD, which predicts RBPs directly from RNA sequences. First, a [Formula: see text]-BtoD encoding is designed, which takes into account the composition of [Formula: see text]-nucleotides and their relative positions and forms a local module. Second, we designed a multi-scale convolutional module embedded with a self-attentive mechanism, the ms-focusCNN, which is used to learn more effective, diverse, and discriminative high-level features. Finally, global information is considered to supplement the local modules with ensemble learning to predict whether the target RNA binds to RBPs. Preliminary results on 24 independent test datasets show that our proposed method can classify RBPs with an area under the curve of 0.933. Remarkably, DeepBtoD shows competitive results against seven state-of-the-art methods, suggesting that RBPs can be recognized accurately by integrating local [Formula: see text]-BtoD and global information from RNA sequences alone. Hence, our integrative method may be useful to improve the power of RBP prediction, which might be particularly useful for modeling protein-nucleic acid interactions in systems biology studies.
Our DeepBtoD server can be accessed at http://175.27.228.227/DeepBtoD/.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250006"},"PeriodicalIF":1.0,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42540334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qingting Wei, Hong Zou, Cuncong Zhong, Jianfeng Xu
{"title":"RPfam: A refiner towards curated-like multiple sequence alignments of the Pfam protein families","authors":"Qingting Wei, Hong Zou, Cuncong Zhong, Jianfeng Xu","doi":"10.1142/S0219720022400029","DOIUrl":"https://doi.org/10.1142/S0219720022400029","url":null,"abstract":"High-quality multiple sequence alignments (MSAs) can provide insights into the architecture and function of protein families. Existing MSA tools often generate results inconsistent with the biological distribution of conserved regions because they position amino acid residues and gaps by symbols alone. We propose RPfam, a refiner towards curated-like MSAs for modeling the protein families in the Pfam database. RPfam refines the automatic alignments by scoring alignments based on the PFASUM matrix, restricting realignments to badly aligned blocks, optimizing the block scores by dynamic programming, and running refinements iteratively using the Simulated Annealing algorithm. Experiments show that RPfam effectively refined the alignments produced by the MSA tools ClustalO and Muscle with reference to the curated seed alignments of the Pfam protein families. In particular, RPfam improved the quality of the ClustalO alignments by 4.4% and the Muscle alignments by 2.8% on the gp32 DNA binding protein-like family. Supplementary Table is available at http://www.worldscinet.com/jbcb/.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2240002"},"PeriodicalIF":1.0,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48191874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis to determine the effect of mutations on binding to small chemical molecules","authors":"T. Koshlan, K. Kulikov","doi":"10.1142/S0219720022400030","DOIUrl":"https://doi.org/10.1142/S0219720022400030","url":null,"abstract":"In this paper, the authors present and describe in detail an original software-implemented numerical methodology used to determine the effect of mutations on binding to small chemical molecules, using the examples of gefitinib, AMPPNP, CO-1686, ASP8273, and erlotinib binding with EGFR protein, and imatinib binding with PPARgamma. Furthermore, the developed numerical approach makes it possible to determine the stability of a molecular complex that consists of a protein and a small chemical molecule. A description of the software package that implements the presented algorithm is given on the website: https://binomlabs.com/.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2240003"},"PeriodicalIF":1.0,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47358607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical drug response prediction from preclinical cancer cell lines by logistic matrix factorization approach.","authors":"Akram Emdadi, Changiz Eslahchi","doi":"10.1142/S0219720021500359","DOIUrl":"https://doi.org/10.1142/S0219720021500359","url":null,"abstract":"Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of different knowledge on cell lines and drugs. This paper proposes the TCLMF method, a model for predicting drug response in tumor samples that was trained on preclinical samples and is based on the logistic matrix factorization approach. The TCLMF model is designed based on gene expression profiles, tissue type information, the chemical structure of drugs and drug sensitivity (IC50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer dataset (GDSC) to train the proposed drug response model, which we then use to predict drug sensitivity of samples from the Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying successful features of cell lines and drugs in order to calculate the probability of the tumor samples being sensitive to drugs. The closest cell line neighbors for each tumor sample are calculated using a description of similarity between tumor samples and cell lines in this study. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighboring cell lines. We compare the results of the TCLMF model with the results of the previously proposed methods using two databases and two approaches to test the model's performance. In the first approach, 12 drugs with enough known clinical drug response, considered in previous methods, are studied.
For 7 drugs out of 12, the TCLMF can significantly distinguish between patients that are resistant to these drugs and the patients that are sensitive to them. These approaches are converted to classification models using a threshold in the second approach, and the results are compared. The results demonstrate that the TCLMF method provides accurate predictions compared with the results of the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF's logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2150035"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39614910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuan Song, Hai Yun Gao, Karl Herrup, Ronald P Hart
{"title":"Optimized splitting of mixed-species RNA sequencing data.","authors":"Xuan Song, Hai Yun Gao, Karl Herrup, Ronald P Hart","doi":"10.1142/S0219720022500019","DOIUrl":"10.1142/S0219720022500019","url":null,"abstract":"Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven to be valuable to uncover cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are re-aligned with individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads.
While non-alignment strategies successfully partitioned reads by species, a more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2250001"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9081140/pdf/nihms-1770823.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39792860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A protein succinylation sites prediction method based on the hybrid architecture of LSTM network and CNN.","authors":"Die Zhang, Shunfang Wang","doi":"10.1142/S0219720022500032","DOIUrl":"https://doi.org/10.1142/S0219720022500032","url":null,"abstract":"The succinylation modification of proteins participates in the regulation of a variety of cellular processes. Identification of modified substrates with precise sites is the basis for understanding the molecular mechanism and regulation of succinylation. In this work, we selected five superior feature codes, CKSAAP, ACF, BLOSUM62, AAindex, and one-hot, according to their performance in the problem of succinylation site prediction. Then, an LSTM network and a CNN were used to construct four models: LSTM-CNN, CNN-LSTM, LSTM, and CNN. Each of the five selected features was input into each of these four models for training in order to compare the four models. Based on the performance of each model, the optimal model among them was chosen to construct a hybrid model, DeepSucc, composed of five sub-modules for integrating heterogeneous information. Under 10-fold cross-validation, the hybrid model DeepSucc achieves 86.26% accuracy, 84.94% specificity, 87.57% sensitivity, 0.9406 AUC, and 0.7254 MCC. When compared with other prediction tools using an independent test set, DeepSucc outperformed them in sensitivity and MCC.
The datasets and source code can be accessed at https://github.com/1835174863zd/DeepSucc.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2250003"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39942705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor decomposition based on the potential low-rank and p-shrinkage generalized threshold algorithm for analyzing cancer multiomics data.","authors":"Hang-Jin Yang, Yu-Xia Lei, Juan Wang, Xiang-Zhen Kong, Jin-Xing Liu, Ying-Lian Gao","doi":"10.1142/S0219720022500020","DOIUrl":"https://doi.org/10.1142/S0219720022500020","url":null,"abstract":"Tensor Robust Principal Component Analysis (TRPCA) has achieved promising results in the analysis of genomics data. However, the TRPCA model under the existing tensor singular value decomposition ([Formula: see text]-SVD) framework insufficiently extracts the potential low-rank structure of the data, resulting in suboptimal restored components. In addition, the tensor nuclear norm (TNN) defined based on [Formula: see text]-SVD uses the same standard to handle all singular values. TNN thus ignores the differences among singular values and fails to preserve the main information that needs to be well preserved. To preserve the heterogeneous structure in the low-rank information, we propose a novel TNN and extend it to the TRPCA model. The potential low-rank space may contain important information, so we learn the low-rank structural information from the core tensor. The singular value space contains the association information between genes and cancers. The [Formula: see text]-shrinkage generalized threshold function is utilized to preserve the low-rank properties of larger singular values. The optimization problem is solved by the alternating direction method of multipliers (ADMM) algorithm. Clustering and feature selection experiments are performed on the TCGA data set.
The experimental results show that the proposed model is more promising than other state-of-the-art tensor decomposition methods.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2250002"},"PeriodicalIF":1.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39942706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Song, Sheng Zhou, Xiaoyang Qi, Y. Jiao, Y. Gong, Jie Zhao, Haojun Yang, Z. Qian, J. Qian, Liming Tang
{"title":"RNA modification writers influence tumor microenvironment in gastric cancer and prospects of targeted drug therapy","authors":"P. Song, Sheng Zhou, Xiaoyang Qi, Y. Jiao, Y. Gong, Jie Zhao, Haojun Yang, Z. Qian, J. Qian, Liming Tang","doi":"10.1142/S0219720022500044","DOIUrl":"https://doi.org/10.1142/S0219720022500044","url":null,"abstract":"Background: RNA adenosine modifications are crucial for regulating RNA levels. N6-methyladenosine (m6A), N1-methyladenosine (m1A), adenosine-to-inosine RNA editing, and alternative polyadenylation (APA) are four major RNA modification types. Methods: We evaluated the altered mRNA expression profiles of 27 RNA modification enzymes and compared the differences in tumor microenvironment (TME) and clinical prognosis between two RNA modification patterns using unsupervised clustering. Then, we constructed a scoring system, WM_score, and quantified the RNA modifications in patients with gastric cancer (GC), associating WM_score with TME, clinical outcomes, and effectiveness of targeted therapies. Results: RNA adenosine modifications strongly correlated with TME and could predict the degree of TME cell infiltration, genetic variation, and clinical prognosis. Two modification patterns were identified according to high and low WM_scores. Tumors in the WM_score-high subgroup were closely linked with survival advantage, CD4[Formula: see text] T-cell infiltration, high tumor mutation burden, and cell cycle signaling pathways, whereas those in the WM_score-low subgroup showed strong infiltration of inflammatory cells and poor survival. Regarding the immunotherapy response, a high WM_score showed a significant correlation with PD-L1 expression, predicting the effect of PD-L1 blockade therapy.
Conclusion: The WM_scoring system could facilitate scoring and prediction of GC prognosis.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250004"},"PeriodicalIF":1.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48168063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}