Ali Burak Öncül, Yüksel Çelik, Necdet Mehmet Ünel, Mehmet Cengiz Baloglu
{"title":"bHLHDB: A next generation database of basic helix loop helix transcription factors based on deep learning model.","authors":"Ali Burak Öncül, Yüksel Çelik, Necdet Mehmet Ünel, Mehmet Cengiz Baloglu","doi":"10.1142/S0219720022500147","DOIUrl":"https://doi.org/10.1142/S0219720022500147","url":null,"abstract":"The basic helix loop helix (bHLH) superfamily is a large and diverse protein family that plays a role in various vital functions in nearly all animals and plants. The bHLH proteins form one of the largest families of transcription factors found in plants that act as homo- or heterodimers to regulate the expression of their target genes. bHLH transcription factors are involved in many aspects of plant development and metabolism, including photomorphogenesis, light signal transduction, secondary metabolism, and stress response. The amount of molecular data has increased dramatically with the development of high-throughput techniques and the wide use of bioinformatics techniques. The most efficient way to use this information is to store and analyze the data in a well-organized manner. In this study, all members of the bHLH superfamily in the plant kingdom were used to develop and implement a relational database. We have created a database called bHLHDB (www.bhlhdb.org) for the bHLH family members, on which queries can be conducted based on family or sequence information. The Hidden Markov Model (HMM), which is frequently used by researchers for the analysis of sequences, and the BLAST query were integrated into the database. In addition, a deep learning model was developed to predict the type of TF from the protein sequence alone, quickly and efficiently, with 97.54% accuracy and 97.76% precision. We created a unique, next-generation database for bHLH transcription factors and made it available to the scientific community. We believe that the database will be a valuable tool in future studies of the bHLH family.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250014"},"PeriodicalIF":1.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40555017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Omid Zarei, Stéphane L Raeppel, Maryam Hamzeh-Mivehroud
{"title":"An alignment-independent three-dimensional quantitative structure-activity relationship study on ron receptor tyrosine kinase inhibitors.","authors":"Omid Zarei, Stéphane L Raeppel, Maryam Hamzeh-Mivehroud","doi":"10.1142/S0219720022500159","DOIUrl":"https://doi.org/10.1142/S0219720022500159","url":null,"abstract":"Recepteur d'Origine Nantais, known as RON, is a member of the receptor tyrosine kinase (RTK) superfamily that has recently gained increasing attention as a cancer target for therapeutic intervention. The aim of this work was to perform an alignment-independent three-dimensional quantitative structure-activity relationship (3D QSAR) study for a series of RON inhibitors. A 3D QSAR model based on the GRid-INdependent Descriptors (GRIND) methodology was generated using a set of 19 compounds with RON inhibitory activities. The generated 3D QSAR model revealed the main structural features important for the potency of RON inhibitors. The results of this study can be used in lead optimization projects for the design of novel compounds where inhibition of RON is needed.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 3","pages":"2250015"},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9233977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of nucleosome dynamic interval based on long-short-term memory network (LSTM).","authors":"Jianli Liu, Deliang Zhou, Wen Jin","doi":"10.1142/S0219720022500093","DOIUrl":"https://doi.org/10.1142/S0219720022500093","url":null,"abstract":"Nucleosome localization is a dynamic process and consists of nucleosome dynamic intervals (NDIs). We preprocessed nucleosome sequence data as time series data (TSD) and developed a long short-term memory network (LSTM) model trained on TSD (the LSTM-TSD model), using iterative training and feature learning, that predicts NDIs with high accuracy. The Sn, Sp, Acc, and MCC of the obtained LSTM model are 91.88%, 92.72%, 92.30%, and 84.61%, respectively. The LSTM model could precisely predict the NDIs of the 16 yeast chromosomes. The NDIs contain 90.29% of nucleosome core DNA and 91.20% of nucleosome central sites, indicating that the NDIs have high confidence. We found that the binding sites of transcriptional proteins and other proteins lie outside the NDIs rather than within them. These results are important for the analysis of nucleosome localization and gene transcriptional regulation.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250009"},"PeriodicalIF":0.9,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48943898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
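The four figures quoted in the abstract above (Sn, Sp, Acc, MCC) are standard binary-classification metrics derived from a confusion matrix. The sketch below shows how they are conventionally computed. It is not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, chosen only for illustration.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives. The counts used
    below are illustrative, not taken from the paper.
    """
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
print(metrics)
```

Unlike Acc, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside Sn and Sp when classes may be imbalanced.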
{"title":"Transforming OMIC features for classification using siamese convolutional networks.","authors":"Qian Wang, Meiyu Duan, Yusi Fan, Shuai Liu, Yanjiao Ren, Lan Huang, Fengfeng Zhou","doi":"10.1142/S0219720022500135","DOIUrl":"https://doi.org/10.1142/S0219720022500135","url":null,"abstract":"Modern biotechnologies have generated huge amounts of OMIC data, among which transcriptomes and methylomes are two major OMIC types. Transcriptomes measure the expression levels of all the transcripts, while methylomes depict the cytosine methylation levels across a genome. Both OMIC data types can be generated by array or sequencing. Some studies deliver many more features per sample (the number of features is denoted as p) than the number n of samples in a cohort, which induces the \"large p small n\" paradigm. This study focused on the classification problem for OMIC data under the \"large p small n\" paradigm. A Siamese convolutional network was utilized to transform the OMIC features into a new space with minimized intra-class distances and maximized inter-class distances between the samples. The proposed feature engineering algorithm, SiaCo, was comprehensively evaluated using both transcriptome and methylome datasets. The experimental data showed that SiaCo generated features with improved classification accuracies for binary classification problems and achieved improvements on the independent test dataset. The individual SiaCo features did not show better inter-class discrimination power than the original OMIC features. This may be because the Siamese convolutional network optimized the collective performance of the SiaCo features instead of each individual feature's discrimination power. The inherent transformation nature of the Siamese twin network also means that the SiaCo features lack interpretability. The source code of SiaCo is freely available at http://www.healthinformaticslab.org/supp/resources.php.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250013"},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40608394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mengqiu Sun, Shengnan She, Hengwei Chen, Jiaxi Cheng, Wei Ji, Dan Wang, Chunlai Feng
{"title":"Prediction model for synergistic anti-tumor multi-compound combinations from traditional Chinese medicine based on extreme gradient boosting, targets and gene expression data.","authors":"Mengqiu Sun, Shengnan She, Hengwei Chen, Jiaxi Cheng, Wei Ji, Dan Wang, Chunlai Feng","doi":"10.1142/S0219720022500160","DOIUrl":"https://doi.org/10.1142/S0219720022500160","url":null,"abstract":"Traditional Chinese medicine (TCM) is characterized by synergistic therapeutic effects involving multiple compounds and targets, which provide potential new therapies for the treatment of complex cancer conditions. However, the main contributors to synergistic TCM cancer therapies and their underlying mechanisms remain largely undetermined. Machine learning now provides a new approach to identifying synergistic compound combinations from the complex components of TCM. In this study, a prediction model based on the extreme gradient boosting (XGBoost) algorithm was constructed by integrating gene expression data of different cancer cell lines, target information of natural compounds, and drug response data. Radix Paeoniae Rubra (RPR) was selected as a model herbal sample to evaluate the reliability of the constructed model. The optimal XGBoost prediction model achieved a mean squared error (MSE) of 0.66, a mean absolute error (MAE) of 0.61, and a root mean squared error (RMSE) of 0.81 on the test dataset. The superior synergistic anti-tumor combinations D15 (Paeonol + Ethyl gallate) and D13 (Paeoniflorin + Paeonol) were successfully predicted from RPR and experimentally validated on MCF-7 cells. Moreover, combination D13 could act as a main contributor to the synergistic anti-proliferative activity in the compatibility of RPR and Cortex Moutan (CM). Our XGBoost model could be a reliable tool for the efficient prediction of synergistic anti-tumor multi-compound combinations from TCM.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250016"},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40624490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
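The three regression errors reported in the abstract above are related by definition: RMSE is the square root of MSE. The sketch below computes all three from paired values and checks that the reported MSE (0.66) and RMSE (0.81) agree after rounding. The function name `regression_errors` and the toy inputs are hypothetical; only the two reported figures come from the abstract.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MSE, MAE, RMSE) for two equal-length sequences of numbers."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / len(residuals)
    mae = sum(abs(r) for r in residuals) / len(residuals)
    return mse, mae, math.sqrt(mse)


# Toy data for illustration only (not from the paper).
mse, mae, rmse = regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mse, mae, rmse)

# Consistency check on the figures reported in the abstract.
reported_mse, reported_rmse = 0.66, 0.81
print(round(math.sqrt(reported_mse), 2) == reported_rmse)
```

MAE cannot exceed RMSE for the same residuals, so the reported MAE of 0.61 is also compatible with an RMSE of 0.81.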
Ensieh Khazaei, Ala Emrany, Mostafa Tavassolipour, Foroozandeh Mahjoubi, Ahmad Ebrahimi, Seyed Abolfazl Motahari
{"title":"Automated analysis of karyotype images.","authors":"Ensieh Khazaei, Ala Emrany, Mostafa Tavassolipour, Foroozandeh Mahjoubi, Ahmad Ebrahimi, Seyed Abolfazl Motahari","doi":"10.1142/S0219720022500111","DOIUrl":"https://doi.org/10.1142/S0219720022500111","url":null,"abstract":"Karyotyping is a genetic test used to detect chromosomal defects. In a karyotype test, an image of the chromosomes is captured during cell division. The captured images are then analyzed by cytogeneticists in order to detect possible chromosomal defects. In this paper, we propose an automated pipeline for the analysis of karyotype images. There are three main steps in karyotype image analysis: image enhancement, image segmentation, and chromosome classification. We propose a novel chromosome segmentation algorithm to decompose overlapped chromosomes. We also propose a CNN-based classifier which outperforms all the existing classifiers. Our classifier is trained on a dataset of about 162,000 human chromosome images. We also introduce a novel post-processing algorithm which improves the classification results. The success rate of our segmentation algorithm is 95%. In addition, our experimental results show that the accuracy of our classifier for human chromosomes is 92.63% and that our novel post-processing algorithm raises the classification accuracy to 94%.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250011"},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng Chang, Lijun Shen, Linlin Li, Xi Chen, Hua Han
{"title":"Denoising of scanning electron microscope images for biological ultrastructure enhancement","authors":"Sheng Chang, Lijun Shen, Linlin Li, Xi Chen, Hua Han","doi":"10.1142/S021972002250007X","DOIUrl":"https://doi.org/10.1142/S021972002250007X","url":null,"abstract":"Scanning electron microscopy (SEM) is of great significance for analyzing ultrastructure. However, due to the data throughput and electron dose requirements of biological samples in the imaging process, SEM images of biological samples are often corrupted by noise, which severely affects the observation of ultrastructure. Therefore, it is necessary to analyze and establish a noise model of SEM and to propose an effective denoising algorithm that preserves the ultrastructure. We first investigated the noise sources of SEM images and introduced a signal-related SEM noise model. Then, we validated the effectiveness of the noise model through experiments designed with standard samples to reflect the relation between real signal intensity and noise. Based on the SEM noise model and the traditional variance stabilization denoising strategy, we proposed a novel two-stage denoising method. In the first stage, variance stabilization, our VS-Net separates the signal-dependent noise from the signal in the SEM image. In the second stage, denoising, our D-Net employs the structure of U-Net and combines it with an attention mechanism to achieve efficient noise removal. Compared with other existing denoising methods for SEM images, our proposed method is more competitive in objective evaluation and visual effects. Source code is available on GitHub (https://github.com/VictorCSheng/VSID-Net).","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250007"},"PeriodicalIF":1.0,"publicationDate":"2022-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46938369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative structure-activity relationship modeling reveals the minimal sequence requirement and amino acid preference of sirtuin-1's deacetylation substrates in diabetes mellitus","authors":"X. Shao, W. Kong, Y. Li, S. Zhang","doi":"10.1142/S0219720022500081","DOIUrl":"https://doi.org/10.1142/S0219720022500081","url":null,"abstract":"Sirtuin 1 (SIRT1) is a nicotinamide adenine dinucleotide (NAD+)-dependent deacetylase involved in multiple glucose metabolism pathways and plays an important role in the pathogenesis of diabetes mellitus (DM). The enzyme specifically recognizes its deacetylation substrates' peptide segments containing a central acetyl-lysine residue as well as a number of amino acids flanking the central residue. In this study, we attempted to ascertain the minimal sequence requirement (MSR) around the central acetyl-lysine residue of SIRT1 substrate-recognition sites, as well as the amino acid preference (AAP) at different residues of the MSR window, through a quantitative structure-activity relationship (QSAR) strategy. This would benefit our understanding of SIRT1 substrate specificity at the molecular level and would also help to rationally design substrate-mimicking peptidic agents against DM by competitively targeting the SIRT1 active site. In this procedure, a large-scale dataset containing 6801 13-mer acetyl-lysine peptides (and their SIRT1-catalyzed deacetylation activities) was compiled to train 10 QSAR regression models developed by systematic combination of two machine learning methods (PLS and SVM) and five amino acid descriptors (DPPS, T-scale, MolSurf, [Formula: see text]-score, and FASGAI). The two best QSAR models (PLS+FASGAI and SVM+DPPS) were then employed to statistically examine the contribution of residue positions to the deacetylation activity of acetyl-lysine peptide substrates, revealing that the MSR can be represented by 5-mer acetyl-lysine peptides that meet a consensus motif X[Formula: see text]X[Formula: see text]X[Formula: see text](AcK)0X[Formula: see text]. Structural analysis found that the X[Formula: see text] and (AcK)0 residues are tightly packed against the enzyme active site and confer both stability and specificity on the enzyme-substrate complex, whereas the X[Formula: see text], X[Formula: see text] and X[Formula: see text] residues are partially exposed to solvent but can also effectively stabilize the complex system. Subsequently, a systematic deacetylation activity change profile (SDACP) was created based on QSAR modeling, from which the AAP for each residue position of the MSR was depicted. With the profile, we were able to rationally design an SDACP combinatorial library with promising deacetylation activity, from which nine MSR acetyl-lysine peptides as well as two known SIRT1 acetyl-lysine peptide substrates were tested using a SIRT1 deacetylation assay. The assay revealed that the designed peptides exhibit comparable or even higher activity than the controls, although the former are considerably shorter than the latter.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250008"},"PeriodicalIF":1.0,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45781245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning","authors":"Xiuquan Du, Xiu-juan Zhao, Yanping Zhang","doi":"10.1142/S0219720022500068","DOIUrl":"https://doi.org/10.1142/S0219720022500068","url":null,"abstract":"RNA-binding proteins (RBPs) have crucial roles in various cellular processes such as alternative splicing and gene regulation. Therefore, the analysis and identification of RBPs is an essential issue. However, although many computational methods have been developed for predicting RBPs, few studies simultaneously consider local and global information from the perspective of the RNA sequence. To address this challenge, we present a novel method called DeepBtoD, which predicts RBPs directly from RNA sequences. First, a [Formula: see text]-BtoD encoding is designed, which takes into account the composition of [Formula: see text]-nucleotides and their relative positions and forms a local module. Second, we designed a multi-scale convolutional module embedded with a self-attentive mechanism, the ms-focusCNN, which is used to learn more effective, diverse, and discriminative high-level features. Finally, global information is used to supplement the local modules through ensemble learning to predict whether the target RNA binds to RBPs. Preliminary results on 24 independent test datasets show that our proposed method can classify RBPs with an area under the curve of 0.933. Remarkably, DeepBtoD shows competitive results against seven state-of-the-art methods, suggesting that RBPs can be recognized well by integrating local [Formula: see text]-BtoD and global information from RNA sequences alone. Hence, our integrative method may be useful for improving the power of RBP prediction, which might be particularly useful for modeling protein-nucleic acid interactions in systems biology studies. Our DeepBtoD server can be accessed at http://175.27.228.227/DeepBtoD/.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250006"},"PeriodicalIF":1.0,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42540334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qingting Wei, Hong Zou, Cuncong Zhong, Jianfeng Xu
{"title":"RPfam: A refiner towards curated-like multiple sequence alignments of the Pfam protein families","authors":"Qingting Wei, Hong Zou, Cuncong Zhong, Jianfeng Xu","doi":"10.1142/S0219720022400029","DOIUrl":"https://doi.org/10.1142/S0219720022400029","url":null,"abstract":"High-quality multiple sequence alignments (MSAs) can provide insights into the architecture and function of protein families. Existing MSA tools often generate results inconsistent with the biological distribution of conserved regions because they position amino acid residues and gaps by symbols alone. We propose RPfam, a refiner towards curated-like MSAs for modeling the protein families in the Pfam database. RPfam refines the automatic alignments by scoring alignments based on the PFASUM matrix, restricting realignments to badly aligned blocks, optimizing the block scores by dynamic programming, and running refinements iteratively using the Simulated Annealing algorithm. Experiments show that RPfam effectively refined the alignments produced by the MSA tools ClustalO and Muscle with reference to the curated seed alignments of the Pfam protein families. In particular, RPfam improved the quality of the ClustalO alignments by 4.4% and the Muscle alignments by 2.8% on the gp32 DNA binding protein-like family. Supplementary Table is available at http://www.worldscinet.com/jbcb/.","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2240002"},"PeriodicalIF":1.0,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48191874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}