Nadia Tahiri, Andrey Veriga, Aleksandr Koshkarov, Boris Morozov
{"title":"Invariant transformers of Robinson and Foulds distance matrices for Convolutional Neural Network.","authors":"Nadia Tahiri, Andrey Veriga, Aleksandr Koshkarov, Boris Morozov","doi":"10.1142/S0219720022500123","DOIUrl":"https://doi.org/10.1142/S0219720022500123","url":null,"abstract":"<p><p>The evolutionary histories of genes can differ greatly from each other, which could be explained by evolutionary variations in horizontal gene transfers or biological recombinations. A phylogenetic tree would therefore represent the evolutionary history of each gene, which may present different patterns from the species tree that defines the main evolutionary patterns. In addition, phylogenetic trees of closely related species should be merged, thus minimizing the topological conflicts they present and obtaining consensus trees (in the case of homogeneous data) or supertrees (in the case of heterogeneous data). The traditional approaches are consensus tree inference (if the set of trees contains the same set of species) or supertrees (if the set of trees contains different, but overlapping sets of species). Consensus trees and supertrees are constructed to produce unique trees. However, these methods lose precision with respect to different evolutionary variability. Other approaches have been implemented to preserve this variability using the k-means algorithm or the k-medoids algorithm. Using a new method, we determine all possible consensus trees and supertrees that best represent the most significant evolutionary models in a set of phylogenetic trees, thereby increasing the precision of the results and decreasing the time required. <b>Results:</b> This paper presents in detail a new method for predicting the number of clusters in a Robinson and Foulds (RF) distance matrix using a convolutional neural network (CNN). We developed a new CNN approach (called CNNTrees) for multiple tree classification. 
This new strategy returns a number of clusters of the input phylogenetic trees for different-size sets of trees, which makes the new approach more stable and more robust. The paper provides an in-depth analysis of the relevant, but very difficult, problem of constructing alternative supertrees using phylogenies with different but overlapping sets of taxa. This new model will play an important role in the inference of Trees of Life (ToL). <b>Availability and implementation:</b> CNNTrees is available through a web server at https://tahirinadia.github.io/. The source code, data and information about installation procedures are also available at https://github.com/TahiriNadia/CNNTrees. <b>Supplementary information:</b> Supplementary data are available on GitHub platform. The evolutionary history of species is not unique, but is specific to sets of genes. Indeed, each gene has its own evolutionary history that differs considerably from one gene to another. For example, some individual genes or operons may be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene must be represented by its own phylogenetic tree, which may exhibit different evolutionary patterns than the species tree that accounts for the major vertical descent patterns. T","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 4","pages":"2250012"},"PeriodicalIF":1.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10775458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TemporalGSSA: A numerically robust R-wrapper to facilitate computation of a metabolite-specific and simulation time-dependent trajectory from stochastic simulation algorithm (SSA)-generated datasets.","authors":"Siddhartha Kundu","doi":"10.1142/S0219720022500184","DOIUrl":"https://doi.org/10.1142/S0219720022500184","url":null,"abstract":"<p><p>Whilst data on biochemical networks has increased several-fold, our comprehension of the underlying molecular biology is incomplete and inadequate. Simulation studies permit data collation from disparate time points and the imputed trajectories can provide valuable insights into the molecular biology of complex biochemical systems. Although stochastic simulations are accurate, each run is an independent event and the data that is generated cannot be directly compared even with identical simulation times. This lack of robustness will preclude a biologically meaningful result for the metabolite(s) of concern and is a significant limitation of this approach. \"TemporalGSSA\" or temporal Gillespie Stochastic Simulation Algorithm is an R-wrapper which will collate and partition SSA-generated datasets with identical simulation times (trials) into finite sets of linear models (technical replicates). Each such model (time step of a single run, absolute number of molecules for a metabolite) computes several coefficients (slope, intercept, etc.). These coefficients are averaged (mean slope, mean intercept) across all trials of a technical replicate and, along with an imputed time step (mean, median, random), are incorporated into a linear regression equation. The solution to this equation is the number of molecules of a metabolite which is used to compute the molar concentration of the metabolite per technical replicate. 
The summarized (mean, standard deviation) data of this vector of technical replicates is the outcome or numerical estimate of the molar concentration of a metabolite and is dependent on the duration of the simulation. If the SSA-generated dataset comprises runs with differing simulation times, \"TemporalGSSA\" can compute the time-dependent trajectory of a metabolite provided the trials-per-technical-replicate constraint is met. The algorithms deployed by \"TemporalGSSA\" are rigorous, have a sound theoretical basis and have contributed meaningfully to our comprehension of the mechanism(s) that drive complex biochemical systems. \"TemporalGSSA\" is robust, freely accessible and easy to use with several readily testable examples.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250018"},"PeriodicalIF":1.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40691252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flux balance network expansion predicts stage-specific human peri-implantation embryo metabolism.","authors":"Andisheh Dadashi, Derek Martinez","doi":"10.1142/S021972002250010X","DOIUrl":"https://doi.org/10.1142/S021972002250010X","url":null,"abstract":"<p><p>Metabolism is an essential cellular process for the growth and maintenance of organisms. A better understanding of metabolism during embryogenesis may shed light on the developmental origins of human disease. Metabolic networks, however, are vastly complex with many redundant pathways and interconnected circuits. Thus, computational approaches serve as a practical solution for unraveling the genetic basis of embryo metabolism to help guide future experimental investigations. RNA-sequencing and other profiling technologies make it possible to elucidate metabolic genotype-phenotype relationships and yet our understanding of metabolism is limited. Very few studies have examined the temporal or spatial metabolomics of the human embryo, and prohibitively small sample sizes traditionally observed in human embryo research have presented logistical challenges for metabolic studies, hindering progress towards the reconstruction of the human embryonic metabolome. We employed a network expansion algorithm to evolve the metabolic network of the peri-implantation embryo and we utilized flux balance analysis (FBA) to examine the viability of the evolved networks. We found that modulating oxygen uptake promotes lactate diffusion across the outer mitochondrial layer, providing <i>in-silico</i> support for a proposed lactate-malate-aspartate shuttle. We developed a stage-specific model to serve as a proof-of-concept for the reconstruction of future metabolic models of development. 
Our work shows that it is feasible to model human metabolism with respect to time-dependent changes characteristic of peri-implantation development.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 4","pages":"2250010"},"PeriodicalIF":1.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10409009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transcriptomic meta-analysis reveals biomarker pairs and key pathways in Tetralogy of Fallot.","authors":"Sona Charles, J Sreekumar, Jeyakumar Natarajan","doi":"10.1142/S0219720022400042","DOIUrl":"https://doi.org/10.1142/S0219720022400042","url":null,"abstract":"<p><p>Tetralogy of Fallot (TOF) is a cyanotic congenital condition to which genetic, epigenetic and environmental factors contribute. We applied sparse machine learning algorithms to RNAseq and sRNAseq data to select the prospective biomarker candidates. Furthermore, we applied filtering techniques to identify a subset of biomarker pairs in TOF. Differential expression analysis disclosed 2757 genes and 214 miRNAs that are dysregulated. Weighted gene co-expression network analysis on the differentially expressed genes extracted five significant modules that are enriched in the GO terms extracellular matrix, signaling and calcium ion binding. Also, voomNSC selected two genes and five miRNAs, and transformed PLDA predicted 72 genes and 38 miRNAs, as prognostic biomarkers. Out of the selected biomarkers, miRNA target analysis revealed 14 miRNA-gene interactions. Also, 10 out of 14 pairs were oppositely expressed and four out of 10 oppositely expressed biomarker pairs shared common pathways of focal adhesion and PI3K-Akt signaling. 
In conclusion, our study demonstrated the concept of biomarker pairs, which may be considered for clinical validation owing to their strong support from both the literature and experiments.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2240004"},"PeriodicalIF":1.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40576560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to Selected Papers from InCoB 2021.","authors":"Yun Zheng","doi":"10.1142/S0219720022020012","DOIUrl":"https://doi.org/10.1142/S0219720022020012","url":null,"abstract":"","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2202001"},"PeriodicalIF":1.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40576561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ali Burak Öncül, Yüksel Çelik, Necdet Mehmet Ünel, Mehmet Cengiz Baloglu
{"title":"bHLHDB: A next generation database of basic helix loop helix transcription factors based on deep learning model.","authors":"Ali Burak Öncül, Yüksel Çelik, Necdet Mehmet Ünel, Mehmet Cengiz Baloglu","doi":"10.1142/S0219720022500147","DOIUrl":"https://doi.org/10.1142/S0219720022500147","url":null,"abstract":"<p><p>The basic helix loop helix (bHLH) superfamily is a large and diverse protein family that plays a role in various vital functions in nearly all animals and plants. The bHLH proteins form one of the largest families of transcription factors found in plants that act as homo- or heterodimers to regulate the expression of their target genes. The bHLH transcription factor is involved in many aspects of plant development and metabolism, including photomorphogenesis, light signal transduction, secondary metabolism, and stress response. The amount of molecular data has increased dramatically with the development of high-throughput techniques and wide use of bioinformatics techniques. The most efficient way to use this information is to store and analyze the data in a well-organized manner. In this study, all members of the bHLH superfamily in the plant kingdom were used to develop and implement a relational database. We have created a database called bHLHDB (www.bhlhdb.org) for the bHLH family members on which queries can be conducted based on the family or sequences information. The Hidden Markov Model (HMM), which is frequently used by researchers for the analysis of sequences, and the BLAST query were integrated into the database. In addition, the deep learning model was developed to predict the type of TF with only the protein sequence quickly, efficiently, and with 97.54% accuracy and 97.76% precision. We created a unique and next-generation database for bHLH transcription factors and made this database available to the world of science. 
We believe that the database will be a valuable tool in future studies of the bHLH family.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250014"},"PeriodicalIF":1.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40555017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Omid Zarei, Stéphane L Raeppel, Maryam Hamzeh-Mivehroud
{"title":"An alignment-independent three-dimensional quantitative structure-activity relationship study on ron receptor tyrosine kinase inhibitors.","authors":"Omid Zarei, Stéphane L Raeppel, Maryam Hamzeh-Mivehroud","doi":"10.1142/S0219720022500159","DOIUrl":"https://doi.org/10.1142/S0219720022500159","url":null,"abstract":"<p><p>Recepteur d'Origine Nantais, known as RON, is a member of the receptor tyrosine kinase (RTK) superfamily which has recently gained increasing attention as a cancer target for therapeutic intervention. The aim of this work was to perform an alignment-independent three-dimensional quantitative structure-activity relationship (3D QSAR) study for a series of RON inhibitors. A 3D QSAR model based on GRid-INdependent Descriptors (GRIND) methodology was generated using a set of 19 compounds with RON inhibitory activities. The generated 3D QSAR model revealed the main structural features important in the potency of RON inhibitors. The results obtained from the presented study can be used in lead optimization projects for designing novel compounds where inhibition of RON is needed.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 3","pages":"2250015"},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9233977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of nucleosome dynamic interval based on long-short-term memory network (LSTM).","authors":"Jianli Liu, Deliang Zhou, Wen Jin","doi":"10.1142/S0219720022500093","DOIUrl":"10.1142/S0219720022500093","url":null,"abstract":"<p><p>Nucleosome localization is a dynamic process and consists of nucleosome dynamic intervals (NDIs). We preprocessed nucleosome sequence data as time series data (TSD) and developed a long short-term memory network (LSTM) model for training time series data (TSD; LSTM-TSD model) using iterative training and feature learning that predicts NDIs with high accuracy. Sn, Sp, Acc, and MCC of the obtained LSTM model are 91.88%, 92.72%, 92.30%, and 84.61%, respectively. The LSTM model could precisely predict the NDIs of yeast 16 chromosome. The NDIs contain 90.29% of nucleosome core DNA and 91.20% of nucleosome central sites, indicating that NDIs have high confidence. We found that the binding sites of transcriptional proteins and other proteins are outside NDIs, not in NDIs. These results are important for analysis of nucleosome localization and gene transcriptional regulation.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"1 1","pages":"2250009"},"PeriodicalIF":0.9,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48943898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transforming OMIC features for classification using siamese convolutional networks.","authors":"Qian Wang, Meiyu Duan, Yusi Fan, Shuai Liu, Yanjiao Ren, Lan Huang, Fengfeng Zhou","doi":"10.1142/S0219720022500135","DOIUrl":"https://doi.org/10.1142/S0219720022500135","url":null,"abstract":"<p><p>Modern biotechnologies have generated huge amounts of OMIC data, among which transcriptomes and methylomes are two major OMIC types. Transcriptomes measure the expression levels of all the transcripts while methylomes depict the cytosine methylation levels across a genome. Both OMIC data types could be generated by array or sequencing. Some studies deliver many more features (the number of features is denoted as p) for a sample than the number n of samples in a cohort, which induces the \"large p small n\" paradigm. This study focused on the classification problem for OMIC data under the \"large p small n\" paradigm. A Siamese convolutional network was utilized to transform the OMIC features into a new space with minimized intra-class distances and maximized inter-class distances between the samples. The proposed feature engineering algorithm SiaCo was comprehensively evaluated using both transcriptome and methylome datasets. The experimental data showed that SiaCo generated SiaCo features with improved classification accuracies for binary classification problems, and achieved improvements on the independent test dataset. The individual SiaCo features did not show better inter-class discrimination powers than the original OMIC features. This may be because the Siamese convolutional network optimized the collective performances of the SiaCo features, instead of the individual feature's discrimination power. The inherent transformation nature of the Siamese twin network also makes the SiaCo features lack interpretability. 
The source code of SiaCo is freely available at http://www.healthinformaticslab.org/supp/resources.php.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250013"},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40608394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mengqiu Sun, Shengnan She, Hengwei Chen, Jiaxi Cheng, Wei Ji, Dan Wang, Chunlai Feng
{"title":"Prediction model for synergistic anti-tumor multi-compound combinations from traditional Chinese medicine based on extreme gradient boosting, targets and gene expression data.","authors":"Mengqiu Sun, Shengnan She, Hengwei Chen, Jiaxi Cheng, Wei Ji, Dan Wang, Chunlai Feng","doi":"10.1142/S0219720022500160","DOIUrl":"https://doi.org/10.1142/S0219720022500160","url":null,"abstract":"<p><p>Traditional Chinese medicine (TCM) is characterized by synergistic therapeutic effects involving multiple compounds and targets, which provide potential new therapies for the treatment of complex cancer conditions. However, the main contributors and the underlying mechanisms of synergistic TCM cancer therapies remain largely undetermined. Machine learning now provides a new approach to determine synergistic compound combinations from complex components of TCM. In this study, a prediction model based on the extreme gradient boosting (XGBoost) algorithm was constructed by integrating gene expression data of different cancer cell lines, target information of natural compounds and drug response data. Radix Paeoniae Rubra (RPR) was selected as a model herbal sample to evaluate the reliability of the constructed model. The optimal XGBoost prediction model achieved a good performance with Mean Square Error (MSE) of 0.66, Mean Absolute Error (MAE) of 0.61, and the Root Mean Squared Error (RMSE) of 0.81 on the test dataset. The superior synergistic anti-tumor combinations of D15 (Paeonol + Ethyl gallate) and D13 (Paeoniflorin + Paeonol) were successfully predicted from RPR and experimentally validated on MCF-7 cells. Moreover, the combination of D13 could work as a main contributor to a synergistic anti-proliferative activity in the compatibility of RPR and Cortex Moutan (CM). 
Our XGBoost model could be a reliable tool for the efficient prediction of synergistic anti-tumor multi-compound combinations from TCM.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2250016"},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40624490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}