Chem-Bio Informatics Journal: Latest Articles (Page 2)

Quantitative prediction of hERG inhibitory activities using support vector regression and the integrated hERG dataset in AMED cardiotoxicity database

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2021-10-01 DOI: 10.1273/cbij.21.70

Tomohiro Sato, Hitomi Yuki, T. Honma

Abstract: Inhibition of the hERG potassium channel is closely related to QT interval prolongation, and assessing this risk could contribute greatly to the development of safer therapeutic compounds. In the hit-to-lead optimization stage of drug development, quantitative prediction of hERG inhibitory activity is crucial for designing drug candidates without cardiotoxicity risk. Here, we developed a hERG regression model that combines support vector regression (SVR) with descriptor selection by the non-dominated sorting genetic algorithm II (NSGA-II). The model is based on the AMED cardiotoxicity database, which contains hERG blocking information integrated from public and commercial databases. To construct the regression model, 6,561 compounds with IC50 and/or Ki values were extracted from the AMED cardiotoxicity database. They were randomly split into a training set (70%) for model building and a test set (30%) for performance evaluation. To avoid overfitting caused by many irrelevant explanatory variables, NSGA-II was used for descriptor selection. NSGA-II is a variant of the genetic algorithm for multi-objective optimization, and here it maximized Q² and minimized RMSE in 5-fold cross-validation while simultaneously minimizing the number of descriptors used. The prediction performance was then compared with that of ADMET Predictor, commercial software that predicts various ADMET properties. On the test set, the SVR model achieved an R² of 0.594 and an RMSE of 0.604, clearly exceeding ADMET Predictor (0.134 and 0.690, respectively). The regression model is available on our home page (https://drugdesign.riken.jp/hERG).

Citations: 0
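
The hERG entry above selects descriptors with NSGA-II under three objectives: maximize Q², minimize RMSE, and minimize descriptor count. The core of NSGA-II is non-dominated sorting. What follows is a minimal sketch of that step alone, using hypothetical candidate subsets "A" to "D" with made-up cross-validation scores. It omits crowding distance, selection, crossover, mutation and the SVR itself. It also uses a simple quadratic peeling loop rather than the fast sorting procedure or the authors' code.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical descriptor subset scored by 5-fold cross-validation."""
    name: str
    q2: float    # cross-validated Q^2 (higher is better)
    rmse: float  # cross-validated RMSE (lower is better)
    n_desc: int  # number of descriptors used (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = a.q2 >= b.q2 and a.rmse <= b.rmse and a.n_desc <= b.n_desc
    strictly_better = a.q2 > b.q2 or a.rmse < b.rmse or a.n_desc < b.n_desc
    return no_worse and strictly_better


def non_dominated_fronts(population: list[Candidate]) -> list[list[Candidate]]:
    """Peel off successive Pareto fronts (simple O(n^2)-per-front version)."""
    remaining = list(population)
    fronts = []
    while remaining:
        # A candidate is on the current front if nothing left dominates it.
        front = [c for c in remaining
                 if not any(dominates(other, c) for other in remaining)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts


# Hypothetical scores; these are not results from the paper.
population = [
    Candidate("A", 0.60, 0.62, 40),
    Candidate("B", 0.58, 0.63, 12),
    Candidate("C", 0.55, 0.66, 15),
    Candidate("D", 0.50, 0.70, 80),
]
for rank, front in enumerate(non_dominated_fronts(population), start=1):
    print(rank, [c.name for c in front])
```

A and B share the first front because each wins on a different objective: A has better accuracy, and B uses fewer descriptors. This trade-off is what lets the search favor compact descriptor sets without discarding accurate ones.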

Appropriate Evaluation Measurements for Regression Models

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2021-09-15 DOI: 10.1273/cbij.21.59

Tsuyoshi Esaki

Abstract: In recent years, finding seed compounds faster and reducing the cost of pharmaceutical research have become necessities. In silico drug discovery methods are therefore expected to contribute. These methods predict new drug candidates from the physicochemical features and substructure fingerprints of compounds. Selecting seed compounds without conducting experiments could reduce the time and cost required for drug development. However, a simple linear model alone cannot satisfactorily estimate how compounds behave in the body. Their effects and distribution are determined by the environment in the body and by their interactions with other molecules. More complex models have therefore been built to estimate compound characteristics with high predictive accuracy. As a result, it is increasingly important to evaluate predictive performance correctly when selecting a model suited to the research purpose. The coefficient of determination, commonly written R², is one of the most widely used statistical measures for evaluating regression models. However, this measure cannot be used to evaluate nonlinear models. This paper explains the difficulty of using the coefficient of determination and suggests appropriate statistical measures for two settings. For cross-validation, it recommends the mean squared error (MSE). For test data, it recommends MSE together with the correlation coefficient between observed and predicted values. Understanding statistical measures and applying them appropriately is necessary, and the suggested measures should support the effective selection of promising seed compounds and accelerate drug discovery.

Citations: 1
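
The Esaki entry recommends reporting MSE together with the observed/predicted correlation rather than R² alone. The following minimal sketch uses hypothetical test-set values to show one way the two common readings of "R²" can disagree. The function names are illustrative and are not taken from the paper.

```python
import math


def mse(observed, predicted):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r2_score(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical values: predictions follow the trend but are shifted by +2.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [3.0, 4.0, 5.0, 6.0, 7.0]
print(mse(observed, predicted), r2_score(observed, predicted), pearson_r(observed, predicted))
# prints: 4.0 -1.0 1.0
```

Squaring the correlation gives 1.0, while 1 - SS_res/SS_tot gives -1.0. The two definitions coincide only for a least-squares linear fit on the same data. Pairing MSE (which captures the offset) with the correlation (which captures the trend) avoids relying on either reading alone.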

Logistic regression and random forest unveil key molecular descriptors of druglikeness

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2021-09-08 DOI: 10.1273/CBIJ.21.39

L. T. Billones, Nadia B. Morales, J. Billones

Abstract: Identifying molecular descriptors that embody the chemical information for druglikeness would be a step forward in data-driven drug discovery and development. In this study, over 4,000 Dragon-type molecular properties were generated for approximately 2,000 known drugs and 2,000 surrogate nondrugs. Logistic regression (LogR) and random forest (RF) techniques were applied to identify the molecular descriptors that can adequately classify a compound as a drug or nondrug. Ten one-variable LogR models each achieved at least 70% prediction accuracy. A two-variable model consisting of HVcpx and MDDD correctly classified 85% of the test compounds. The best LogR model had 89.0% prediction accuracy and identified the five most influential descriptors of druglikeness: the information index HVcpx, the topological index MDDD, the ring descriptor NNRS, X2A (average connectivity index of order 2), and the walk and path count SRW05. The best RF model involved 10 descriptors that are only weakly correlated with one another. It was 92.5% accurate, on par with RF and LogR models containing over 200 variables. Its descriptors were: molecular weight, MW; average molecular weight, AMW; rotatable bond fraction, RBF; percentage carbon, C%; maximal electrotopological negative variation, MAXDN; all-path Wiener index, Wap; structural information content index (neighborhood symmetry of order 1), SIC1; number of nitrogen atoms, nN; 2D Petitjean shape index, PJI2; and self-returning walk count of order 5, SRW05. Many of these descriptors have straightforward chemical interpretations and could be applied in the future as druglikeness filters in virtual high-throughput drug discovery.
Pages: 39-58

Citations: 1
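
The following is a minimal sketch of a two-descriptor logistic classifier, in the spirit of the HVcpx/MDDD model above, fitted by plain batch gradient descent. The descriptor values, labels, learning rate and epoch count are all hypothetical. For brevity, accuracy is measured on the training points, whereas the paper reports test-set accuracy.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 ("drug") when the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def accuracy(w, b, X, y):
    """Fraction of compounds whose predicted class matches the label."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Hypothetical standardized values of two descriptors; 1 = drug, 0 = nondrug.
X = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (-1.0, -2.0), (-2.0, -1.0), (-1.5, -1.5)]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(w, b, accuracy(w, b, X, y))
```

The fitted weights are directly interpretable: their signs and magnitudes indicate how each descriptor shifts the log-odds of being a drug. This interpretability is part of why small LogR models are attractive as druglikeness filters.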

Inference of genetic networks using random forests: performance improvement using a new variable importance measure

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2021-07-26 DOI: 10.21203/rs.3.rs-737867/v1

Shuhei Kimura, Yahiro Takeda, M. Tokuhisa, Mariko Okada

Abstract: Background: Among the various methods proposed so far for genetic network inference, this study focuses on random-forest-based methods. In the random-forest-based approach, confidence values are assigned to all candidate regulations. To our knowledge, all random-forest-based methods make these assignments using the standard variable importance measure defined in tree-based machine learning techniques. We think, however, that this measure has drawbacks for the inference of genetic networks. Results: In this study we therefore propose an alternative measure, which we call "the random-input variable importance measure." We then design a new inference method that uses it in place of the standard measure in the existing random-forest-based inference method. Through numerical experiments, we show that the random-input variable importance measure improves the performance of the existing method by as much as 45.5% with respect to the area under the recall-precision curve (AURPC). Conclusion: This study proposed the random-input variable importance measure for the inference of genetic networks, and its use improved the performance of the random-forest-based inference method. We checked the performance of the proposed measure only on several genetic network inference problems. However, the experimental results suggest that the proposed measure will also work well in other applications of random forests.

Citations: 0
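
The Kimura entry reports its gains in AURPC. What follows is a minimal sketch of one common way to compute that metric. It assumes candidate regulations are ranked by their confidence values and that the area is taken as a step function, which is equivalent to average precision. The paper may use a different interpolation, and the scores below are hypothetical.

```python
def recall_precision_points(scores, labels):
    """Rank candidates by descending confidence; return (recall, precision) after each one."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one true regulation")
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points, tp = [], 0
    for k, i in enumerate(order, start=1):
        tp += labels[i]
        points.append((tp / n_pos, tp / k))
    return points


def aurpc(scores, labels):
    """Step-wise area under the recall-precision curve (equal to average precision)."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in recall_precision_points(scores, labels):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical confidence values for four candidate regulations; 1 = true edge.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(aurpc(scores, labels))
```

Only the ranking of the confidence values matters here. A different variable importance measure can therefore change AURPC simply by moving true regulations higher in the list.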

Web Server with a Simple Interface for Coarse-grained Molecular Dynamics of DNA Nanostructures

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2021-04-30 DOI: 10.1273/CBIJ.21.28

Yudai Yamashita, Kotaro Watanabe, S. Murata, I. Kawamata

Abstract: We introduce an automated procedure for coarse-grained molecular dynamics simulation of DNA nanostructures, which have great potential for realizing molecular robotics. DNA origami is now a standardized technology for fabricating DNA nanostructures with high precision, and various computer-aided design tools have been developed for it. For example, the design tool caDNAno, which has a simple and intuitive interface, is widely used for designing DNA origami structures. The simulation tool oxDNA is used to predict the behavior of such nanostructures based on coarse-grained molecular dynamics. These tools, however, are not linked directly, so repeating the design-simulation cycle is cumbersome for the user. Moreover, the computer skills required to set up, launch, and run an oxDNA simulation are a potential barrier for non-experts. In our proposal, an oxDNA simulation can be launched on a web server simply by providing a caDNAno file. The web server then analyzes the simulation results and provides a visual response. The validity of the proposal is demonstrated with an example, and the advantages of the proposed method over conventional methods are described. This simple, user-friendly interface for simulating DNA origami removes this burden from users. It also accelerates the design of complicated DNA nanostructures such as wireframe architectures.

Citations: 2

In Silico Discovery of Natural Products Against Dengue RNA-Dependent RNA Polymerase Drug Target

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2021-03-15 DOI: 10.1273/CBIJ.21.11

J. Billones, N. A. B. Clavio

Citations: 0

Combining self-organizing maps and hierarchical clustering for protein–ligand interaction analysis in post-fragment molecular orbital calculation

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2021-01-29 DOI: 10.1273/CBIJ.21.1

Y. Kawashima, Natsumi Mori, N. Kawashita, Yu-Shi Tian, T. Takagi

Abstract: Fragment molecular orbital (FMO) calculation is a useful ab initio method for analyzing protein–ligand interactions in current structure-based drug design. When multiple ligands exist for one receptor, a post-FMO calculation tool is required because the method produces a large number of interaction energy decomposition terms. In this study, a method combining self-organizing maps (SOM) and hierarchical clustering analysis (HCA) was proposed to analyze the FMO energy components. This method can effectively compress the high-dimensional energy terms and is expected to be useful for analyzing protein–ligand interactions. To verify the feasibility of the proposed method, a case study was analyzed: DPP-IV, a target for type 2 diabetes mellitus, and its inhibitors. After dimensional compression using SOM and further grouping using HCA, we obtained superclasses of the inhibitors based on the dispersion energy (DI). These superclasses were consistent with structural information. This indicates that further analysis of detailed energies per superclass can be an effective approach for identifying important ligand–protein interactions.

Citations: 0
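
The following is a minimal sketch of the SOM-then-HCA pipeline described in the Kawashima entry. It makes several assumptions: the SOM is a 1-D chain of nodes with linear initialization, node prototypes are grouped by average-linkage clustering, and the input is hypothetical three-term energy vectors for six ligands. It is not the authors' implementation. All function names and parameters are illustrative.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, x):
    """Index of the SOM node whose prototype is closest to x."""
    return min(range(len(nodes)), key=lambda j: sq_dist(nodes[j], x))


def train_som(data, n_nodes=4, epochs=100, lr0=0.5, sigma0=1.0):
    """Train a 1-D SOM (n_nodes >= 2) with linearly decaying rate and radius."""
    first, last = data[0], data[-1]
    # Linear initialization: nodes evenly spaced between the first and last input.
    nodes = [[f + (l - f) * j / (n_nodes - 1) for f, l in zip(first, last)]
             for j in range(n_nodes)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        for x in data:
            bmu = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighborhood measured along the node chain.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, k) for i in clusters[a] for k in clusters[b]]
                d = sum(math.sqrt(sq_dist(points[i], points[k])) for i, k in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def som_hca_labels(data, n_nodes=4, n_superclasses=2):
    """Map each input to its best-matching node, then to that node's HCA superclass."""
    nodes = train_som(data, n_nodes)
    node_label = {}
    for label, members in enumerate(average_linkage(nodes, n_superclasses)):
        for j in members:
            node_label[j] = label
    return [node_label[best_matching_unit(nodes, x)] for x in data]


# Hypothetical three-term interaction-energy vectors for six ligands.
ligands = [(0.0, 0.2, 0.1), (0.3, 0.0, 0.2), (0.1, 0.1, 0.0),
           (9.8, 10.1, 10.0), (10.0, 10.2, 9.9), (10.2, 9.9, 10.1)]
print(som_hca_labels(ligands))
```

Clustering the few SOM prototypes instead of all raw energy vectors keeps the hierarchical step cheap. The SOM has already compressed the high-dimensional terms into a small map.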

Estimation of relationships between chemical substructures and antibiotic resistance-related gene expression in bacteria: Adapting a canonical correlation analysis for small sample data of gathered features using consensus clustering

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2020-09-30 DOI: 10.1273/CBIJ.20.58

Tsuyoshi Esaki, Takaaki Horinouchi, Yayoi Natsume-Kitatani, Yosui Nojima, I. Sakane, H. Matsui

Citations: 0

Skin sensitizer classification using dual-input machine learning model

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2020-09-11 DOI: 10.1273/cbij.20.54

K. Matsumura

Citations: 1

A distribution-dependent analysis of open-field test movies

IF 0.3

Chem-Bio Informatics Journal Pub Date : 2020-08-31 DOI: 10.1273/cbij.20.44

T. Konishi, Haruna Ohrui

Abstract: Although the open-field test has been widely used, its reliability and compatibility are frequently questioned. Many indicator parameters have been introduced for this test; however, they do not take data distributions into consideration, and this oversight may have caused the problems mentioned above. Here, an exploratory approach was taken to analyze video records of tests with elderly mice, describing the distributions with the fewest possible parameters. The locomotor activity of the animals was separated into two clusters: dash and search. The accelerations in each cluster were normally distributed, and the speed and duration of the clusters followed exponential distributions. Although the exponential model has a single parameter, an additional parameter indicating instability of the behaviour was required in many cases to fit the data. This instability parameter was inversely correlated with speed, which suggests that the brain function that maintains stability is required for better performance. According to the distributions, the travel distance, which has been regarded as an important indicator, was not a robust estimator of the animals' condition.

Citations: 1
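
The Konishi entry models the speed and duration of behavioral clusters with an exponential distribution. The following minimal sketch covers the single-parameter fit only, assuming maximum-likelihood estimation (rate = 1 / mean) and hypothetical durations. The additional instability parameter mentioned in the abstract is not specified there, so it is not modeled here.

```python
import math


def fit_exponential_rate(durations):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    if not durations or any(d < 0 for d in durations):
        raise ValueError("durations must be a non-empty list of non-negative values")
    mean = sum(durations) / len(durations)
    if mean == 0:
        raise ValueError("mean duration must be positive")
    return 1.0 / mean


def exponential_log_likelihood(durations, rate):
    """Log-likelihood: sum of log(rate) - rate * x over the observations."""
    return sum(math.log(rate) - rate * d for d in durations)


# Hypothetical bout durations in seconds; not data from the paper.
durations = [1.0, 2.0, 3.0]
rate = fit_exponential_rate(durations)
print(rate, exponential_log_likelihood(durations, rate))
```

Comparing this log-likelihood against that of a richer model is one way to judge whether a second parameter is needed to fit the data.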