Hiroshi SAKIYAMA, Ryushi MOTOKI, Takashi OKUNO, Jian-Qiang LIU
Journal of Computer Chemistry-Japan, published 2023-01-01. DOI: 10.2477/jccjie.2023-0017 (https://doi.org/10.2477/jccjie.2023-0017)
Improvement of Blood-Brain Barrier Permeability Prediction Using Cosine Similarity
Predicting the blood-brain barrier permeability of chemicals is a key issue in brain drug development. This study investigated whether training on data relatively similar to the test data improves the performance of machine learning methods for predicting blood-brain barrier permeability. Selecting training data with high cosine similarity to the test data improved prediction performance while using fewer training data. The best model in this study also achieved improved scores on two external test sets used to examine generalization performance, outperforming strong existing models. The cosine similarity method is expected to be effective for predicting the properties of compound sets that are highly diverse but contain few data points.
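The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how compounds are featurized, how similarity to the test set is aggregated, or how many training compounds are kept, so everything below is an assumption made for illustration: each compound is a numeric feature vector, a training compound is scored by its highest cosine similarity to any test compound, and the top `k` are retained. The function names `cosine_similarity_matrix` and `select_similar_training` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    # Assumption: an all-zero feature vector gets similarity 0 to everything.
    a_unit = np.divide(a, a_norm, out=np.zeros_like(a), where=a_norm > 0)
    b_unit = np.divide(b, b_norm, out=np.zeros_like(b), where=b_norm > 0)
    return a_unit @ b_unit.T  # shape: (len(a), len(b))


def select_similar_training(x_train, x_test, k):
    """Indices of the k training rows most similar to the test set.

    Assumption: a training row is scored by its maximum cosine
    similarity to any test row; the paper may aggregate differently.
    """
    sims = cosine_similarity_matrix(x_train, x_test)
    score = sims.max(axis=1)
    # Stable sort so that ties keep their original training order.
    order = np.argsort(-score, kind="stable")
    return order[:k]


if __name__ == "__main__":
    # Toy feature vectors standing in for molecular descriptors.
    x_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
    x_test = np.array([[2.0, 0.0]])
    print(select_similar_training(x_train, x_test, k=2))
```

The returned indices would then be used to subset the training features and labels before fitting whatever classifier is being evaluated. Scoring by maximum similarity favors training compounds close to at least one test compound; using the mean instead would favor compounds close to the test set as a whole.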