Division of dataset into training and validation subsets by the jackknife validations to predict the pH optimum for beta-cellobiosidase

Shaomin Yan, Guang Wu
Published in: 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), 2022-04-01
DOI: https://doi.org/10.1109/aemcse55572.2022.00136

Abstract

In modeling, a dataset is generally divided into training and validation subsets. Although this appears simple, how best to divide the dataset is still debated. Of the various methods for making the division, the jackknife method is very popular and is advocated by Professor Kuo-Chen Chou. In practice, however, "the jackknife method" mostly refers to delete-1 jackknife validation, because the other jackknife variants are extremely time-consuming and computationally intensive. In this study, we use jackknife validations from delete-1 to delete-n+2 to develop a neural network model for predicting the pH optimum in the enzymatic reaction of beta-cellobiosidase, an enzyme that is attracting growing attention from the biofuel industry but has few documented operational parameters. The best neural network model and the best predictor were selected from 31 candidate neural networks with different numbers of layers and neurons, and from 11 predictors related to the amino acid primary structure. Jackknife validation was performed from delete-1 to delete-18. The results show that the [6], [1] model performs best among the two-layer models, and that multi-layer models perform better than two-layer models. The delete-6 jackknife strategy performs best, which suggests dividing the dataset at a ratio of one third.
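The abstract does not describe how the delete-d splits were generated, so the following is only a minimal sketch of delete-d jackknife validation, not the authors' implementation. It assumes that each split holds out d samples for validation and trains on the rest, and that hold-out sets are either enumerated exhaustively or randomly subsampled. The function name `delete_d_splits`, the `max_splits` parameter, and the sample count of 18 in the demo are all hypothetical.

```python
import itertools
import math
import random


def delete_d_splits(n_samples, d, max_splits=None, seed=0):
    """Yield (train_idx, valid_idx) pairs for delete-d jackknife validation.

    Each split holds out d samples for validation and trains on the rest.
    If max_splits is smaller than C(n_samples, d), a random subset of
    hold-out sets is drawn instead of enumerating every combination
    (the subsampling is an assumption, not taken from the paper).
    """
    indices = range(n_samples)
    total = math.comb(n_samples, d)
    if max_splits is None or max_splits >= total:
        held_out_sets = itertools.combinations(indices, d)
    else:
        rng = random.Random(seed)
        seen = set()
        while len(seen) < max_splits:
            seen.add(tuple(sorted(rng.sample(indices, d))))
        held_out_sets = sorted(seen)
    for held_out in held_out_sets:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, list(held_out)


if __name__ == "__main__":
    n = 18  # hypothetical dataset size, chosen only for illustration
    for d in (1, 2, 6):
        # The number of distinct hold-out sets grows combinatorially with d,
        # which is why delete-d for d > 1 is far more expensive than delete-1.
        print(f"delete-{d}: {math.comb(n, d)} possible splits")
```

In use, a model would be fitted on `train` and scored on `valid` for every yielded pair, and the scores averaged per value of d. Capping the number of splits with `max_splits` trades exhaustiveness for run time when the full enumeration is impractical.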