Multiview Random Vector Functional Link Network for Predicting DNA-Binding Proteins
Authors: A. Quadir, M. Sajid, M. Tanveer
Source: arXiv - QuanBio - Biomolecules
Publication date: 2024-09-04
DOI: arxiv-2409.02588 (https://doi.org/arxiv-2409.02588)
Citations: 0
Abstract
Identifying DNA-binding proteins (DBPs) is a critical task because of their
significant role in many biological activities, and understanding the
mechanisms underlying protein-DNA interactions is essential for elucidating
those activities. In recent years, machine learning-based models have
been widely used for DBP prediction. In this paper, to predict DBPs,
we propose a novel framework termed a multiview random vector functional link
(MvRVFL) network, which fuses neural network architecture with multiview
learning. The proposed MvRVFL model combines the benefits of late and early
fusion, allowing for distinct regularization parameters across different views
while leveraging a closed-form solution to determine unknown parameters
efficiently. The primal objective function incorporates a coupling term aimed
at minimizing a composite of errors stemming from all views. From each of the
three protein views of the DBP datasets, we extract five features. These
features are then fused by incorporating a hidden feature during
model training. On the DBP datasets, the proposed MvRVFL model outperforms
the baseline models. We further evaluate it on the UCI, KEEL, AwA, and
Corel5k datasets to establish the practicality of the proposed model. The
consistency error bound, the generalization error bound, and empirical
findings, coupled with rigorous statistical analyses, confirm the superior
generalization capabilities of the MvRVFL model compared to the baseline
models.
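
To make the closed-form solution mentioned above concrete, here is a minimal sketch of a standard single-view RVFL classifier, trained separately on each view with its own regularization parameter and combined by averaging scores (plain late fusion). This is not the MvRVFL objective: the coupling term and the hidden-feature fusion described in the abstract are omitted. The function names, the toy two-view data, the tanh activation, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def rvfl_fit(X, y, n_hidden=50, reg=1.0, seed=0):
    """Fit a standard RVFL: random hidden layer plus direct links, ridge output."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are drawn at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)
    # Direct links: the original features are concatenated with hidden features.
    Z = np.hstack([X, H])
    # Closed-form ridge solution: beta = (Z^T Z + reg * I)^{-1} Z^T y
    beta = np.linalg.solve(Z.T @ Z + reg * np.eye(Z.shape[1]), Z.T @ y)
    return W, b, beta


def rvfl_score(model, X):
    """Return real-valued scores; the sign gives the predicted class."""
    W, b, beta = model
    Z = np.hstack([X, np.tanh(X @ W + b)])
    return Z @ beta


def make_toy_views(n=200, seed=0):
    """Hypothetical two-view data with labels in {-1, +1} (not DBP data)."""
    rng = np.random.default_rng(seed)
    y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
    view_a = y[:, None] * 1.0 + rng.normal(size=(n, 4))
    view_b = y[:, None] * 0.5 + rng.normal(size=(n, 6))
    return [view_a, view_b], y


views, y = make_toy_views()
regs = [1.0, 10.0]  # a distinct regularization parameter per view (assumed values)
models = [rvfl_fit(X, y, reg=r, seed=i) for i, (X, r) in enumerate(zip(views, regs))]

# Simple late fusion: average the per-view scores, then take the sign.
fused = np.mean([rvfl_score(m, X) for m, X in zip(models, views)], axis=0)
accuracy = float(np.mean(np.sign(fused) == y))
print(f"training accuracy of the fused scores: {accuracy:.3f}")
```

Because only the output weights are learned, each view costs a single linear solve. The model in the abstract would instead couple the per-view errors in one objective rather than averaging independently trained scores.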