Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites

IF 1.8, Zone 4 (Biology), JCR Q4, BIOCHEMISTRY & MOLECULAR BIOLOGY
Yingying Yao, Shengli Zhang, Tian Xue
Journal: Current Genomics, vol. 23, no. 2, pp. 83-93
Published: 2022-06-10 (Journal Article)
DOI: 10.2174/1389202923666220214122506 (https://doi.org/10.2174/1389202923666220214122506)
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/2b/70/CG-23-83.PMC9878833.pdf
Citations: 0

Abstract


Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites.

Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered the basis of biological inheritance and the most fundamental process in all biological life. Because DNA replication initiates at a specific location, namely the origin of replication, accurate prediction of origins of replication sites (ORIs) is essential for gaining insight into their relationship with gene expression. Objective: In this study, we developed an efficient predictor called iORI-LAVT for ORI identification. Methods: Feature information is extracted from three aspects: mono-nucleotide encoding, k-mer, and ring-function-hydrogen-chemical properties. Least absolute shrinkage and selection operator (LASSO) is then applied as a feature selection method to select the optimal features. After comparing the results of different soft voting classifier combinations, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier. Results: Under 10-fold cross-validation, the prediction accuracies on two benchmark datasets are 90.39% and 95.96%, respectively. On the independent dataset, our method achieves an accuracy of 91.3%. Conclusion: iORI-LAVT outperforms previous predictors and is a promising alternative for further research on identifying ORIs.
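
The feature-extraction and soft-voting steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the chemical-property code (ring structure, functional group, hydrogen bond) is a commonly used scheme assumed here rather than taken from the paper, and sequences are assumed to contain only uppercase A, C, G, T. The LASSO selection step is omitted.

```python
from itertools import product

# Assumed (ring structure, functional group, hydrogen bond) code per base.
# A commonly used scheme; the abstract does not confirm the exact values.
CHEM = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def mono_nucleotide_encoding(seq):
    """Concatenate a 4-bit one-hot vector for every base in the sequence."""
    return [bit for base in seq for bit in ONE_HOT[base]]


def kmer_frequencies(seq, k=2):
    """Normalized counts of all 4**k k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(len(seq) - k + 1, 1)  # avoid division by zero on short input
    return [counts[m] / total for m in kmers]


def chemical_encoding(seq):
    """Concatenate the assumed 3-value chemical-property code for every base."""
    return [value for base in seq for value in CHEM[base]]


def soft_vote(prob_lists):
    """Average per-class probabilities from several classifiers.

    Returns (predicted_label, averaged_probabilities).
    """
    n = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n for c in range(n_classes)]
    return avg.index(max(avg)), avg


if __name__ == "__main__":
    seq = "ACGTAC"  # toy sequence, not from the benchmark datasets
    features = (mono_nucleotide_encoding(seq)
                + kmer_frequencies(seq, k=2)
                + chemical_encoding(seq))
    print(len(features))
    # Hypothetical class probabilities from two classifiers for one sample.
    print(soft_vote([[0.6, 0.4], [0.2, 0.8]]))
```

In a full pipeline, the concatenated feature vectors would first pass through a LASSO-based selector, and the voting step would typically be delegated to a library, for example scikit-learn's `VotingClassifier` with `voting="soft"` wrapping `GaussianNB` and `LogisticRegression`; that wiring is an assumption about tooling, not something the abstract states.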

Source journal: Current Genomics (Biology: Biochemistry & Molecular Biology)
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 29
Review time: >0 weeks
Journal description: Current Genomics is a peer-reviewed journal that provides essential reading about the latest and most important developments in genome science and related fields of research. Systems biology, systems modeling, machine learning, network inference, bioinformatics, computational biology, epigenetics, single cell genomics, extracellular vesicles, quantitative biology, and synthetic biology for the study of evolution, development, maintenance, aging and that of human health, human diseases, clinical genomics and precision medicine are topics of particular interest. The journal covers plant genomics. The journal will not consider articles dealing with breeding and livestock. Current Genomics publishes three types of articles:
i) Research papers from internationally recognized experts reporting on new and original data generated at the genome scale level. Position papers dealing with new or challenging methodological approaches, whether experimental or mathematical, are greatly welcome in this section.
ii) Authoritative and comprehensive full-length or mini reviews from widely recognized experts, covering the latest developments in genome science and related fields of research such as systems biology, statistics and machine learning, quantitative biology, and precision medicine. Proposals for mini-hot topics (2-3 review papers) and full hot topics (6-8 review papers) guest edited by internationally recognized experts are welcome in this section. Hot topic proposals should not contain original data and they should contain articles originating from at least 2 different countries.
iii) Opinion papers from internationally recognized experts addressing contemporary questions and issues in the field of genome science and systems biology and basic and clinical research practices.