W2V-repeated index: Prediction of enhancers and their strength based on repeated fragments

IF 3.4 | Zone 2, Biology | JCR Q2, Biotechnology & Applied Microbiology
Weiming Xie, Zhaomin Yao, Yizhe Yuan, Jingwei Too, Fei Li, Hongyu Wang, Ying Zhan, Xiaodan Wu, Zhiguo Wang, Guoxu Zhang
Journal: Genomics, volume 116, issue 5, Article 110906
Published: 2024-07-29
DOI: 10.1016/j.ygeno.2024.110906
Full text: https://www.sciencedirect.com/science/article/pii/S0888754324001277
Citations: 0

Abstract

Enhancers play a crucial role in regulating gene expression: they dictate the specificity and timing of transcriptional activity, so identifying enhancers and their strength is essential for unravelling the intricacies of genetic regulation. Repeated sequences in the genome are recurrences of identical or symmetrical fragments, and a great deal of evidence indicates that they carry a large amount of genetic information. We therefore introduce the W2V-Repeated Index, which is designed to identify enhancer sequence fragments and evaluate their strength by analysing repeated K-mer sequences in enhancer regions. The method uses the word2vector (word2vec) algorithm for numerical conversion and Manta Ray Foraging Optimization for feature selection, and it captures the frequency and distribution of K-mer sequences. Because it concentrates on repeated K-mer sequences only, it reduces computational complexity and makes larger values of K tractable. Experiments indicate that our method outperforms the other advanced methods it was compared with on almost all evaluation metrics.
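
To make the idea of "repeated K-mer sequences" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a K-mer counts as repeated when it occurs at least twice in the same sequence, and it records each repeated K-mer together with its start positions, so that both frequency (number of positions) and distribution (where they fall) can be read off. The function name `repeated_kmers` and the example sequence are hypothetical, and symmetrical fragments are not handled here.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer occurring at least twice in seq to its start positions.

    Assumption for illustration: "repeated" means two or more exact
    occurrences within one sequence (overlapping occurrences included).
    """
    positions: dict[str, list[int]] = defaultdict(list)
    seq = seq.upper()
    # Slide a window of width k over the sequence and record every start index.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only the k-mers seen more than once; unique k-mers are discarded.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


if __name__ == "__main__":
    hits = repeated_kmers("ACGTACGTTTACG", 3)
    for kmer, pos in sorted(hits.items()):
        print(kmer, len(pos), pos)
```

Discarding K-mers that occur only once shrinks the vocabulary that later steps have to process, which is one plausible reading of why the abstract says larger K values become practical. In a fuller pipeline the retained K-mers could serve as tokens for a word2vec model, followed by a feature-selection step; both are left out of this sketch.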

Source journal: Genomics
Subject category: Biology, Biotechnology & Applied Microbiology
CiteScore: 9.60
Self-citation rate: 2.30%
Articles published: 260
Review time: 60 days
About the journal: Genomics is a forum for describing the development of genome-scale technologies and their application to all areas of biological investigation. As a journal that has evolved with the field that carries its name, Genomics focuses on the development and application of cutting-edge methods, addressing fundamental questions with potential interest to a wide audience. Our aim is to publish the highest quality research and to provide authors with rapid, fair and accurate review and publication of manuscripts falling within our scope.