W2V-repeated index: Prediction of enhancers and their strength based on repeated fragments

IF 3.4, Zone 2 (Biology), JCR Q2 (Biotechnology & Applied Microbiology)

Genomics, Pub Date: 2024-07-29, DOI: 10.1016/j.ygeno.2024.110906

Weiming Xie, Zhaomin Yao, Yizhe Yuan, Jingwei Too, Fei Li, Hongyu Wang, Ying Zhan, Xiaodan Wu, Zhiguo Wang, Guoxu Zhang

Genomics 116 (5), Article 110906. Journal Article. https://www.sciencedirect.com/science/article/pii/S0888754324001277

Citations: 0

Abstract

Enhancers are crucial to the regulation of gene expression: they dictate the specificity and timing of transcriptional activity, so identifying enhancers and their strengths is essential for unravelling the intricacies of genetic regulation. Repeated sequences in the genome are repeats of the same or symmetrical fragments, and a great deal of evidence indicates that repetitive sequences carry large amounts of genetic information. We therefore introduce the W2V-Repeated Index, which identifies enhancer sequence fragments and evaluates their strength by analyzing repeated K-mer sequences in enhancer regions. Using the word2vec algorithm for numerical conversion and Manta Ray Foraging Optimization for feature selection, the method captures the frequency and distribution of K-mer sequences. By concentrating on repeated K-mer sequences, it reduces computational complexity and makes larger K values practical to analyze. Experiments indicate that our method outperforms the other advanced methods it was compared with on almost all indicators.
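The abstract does not spell out how repeated K-mers are extracted, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a K-mer counts as "repeated" when it occurs at least twice in a sequence under a sliding window of step 1; the function name `repeated_kmers` and the `min_count` parameter are hypothetical. Recording the start positions alongside the counts gives both the frequency and the distribution of each repeated K-mer.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> dict[str, list[int]]:
    """Map each k-mer seen at least `min_count` times to its start positions.

    Assumption (not from the paper): "repeated" means the same k-mer
    occurs at two or more positions, with overlapping windows allowed.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    # Slide a window of width k over the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start)
    # Keep only the k-mers that recur; singletons are dropped.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count}


if __name__ == "__main__":
    # Print each repeated 3-mer with its frequency and positions.
    for kmer, pos in repeated_kmers("ACGTACGTTTACG", k=3).items():
        print(kmer, len(pos), pos)
```

For the toy sequence `ACGTACGTTTACG` with K = 3, only `ACG` (positions 0, 4, 10), `CGT` (1, 5) and `TAC` (3, 9) survive the filter. Dropping singleton K-mers shrinks the vocabulary that any later embedding or feature-selection step has to handle, which is consistent with the abstract's point about larger K values becoming tractable.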

Source journal

Genomics (Biology: Biotechnology and Applied Microbiology)

CiteScore

9.60

Self-citation rate

2.30%

Articles published

260

Review time

60 days

About the journal: Genomics is a forum for describing the development of genome-scale technologies and their application to all areas of biological investigation. As a journal that has evolved with the field that carries its name, Genomics focuses on the development and application of cutting-edge methods, addressing fundamental questions with potential interest to a wide audience. Our aim is to publish the highest quality research and to provide authors with rapid, fair and accurate review and publication of manuscripts falling within our scope.