Elliptic geometry-based kernel matrix for improved biological sequence classification

IF 7.2 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
{"title":"Elliptic geometry-based kernel matrix for improved biological sequence classification","authors":"","doi":"10.1016/j.knosys.2024.112479","DOIUrl":null,"url":null,"abstract":"<div><p>Protein sequence classification plays a pivotal role in bioinformatics as it enables the comprehension of protein functions and their involvement in diverse biological processes. While numerous machine learning models have been proposed to tackle this challenge, traditional approaches face limitations in capturing the intricate relationships and hierarchical structures inherent in genomic sequences. These limitations stem from operating within high-dimensional non-Euclidean spaces. To address this issue, we introduce the application of the elliptic geometry-based approach for protein sequence classification. First, we transform the problem in elliptic geometry and integrate it with the Gaussian kernel to map the problem into the Mercer kernel. The Gaussian-Elliptic approach allows for the implicit mapping of data into a higher-dimensional feature space, enabling the capture of complex nonlinear relationships. This feature becomes particularly advantageous when dealing with hierarchical or tree-like structures commonly encountered in biological sequences. Experimental results highlight the effectiveness of the proposed model in protein sequence classification, showcasing the advantages of utilizing elliptic geometry in bioinformatics analyses. It outperforms state-of-the-art methods by achieving 76% and 84% accuracies for DNA and Protein datasets, respectively. Furthermore, we provide theoretical justifications for the proposed model. This study contributes to the burgeoning field of geometric deep learning, offering insights into the potential applications of elliptic representations in the analysis of biological data.</p></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":null,"pages":null},"PeriodicalIF":7.2000,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705124011134","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Protein sequence classification plays a pivotal role in bioinformatics as it enables the comprehension of protein functions and their involvement in diverse biological processes. While numerous machine learning models have been proposed to tackle this challenge, traditional approaches face limitations in capturing the intricate relationships and hierarchical structures inherent in genomic sequences. These limitations stem from operating within high-dimensional non-Euclidean spaces. To address this issue, we introduce the application of the elliptic geometry-based approach for protein sequence classification. First, we transform the problem in elliptic geometry and integrate it with the Gaussian kernel to map the problem into the Mercer kernel. The Gaussian-Elliptic approach allows for the implicit mapping of data into a higher-dimensional feature space, enabling the capture of complex nonlinear relationships. This feature becomes particularly advantageous when dealing with hierarchical or tree-like structures commonly encountered in biological sequences. Experimental results highlight the effectiveness of the proposed model in protein sequence classification, showcasing the advantages of utilizing elliptic geometry in bioinformatics analyses. It outperforms state-of-the-art methods by achieving 76% and 84% accuracies for DNA and Protein datasets, respectively. Furthermore, we provide theoretical justifications for the proposed model. This study contributes to the burgeoning field of geometric deep learning, offering insights into the potential applications of elliptic representations in the analysis of biological data.

基于椭圆几何的核矩阵用于改进生物序列分类
蛋白质序列分类在生物信息学中起着举足轻重的作用,因为它有助于理解蛋白质的功能及其在各种生物过程中的参与。虽然已经提出了许多机器学习模型来应对这一挑战,但传统方法在捕捉基因组序列中固有的复杂关系和层次结构方面存在局限性。这些限制源于在高维非欧几里得空间中的操作。为了解决这个问题,我们介绍了基于椭圆几何的蛋白质序列分类方法的应用。首先,我们在椭圆几何中转换问题,并将其与高斯核整合,从而将问题映射到美世核中。高斯椭圆方法允许将数据隐式映射到更高维度的特征空间,从而捕捉复杂的非线性关系。在处理生物序列中常见的分层结构或树状结构时,这一特点尤为有利。实验结果凸显了所提模型在蛋白质序列分类中的有效性,展示了在生物信息学分析中利用椭圆几何的优势。该模型在 DNA 和蛋白质数据集上的准确率分别达到了 76% 和 84%,优于最先进的方法。此外,我们还为所提出的模型提供了理论依据。这项研究为正在蓬勃发展的几何深度学习领域做出了贡献,为椭圆表示法在生物数据分析中的潜在应用提供了见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Knowledge-Based Systems
Knowledge-Based Systems 工程技术-计算机:人工智能
CiteScore
14.80
自引率
12.50%
发文量
1245
审稿时长
7.8 months
期刊介绍: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信