Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques

Jaroslav Olha, Terézia Slanináková, Martin Gendiar, Matej Antol, Vlastislav Dohnal
DOI: 10.48550/arXiv.2208.08910 (https://doi.org/10.48550/arXiv.2208.08910)
Published in: Similarity search and applications : proceedings of the ... International Conference on Similarity Search and Applications, volume 19 1, pages 274-282
Publication date: 2022-01-01
Publication type: Journal Article
Citations: 2

Abstract

Despite the constant evolution of similarity searching research, it continues to face the same challenges stemming from the complexity of the data, such as the curse of dimensionality and computationally expensive distance functions. Various machine learning techniques have proven capable of replacing elaborate mathematical models with combinations of simple linear functions, often gaining speed and simplicity at the cost of formal guarantees of accuracy and correctness of querying. The authors explore the potential of this research trend by presenting a lightweight solution for the complex problem of 3D protein structure search. The solution consists of three steps: (i) transformation of 3D protein structural information into very compact vectors, (ii) use of a probabilistic model to group these vectors and respond to queries by returning a given number of similar objects, and (iii) a final filtering step which applies basic vector distance functions to refine the result.
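The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the embedding or the probabilistic model, so the sketch assumes that step (i) has already produced compact vectors (random placeholders here) and uses a plain k-means grouping as a simple, non-probabilistic stand-in for the model in step (ii). All function names and parameters (`fit_centroids`, `build_index`, `search`, `n_candidates`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Basic vector distance used for grouping and for the final filter."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_centroids(vectors, n_groups, n_iter=10, seed=0):
    """Plain k-means: an assumed stand-in for the grouping model of step (ii)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_groups)]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(n_groups), key=lambda g: euclidean(v, centroids[g]))
            groups[nearest].append(v)
        for g, members in enumerate(groups):
            if members:  # keep the old centroid if a group became empty
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_index(vectors, centroids):
    """Assign every object id to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for obj_id, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda g: euclidean(v, centroids[g]))
        buckets[nearest].append(obj_id)
    return buckets


def search(query, vectors, centroids, buckets, n_candidates, k):
    """Step (ii): collect candidates bucket by bucket; step (iii): filter them."""
    # Visit buckets in order of increasing centroid distance to the query.
    order = sorted(range(len(centroids)), key=lambda g: euclidean(query, centroids[g]))
    candidates = []
    for g in order:
        candidates.extend(buckets[g])
        if len(candidates) >= n_candidates:
            break
    # Refine with a basic vector distance and keep the k closest.
    candidates.sort(key=lambda i: euclidean(query, vectors[i]))
    return candidates[:k]


# Step (i) is assumed done: these random vectors stand in for protein embeddings.
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids = fit_centroids(vectors, n_groups=5)
buckets = build_index(vectors, centroids)
result = search(vectors[17], vectors, centroids, buckets, n_candidates=40, k=5)
```

Only the candidates gathered from the visited buckets are compared against the query, which is where such a design would save distance computations. The cost is the one the abstract mentions: a true neighbour that sits in an unvisited bucket is missed, so the result carries no formal guarantee.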