A learning search algorithm for the Restricted Longest Common Subsequence problem

IF 7.5 | Region 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications Pub Date : 2025-04-30 DOI:10.1016/j.eswa.2025.127731

Marko Djukanović, Jaume Reixach, Ana Nikolikj, Tome Eftimov, Aleksandar Kartelj, Christian Blum

Volume 284, Article 127731. Publisher page: https://www.sciencedirect.com/science/article/pii/S0957417425013533

Citations: 0

Abstract

This paper addresses the Restricted Longest Common Subsequence (RLCS) problem, an extension of the well-known Longest Common Subsequence (LCS) problem. The problem has significant applications in bioinformatics, particularly for identifying similarities and discovering shared patterns and important motifs among DNA, RNA, and protein sequences. Building on recent advances in solving this problem through a general search framework, the paper introduces two novel heuristic approaches that steer the search towards promising regions of the search space. The first heuristic employs a probabilistic model to evaluate partial solutions during the search. The second is based on a neural network model trained offline using a genetic algorithm; a key aspect of this approach is the extraction of problem-specific features from partial solutions and from the complete problem instance. Combining the trained neural network model with a beam search framework yields an effective hybrid method, referred to as the learning beam search. A further contribution is the generation of real-world instances in which scientific abstracts serve as input strings and a set of frequently occurring academic words from the literature is used as restricted patterns. Comprehensive experimental evaluations demonstrate the effectiveness of the proposed approaches in solving the RLCS problem. Finally, an empirical explainability analysis of the results identifies key feature combinations and their respective contributions to the success or failure of the algorithms across different problem types.
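The abstract does not spell out the search procedure, so the following is only a minimal sketch of how a beam search for RLCS with a pluggable guidance function might look. It assumes the common definition of the problem (find a longest string that is a subsequence of every input string and contains none of the restricted patterns as a subsequence). All names (`beam_search`, `extend`, `remaining_length_score`, `is_subsequence`) are hypothetical, and the scoring function is a simple placeholder bound, not the probabilistic or neural model from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# A search state: (next position in each input string,
#                  matched prefix length of each restricted pattern,
#                  partial solution built so far).
State = Tuple[Tuple[int, ...], Tuple[int, ...], str]


def is_subsequence(pattern: str, text: str) -> bool:
    """Return True if `pattern` can be obtained from `text` by deleting characters."""
    it = iter(text)
    return all(ch in it for ch in pattern)


def remaining_length_score(inputs: Sequence[str], state: State) -> float:
    """Placeholder guidance (an assumption, not the paper's model):
    current length plus the shortest remaining input suffix."""
    positions, _, partial = state
    return len(partial) + min(len(s) - p for s, p in zip(inputs, positions))


def extend(inputs: Sequence[str], patterns: Sequence[str], state: State) -> List[State]:
    """Generate all feasible one-character extensions of a partial solution."""
    positions, progress, partial = state
    children: List[State] = []
    for c in sorted(set(inputs[0][positions[0]:])):
        # Leftmost occurrence of c in the unused part of every input string.
        hits = [s.find(c, p) for s, p in zip(inputs, positions)]
        if min(hits) < 0:
            continue  # c is not available in at least one input string
        # Greedy matching tracks the longest pattern prefix already embedded.
        new_progress = tuple(
            q + 1 if q < len(r) and r[q] == c else q
            for r, q in zip(patterns, progress)
        )
        if any(q == len(r) for r, q in zip(patterns, new_progress)):
            continue  # a restricted pattern would become a subsequence
        children.append((tuple(i + 1 for i in hits), new_progress, partial + c))
    return children


def beam_search(
    inputs: Sequence[str],
    patterns: Sequence[str],
    beam_width: int = 10,
    score: Callable[[Sequence[str], State], float] = remaining_length_score,
) -> str:
    """Level-by-level beam search; every state on a level has the same length."""
    beam: List[State] = [(tuple(0 for _ in inputs), tuple(0 for _ in patterns), "")]
    best = ""
    while beam:
        # Deduplicate states that agree on positions and pattern progress.
        candidates = {}
        for state in beam:
            for child in extend(inputs, patterns, state):
                candidates[(child[0], child[1])] = child
        if not candidates:
            break
        ranked = sorted(candidates.values(), key=lambda st: score(inputs, st), reverse=True)
        beam = ranked[:beam_width]
        best = beam[0][2]
    return best
```

For example, `beam_search(["abc", "abc"], ["ac"])` returns a length-2 string, since the full common subsequence `"abc"` would contain the restricted pattern `"ac"`. A learned model, such as the one described in the abstract, would be passed in through the `score` parameter in place of the placeholder.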




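Source journal

Expert Systems with Applications (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months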

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.