Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions

Annual Symposium on Combinatorial Pattern Matching. Pub Date: 2020-04-01. DOI: 10.4230/LIPIcs.CPM.2020.11

Patrick Commins, D. Liben-Nowell, Tina Liu, K. Tomlinson


Citations: 1

Abstract

Algorithms to find optimal alignments among strings, or to find a parsimonious summary of a collection of strings, are well studied in a variety of contexts, addressing a wide range of interesting applications. In this paper, we consider chain letters, which contain a growing sequence of signatories added as the letter propagates. The unusual constellation of features exhibited by chain letters (one-ended growth, divergence, and mutation) makes their propagation, and thus the corresponding reconstruction problem, both distinctive and rich. Here, inspired by these chain letters, we formally define the problem of computing an optimal summary of a set of diverging string sequences. From a collection of these sequences of names, with each sequence noisily corresponding to a branch of the unknown tree $T$ representing the letter's true dissemination, can we efficiently and accurately reconstruct a tree $T' \approx T$? In this paper, we give efficient exact algorithms for this summarization problem when the number of sequences is small; for larger sets of sequences, we prove hardness and provide an efficient heuristic algorithm. We evaluate this heuristic on synthetic datasets chosen to emulate real chain letters, showing that our algorithm is competitive with or better than previous approaches, and that it also comes close to finding the true trees in these synthetic datasets.
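To make the reconstruction question concrete, the following is a minimal sketch of the noise-free case only: it merges signatory lists into a prefix tree, so that lists sharing their earliest signers share a path and diverge where the names differ. This is not the paper's exact or heuristic algorithm, which the abstract does not describe. The function names `build_prefix_tree` and `count_nodes` and the sample names are hypothetical, chosen for illustration.

```python
from typing import Dict, List

# A tree is a nested dict: each key is a signatory name, each value is
# the subtree of signatories who signed after that person on some copy.
Tree = Dict[str, "Tree"]


def build_prefix_tree(sequences: List[List[str]]) -> Tree:
    """Merge signatory lists into a prefix tree (illustrative baseline).

    Assumes exact name matches. Real chain letters contain mutations
    (misspelled or dropped names), which this sketch does not handle.
    """
    root: Tree = {}
    for seq in sequences:
        node = root
        for name in seq:
            # Reuse the existing child for this name, or create a new branch.
            node = node.setdefault(name, {})
    return root


def count_nodes(tree: Tree) -> int:
    """Count the signatory nodes in the tree (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in tree.values())


if __name__ == "__main__":
    copies = [
        ["Ann", "Bob", "Cat"],
        ["Ann", "Bob", "Dan", "Eve"],
        ["Ann", "Fay"],
    ]
    tree = build_prefix_tree(copies)
    print(tree)
    print(count_nodes(tree))
```

With exact matching, a single misspelling such as "Bob" versus "Bobb" would split one true branch into two, which is why the noisy version of the problem calls for the alignment-style summarization the abstract refers to.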
