An Invertible Transform for Efficient String Matching in Labeled Digraphs
Abhinav Nellore, Austin Nguyen, Reid F. Thompson
Annual Symposium on Combinatorial Pattern Matching (CPM 2021), DOI: 10.4230/LIPIcs.CPM.2021.20
Let $G = (V, E)$ be a digraph where each vertex is unlabeled, each edge is labeled by a character in some alphabet $\Omega$, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of $G$ into a weakly connected digraph $G' = (V', E')$ that enables solving the decision problem of whether there is a walk in $G$ matching an arbitrarily long query string $q$ in time linear in $|q|$ and independent of $|E|$ and $|V|$. We show $G$ is uniquely determined by $G'$ when for every $v_\ell \in V$, there is some distinct string $s_\ell$ on $\Omega$ such that $v_\ell$ is the origin of a closed walk in $G$ matching $s_\ell$, and no other walk in $G$ matches $s_\ell$ unless it starts and ends at $v_\ell$. We then exploit this invertibility condition to strategically alter any $G$ so its transform $G'$ enables retrieval of all $t$ terminal vertices of walks in the unaltered $G$ matching $q$ in $O(|q| + t \log |V|)$ time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.
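The following minimal Python sketch illustrates two claims of the abstract: deciding whether a walk matches $q$ with work that grows with $|q|$ alone once $G'$ is built, and checking the invertibility condition for a given choice of strings $s_\ell$. It rests on several assumptions. $G$ is stored as a vertex set plus (tail, head, label) triples. The "powerset construction" is taken to be the standard subset construction started from the full vertex set $V$. Vertices of $G'$ are encoded as integer ids. The function names (`powerset_transform`, `has_matching_walk`, `endpoint_pairs`, `certifies_invertibility`) are hypothetical. This is not the authors' implementation or their exact encoding of $G'$. The check only verifies candidate strings and does not search for them.

```python
from collections import deque

# Hypothetical representation: G is a vertex set plus a set of labeled edges
# (tail, head, label); labels are single characters from the alphabet Omega.


def _successors(edges):
    """Index edges by (tail, label) -> set of heads."""
    out = {}
    for tail, head, c in edges:
        out.setdefault((tail, c), set()).add(head)
    return out


def powerset_transform(vertices, edges):
    """Subset construction on G started from the full vertex set V (id 0).

    Each vertex of G' stands for a nonempty subset S of V. G' has an edge
    labeled c from S to T when T is the (nonempty) set of heads of c-labeled
    edges leaving vertices of S. Every vertex is reached from the root, so G'
    is weakly connected. Returns (delta, n): delta maps (id, c) -> id, n = |V'|.
    """
    out = _successors(edges)
    alphabet = sorted({c for _, _, c in edges})
    root = frozenset(vertices)  # assumes V is nonempty
    ids = {root: 0}
    delta = {}
    queue = deque([root])
    while queue:
        s = queue.popleft()
        for c in alphabet:
            nxt = frozenset(w for u in s for w in out.get((u, c), ()))
            if not nxt:
                continue  # no walk can be extended by c: omit the edge
            if nxt not in ids:
                ids[nxt] = len(ids)
                queue.append(nxt)
            delta[(ids[s], c)] = ids[nxt]
    return delta, len(ids)


def has_matching_walk(delta, q):
    """Decide whether some walk in G matches q. After G' is built this costs
    one expected-constant-time dict lookup per character of q, independent
    of |V| and |E|."""
    state = 0
    for c in q:
        state = delta.get((state, c))
        if state is None:
            return False
    return True


def endpoint_pairs(vertices, edges, s):
    """All (start, end) pairs of walks in G that match the string s."""
    out = _successors(edges)
    pairs = {(v, v) for v in vertices}
    for c in s:
        pairs = {(a, w) for a, u in pairs for w in out.get((u, c), ())}
    return pairs


def certifies_invertibility(vertices, edges, s_of):
    """Check the abstract's sufficient condition for candidate strings s_of[v]:
    v is the origin of a closed walk matching s_of[v], and every walk matching
    s_of[v] starts and ends at v. (If this holds for all v, the strings are
    necessarily pairwise distinct.)"""
    return all(endpoint_pairs(vertices, edges, s_of[v]) == {(v, v)}
               for v in vertices)
```

For example, take the directed 3-cycle with edges 0 -a-> 1 -b-> 2 -c-> 0. Here $G'$ has four vertices, `"abcab"` is accepted, and `"ac"` is rejected. The strings `"abc"`, `"bca"`, `"cab"` satisfy the invertibility condition for vertices 0, 1, 2. The query cost is independent of $G$, but that cost is paid up front: in the worst case the subset construction produces up to $2^{|V|} - 1$ vertices in $G'$.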