Interesting paths in the mapper complex

JCR: Q4 (Mathematics)

International Journal of Computational Geometry & Applications. Pub Date: 2019-01-01. DOI: 10.20382/jocg.v10i1a17

A. Kalyanaraman, M. Kamruzzaman, Bala Krishnamoorthy


Citations: 6

Abstract


Interesting paths in the mapper complex

Given a high-dimensional point cloud of data with functions defined on the points, the mapper algorithm produces a compact summary in the form of a simplicial complex connecting the points. We study the problem of quantifying the interestingness of subpopulations in a given mapper complex. First, we create a weighted directed graph G = (V, E) using the 1-skeleton of the mapper complex. We use the average values at the vertices of a target function (dependent variable) to direct the edges from low to high values, and assign the difference (high − low) as the weight of the edge. Covariation of the remaining h functions (independent variables) is captured by an h-bit binary signature assigned to the edge. An interesting path in G is a directed path whose edges all have the same signature. The interestingness score of such a path is defined as the sum of its edge weights, each multiplied by a nonlinear function of the edge's rank, i.e., the depth of the edge along the path. Such a nonlinear function could model application use cases where the growth in the dependent variable values is expected to be concentrated in specific intervals of a path. Second, we study three optimization problems on this graph G to quantify interesting subpopulations. In the problem Max-IP, the goal is to find the most interesting path in G, i.e., an interesting path with the maximum interestingness score. For the case where G is a directed acyclic graph (DAG), we show that Max-IP can be solved in polynomial time. In the more general problem IP, the goal is to find a collection of edge-disjoint interesting paths such that the sum of the interestingness scores of all paths is maximized. We also study a variant of IP termed k-IP, where the goal is to identify a collection of edge-disjoint interesting paths, each with k edges, such that the total interestingness score of all paths is maximized. While k-IP can be solved in polynomial time for k ≤ 2, we show k-IP is NP-complete for k ≥ 3 even when G is a DAG.
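The graph construction and the path score described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' released implementation. The function names (`build_directed_graph`, `rank_factor`, `interestingness`) are hypothetical. The abstract does not say how each signature bit is set, so the rule used here (bit = 1 when an independent function's vertex average increases along the edge) is an assumption, as are the choice of log2(1 + rank) as the nonlinear rank function and the skipping of tied edges.

```python
import math


def build_directed_graph(skeleton_edges, target, independents):
    """Orient each 1-skeleton edge from the low to the high target average.

    target: dict vertex -> average of the dependent variable at that vertex.
    independents: list of h dicts, vertex -> average of one independent variable.
    Returns a list of edges (tail, head, weight, signature).
    """
    edges = []
    for u, v in skeleton_edges:
        if target[u] == target[v]:
            continue  # assumption: a tie gives no direction, so the edge is skipped
        low, high = (u, v) if target[u] < target[v] else (v, u)
        weight = target[high] - target[low]  # high - low, always positive here
        # Assumed signature rule: one bit per independent function,
        # 1 if that function also increases from tail to head, else 0.
        signature = tuple(int(f[high] > f[low]) for f in independents)
        edges.append((low, high, weight, signature))
    return edges


def rank_factor(rank):
    """Hypothetical nonlinear function of an edge's rank (1-based depth)."""
    return math.log2(1 + rank)


def interestingness(path):
    """Score a path: sum of edge weight times rank_factor(rank of the edge)."""
    for prev, nxt in zip(path, path[1:]):
        if prev[1] != nxt[0]:
            raise ValueError("edges do not form a directed path")
        if prev[3] != nxt[3]:
            raise ValueError("edges of an interesting path must share one signature")
    return sum(
        weight * rank_factor(rank)
        for rank, (_, _, weight, _) in enumerate(path, start=1)
    )
```

With this choice of `rank_factor`, edges deeper in the path contribute more per unit of weight. A different nonlinear function would shift the emphasis to other intervals of the path.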
We develop heuristics for IP and k-IP on DAGs, which use the algorithm for Max-IP on DAGs as a subroutine. We have released open source implementations of our algorithms to find interesting paths. We also present a detailed experimental evaluation of this software framework on a real-world maize plant phenomics data set. We use interesting paths identified on several mapper graphs to explain how the genotype and environmental factors influence the growth rate, both in isolation and in combination.

∗ School of Electrical Engineering and Computer Science, Washington State University, Pullman, USA. † Department of Mathematics and Statistics, Washington State University, Vancouver, USA. {ananth,md.kamruzzaman,kbala}@wsu.edu
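The abstract states that Max-IP is polynomial on DAGs and that the IP heuristics call it as a subroutine, but it does not spell out either algorithm. The sketch below shows one plausible approach under stated assumptions: a dynamic program over (end vertex, number of edges) per signature, and a greedy loop that repeatedly extracts the best remaining path and deletes its edges. Edges are assumed to be tuples `(tail, head, weight, signature)`. The names `max_ip`, `greedy_ip`, and `square_rank` are hypothetical, and the greedy strategy is an illustration, not necessarily the authors' heuristic.

```python
import math
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def square_rank(rank):
    """Hypothetical nonlinear rank function used for illustration."""
    return float(rank ** 2)


def max_ip(edges, rank_factor):
    """Return (score, path) of a best interesting path in a DAG.

    For each signature, dp[v][r] holds the best score of an r-edge path
    ending at v together with the last edge of that path.
    """
    by_signature = defaultdict(list)
    for edge in edges:
        by_signature[edge[3]].append(edge)

    best_score, best_path = -math.inf, []
    for sig_edges in by_signature.values():
        out_edges, preds = defaultdict(list), defaultdict(set)
        for edge in sig_edges:
            out_edges[edge[0]].append(edge)
            preds[edge[1]].add(edge[0])

        dp = defaultdict(dict)
        for u in TopologicalSorter(preds).static_order():
            dp[u][0] = (0.0, None)  # the empty path ending at u
            for edge in out_edges[u]:
                head, weight = edge[1], edge[2]
                for r, (score, _) in dp[u].items():
                    cand = score + weight * rank_factor(r + 1)
                    if cand > dp[head].get(r + 1, (-math.inf, None))[0]:
                        dp[head][r + 1] = (cand, edge)

        # Pick the best non-empty state for this signature.
        end = max(
            ((table[r][0], v, r) for v, table in dp.items() for r in table if r > 0),
            key=lambda state: state[0],
        )
        if end[0] > best_score:
            _, node, r = end
            path = []
            while r > 0:  # follow back-pointers to the start of the path
                edge = dp[node][r][1]
                path.append(edge)
                node, r = edge[0], r - 1
            best_score, best_path = end[0], path[::-1]

    return (best_score, best_path) if best_path else (0.0, [])


def greedy_ip(edges, rank_factor):
    """Greedy sketch for IP: extract the best path, remove its edges, repeat."""
    remaining, chosen = list(edges), []
    while remaining:
        score, path = max_ip(remaining, rank_factor)
        if not path or score <= 0:
            break
        chosen.append((score, path))
        used = set(path)
        remaining = [e for e in remaining if e not in used]
    return chosen
```

Because the score depends on each edge's rank, the dynamic program keeps the path length in its state. This gives O(|V|·|E|) transitions per signature, which is consistent with the polynomial bound claimed for DAGs. The greedy loop yields edge-disjoint paths by construction but carries no optimality guarantee.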


Source journal

International Journal of Computational Geometry & Applications (Mathematics; Computer Science: Theory & Methods)

CiteScore: 0.80

Self-citation rate: 0.00%

Review time: >12 weeks

About the journal: The International Journal of Computational Geometry & Applications (IJCGA) is a quarterly journal devoted to the field of computational geometry within the framework of design and analysis of algorithms. Emphasis is placed on the computational aspects of geometric problems that arise in various fields of science and engineering including computer-aided geometry design (CAGD), computer graphics, constructive solid geometry (CSG), operations research, pattern recognition, robotics, solid modelling, VLSI routing/layout, and others. Research contributions ranging from theoretical results in algorithm design (sequential or parallel, probabilistic or randomized algorithms) to applications in the above-mentioned areas are welcome. Research findings or experiences in the implementations of geometric algorithms, such as numerical stability, and papers with a geometric flavour related to algorithms or the application areas of computational geometry are also welcome.