A. Kalyanaraman, M. Kamruzzaman, Bala Krishnamoorthy
Title: Interesting paths in the mapper complex
DOI: 10.20382/jocg.v10i1a17 (https://doi.org/10.20382/jocg.v10i1a17)
Journal (as listed): International Journal of Computational Geometry & Applications
Volume: 24 1
Pages: 500-531
Publication date: 2019-01-01
Publication type: Journal Article
Citations: 6
Abstract
Given a high-dimensional point cloud of data with functions defined on the points, the mapper algorithm produces a compact summary in the form of a simplicial complex connecting the points. We study the problem of quantifying the interestingness of subpopulations in a given mapper complex. First, we create a weighted directed graph G = (V,E) using the 1-skeleton of the mapper complex. We use the average values of a target function (dependent variable) at the vertices to direct the edges from low to high values, and assign the difference (high−low) as the weight of the edge. Covariation of the remaining h functions (independent variables) is captured by an h-bit binary signature assigned to the edge. An interesting path in G is a directed path whose edges all have the same signature. The interestingness score of such a path is defined as the sum of its edge weights, each multiplied by a nonlinear function of the edge's rank, i.e., the depth of the edge along the path. Such a nonlinear function could model application use-cases where the growth in the dependent variable values is expected to be concentrated in specific intervals of a path. Second, we study three optimization problems on this graph G to quantify interesting subpopulations. In the problem Max-IP, the goal is to find the most interesting path in G, i.e., an interesting path with the maximum interestingness score. For the case where G is a directed acyclic graph (DAG), we show that Max-IP can be solved in polynomial time. In the more general problem IP, the goal is to find a collection of edge-disjoint interesting paths such that the sum of the interestingness scores of all paths is maximized. We also study a variant of IP termed k-IP, where the goal is to identify a collection of edge-disjoint interesting paths, each with k edges, such that the total interestingness score of all paths is maximized. While k-IP can be solved in polynomial time for k ≤ 2, we show k-IP is NP-complete for k ≥ 3 even when G is a DAG.
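The construction and the Max-IP problem described above can be illustrated with a small Python sketch. It is not the authors' released implementation. The function names (`build_graph`, `path_score`, `max_interesting_path`) are hypothetical. Three choices are assumptions, since the abstract does not specify them: the signature rule (bit j is 1 when independent variable j increases along the directed edge), the skipping of edges whose endpoints tie on the target value, and the rank function log(1 + i). With edge weights w(e_1), ..., w(e_k) along a path and rank function f, the score computed here is the sum over i of w(e_i) · f(i). The dynamic program keeps, for each (vertex, signature, path length), the best-scoring interesting path ending there, which is one simple way to obtain a polynomial-time search on a DAG.

```python
from collections import defaultdict
from math import log


def build_graph(edges, target, indep):
    """Direct each 1-skeleton edge from low to high target value.

    edges  : iterable of (u, v) vertex pairs (undirected)
    target : dict vertex -> average value of the target function
    indep  : dict vertex -> tuple of h average independent-variable values
    Returns a list of (tail, head, weight, signature) tuples.
    """
    directed = []
    for u, v in edges:
        if target[u] == target[v]:
            continue  # assumption: ties give no direction, so the edge is skipped
        lo, hi = (u, v) if target[u] < target[v] else (v, u)
        weight = target[hi] - target[lo]
        # assumption: bit j is 1 when independent variable j increases lo -> hi
        sig = tuple(int(b > a) for a, b in zip(indep[lo], indep[hi]))
        directed.append((lo, hi, weight, sig))
    return directed


def rank_factor(i):
    """Assumed nonlinear function of the edge rank i = 1, 2, ..."""
    return log(1 + i)


def path_score(weights, f=rank_factor):
    """Sum of edge weights, each multiplied by f(rank of the edge)."""
    return sum(w * f(i) for i, w in enumerate(weights, start=1))


def max_interesting_path(directed, target, f=rank_factor):
    """Return (score, vertex list) of a best interesting path.

    best[(v, sig)][k] holds the best (score, path) among interesting paths
    with signature sig and exactly k edges that end at vertex v.
    Returns (0.0, []) when there are no edges.
    """
    best = defaultdict(dict)
    top = (0.0, [])
    # Every edge strictly increases the target value, so sorting edges by the
    # tail's target value visits them in a valid topological order.
    for u, v, w, sig in sorted(directed, key=lambda e: target[e[0]]):
        candidates = {1: (w * f(1), [u, v])}  # the edge alone, at rank 1
        for k, (score, path) in best[(u, sig)].items():
            candidates[k + 1] = (score + w * f(k + 1), path + [v])
        for k, cand in candidates.items():
            state = best[(v, sig)]
            if k not in state or cand[0] > state[k][0]:
                state[k] = cand
            if cand[0] > top[0]:
                top = cand
    return top
```

The path length is part of the state because the multiplier applied to an edge depends on its rank, so the best prefix ending at a vertex can differ for different prefix lengths. Storing full paths in each state keeps the sketch short; a production version would store predecessor pointers instead.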
We develop heuristics for IP and k-IP on DAGs, which use the algorithm for Max-IP on DAGs as a subroutine. We have released open source implementations of our algorithms to find interesting paths. We also present a detailed experimental evaluation of this software framework on a real-world maize plant phenomics data set. We use interesting paths identified on several mapper graphs to explain how the genotype and environmental factors influence the growth rate, both in isolation and in combination.

∗School of Electrical Engineering and Computer Science, Washington State University, Pullman, USA
†Department of Mathematics and Statistics, Washington State University, Vancouver, USA
{ananth,md.kamruzzaman,kbala}@wsu.edu
About the journal:
The International Journal of Computational Geometry & Applications (IJCGA) is a quarterly journal devoted to the field of computational geometry within the framework of design and analysis of algorithms.
Emphasis is placed on the computational aspects of geometric problems that arise in various fields of science and engineering including computer-aided geometry design (CAGD), computer graphics, constructive solid geometry (CSG), operations research, pattern recognition, robotics, solid modelling, VLSI routing/layout, and others. Research contributions ranging from theoretical results in algorithm design — sequential or parallel, probabilistic or randomized algorithms — to applications in the above-mentioned areas are welcome. Research findings or experiences in the implementations of geometric algorithms, such as numerical stability, and papers with a geometric flavour related to algorithms or the application areas of computational geometry are also welcome.