Nathan Mankovich, Helene Andrews-Polymenis, David Threadgill, Michael Kirby
Module representatives for refining gene co-expression modules
Physical Biology, published 2023-05-04. DOI: 10.1088/1478-3975/acce8d
Abstract
This paper concerns the identification of gene co-expression modules in transcriptomics data, i.e., collections of genes that are highly co-expressed and potentially linked to a biological mechanism. Weighted gene co-expression network analysis (WGCNA) is a widely used method for module detection based on the computation of eigengenes, the weights of the first principal component for the module gene expression matrix. The eigengene has been used as a centroid in a k-means algorithm to improve module memberships. In this paper, we present four new module representatives: the eigengene subspace, flag mean, flag median, and module expression vector. The eigengene subspace, flag mean, and flag median are subspace module representatives that capture more of the variance of gene expression within a module. The module expression vector is a weighted centroid of the module that leverages the structure of the module gene co-expression network. We use these module representatives in Linde-Buzo-Gray clustering algorithms to refine WGCNA module membership. We evaluate these methodologies on two transcriptomics data sets. We find that most of our module refinement techniques improve upon the WGCNA modules by two statistics: (1) module classification between phenotypes and (2) module biological significance according to Gene Ontology terms.
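As a rough illustration of the eigengene and eigengene-subspace ideas described in the abstract, the sketch below computes a rank-k principal subspace for one module and performs a single k-means-style reassignment of genes to modules. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `eigengene_subspace` and `reassign_genes`, the genes-by-samples matrix layout, the row-centering step, and the projection-norm score are all illustrative choices, and the flag mean, flag median, and Linde-Buzo-Gray variants from the paper are not reproduced here.

```python
import numpy as np


def eigengene_subspace(X, k=1):
    """Return an orthonormal basis (k x n_samples) for one module.

    X is assumed to be a (n_genes, n_samples) expression matrix.
    With k=1 the single basis vector plays the role of an eigengene;
    larger k gives a subspace that captures more variance.
    """
    # Center each gene across samples (an assumption of this sketch).
    Xc = X - X.mean(axis=1, keepdims=True)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Fraction of total variance captured by the first k components.
    explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return Vt[:k], explained


def reassign_genes(X, bases):
    """Assign each gene to the module whose subspace best captures it.

    The score is the norm of the projection of the centered,
    unit-normalized gene profile onto each module basis. For k=1 this
    equals the absolute Pearson correlation with the eigengene.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    Xc = Xc / np.maximum(norms, 1e-12)  # guard against constant genes
    scores = np.stack(
        [np.linalg.norm(Xc @ B.T, axis=1) for B in bases], axis=1
    )
    return scores.argmax(axis=1)


# Synthetic example: two modules driven by different latent signals.
rng = np.random.default_rng(0)
n_samples = 30
signals = rng.normal(size=(2, n_samples))
labels = np.repeat([0, 1], 20)  # hypothetical initial module labels
loadings = rng.uniform(0.5, 1.5, size=(40, 1))
X = loadings * signals[labels] + 0.1 * rng.normal(size=(40, n_samples))

bases = [eigengene_subspace(X[labels == m], k=1)[0] for m in (0, 1)]
new_labels = reassign_genes(X, bases)
```

Alternating the two steps (recompute each module's basis, then reassign genes) until the labels stop changing gives a k-means-style refinement loop; raising `k` swaps the eigengene centroid for a subspace representative.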
About the journal:
Physical Biology publishes articles in the broad interdisciplinary field bridging biology with the physical sciences and engineering. This journal focuses on research in which quantitative approaches – experimental, theoretical and modeling – lead to new insights into biological systems at all scales of space and time, and all levels of organizational complexity.
Physical Biology accepts contributions from a wide range of biological sub-fields, including topics such as:
molecular biophysics, including single molecule studies, protein-protein and protein-DNA interactions
subcellular structures, organelle dynamics, membranes, protein assemblies, chromosome structure
intracellular processes, e.g. cytoskeleton dynamics, cellular transport, cell division
systems biology, e.g. signaling, gene regulation and metabolic networks
cells and their microenvironment, e.g. cell mechanics and motility, chemotaxis, extracellular matrix, biofilms
cell-material interactions, e.g. biointerfaces, electrical stimulation and sensing, endocytosis
cell-cell interactions, cell aggregates, organoids, tissues and organs
developmental dynamics, including pattern formation and morphogenesis
physical and evolutionary aspects of disease, e.g. cancer progression, amyloid formation
neuronal systems, including information processing by networks, memory and learning
population dynamics, ecology, and evolution
collective action and emergence of collective phenomena.