A Novel Method for Gene Regulatory Network Inference with Pseudotime Data Using Information Criterion
Shuhei Yao, Kaito Uemura, S. Seno, H. Matsuda
International Journal of Bioscience, Biochemistry and Bioinformatics, published 2022-07-01
DOI: 10.17706/ijbbb.2022.12.3.43-52 (https://doi.org/10.17706/ijbbb.2022.12.3.43-52)
Abstract
Trajectory inference is used to model dynamic cellular processes from single-cell RNA sequencing data. It typically computes a pseudotime for each cell, representing that cell's progression through the process along the trajectory. Several methods have been proposed that infer gene regulatory networks from the expression profiles of cells ordered by pseudotime, in order to elucidate the regulatory relationships between genes in a dynamic process. In this paper, we propose a novel method for inferring such gene regulatory networks. To predict gene regulatory relationships accurately, we introduce an edge-scoring scheme with bootstrap sampling. We demonstrate the accuracy of the proposed method by comparing its results with those of existing methods on synthetic and real single-cell RNA-seq data.

[...] a type in which the regulatory relationships of genes are connected in a long straight line [6]. Each network was generated with five different numbers of cells: 100, 200, 500, 2000, and 5000. For each condition, 10 networks were generated and evaluated as described in [6]. For the real data, we started from the expression data of cells in the differentiation lineage from pluripotent progenitor cells to monocytes, as described above, and extracted the expression of 10 genes from the network of transcription factors experimentally confirmed in [11]. We used the 315 cells in which at least 3 of the 10 genes were expressed. Cells whose pseudotime values share the same integer part (the pseudotime truncated at the decimal point) were assigned to the same time point. Using these data as input, network inference was performed with the dynamic Bayesian network (DBN) approach of SiGN-BN, and the edge gain was calculated as described above. We performed 100 bootstrap resamplings for the synthetic data and 1000 for the real data.
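The cell filtering and time-point assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `filter_and_bin` and its parameters are hypothetical, and it assumes that a gene counts as "expressed" in a cell when its value is greater than zero, which the text does not state.

```python
import math


def filter_and_bin(expr_rows, pseudotimes, min_expressed=3):
    """Keep cells with enough expressed genes and assign integer time points.

    expr_rows   : list of per-cell expression vectors (one value per gene)
    pseudotimes : list of per-cell pseudotime values (non-negative floats)
    Assumption: "expressed" means value > 0 (not stated in the source).
    """
    kept_rows, time_points = [], []
    for row, t in zip(expr_rows, pseudotimes):
        n_expressed = sum(1 for value in row if value > 0)
        if n_expressed >= min_expressed:
            kept_rows.append(row)
            # Truncate at the decimal point: 1.1 and 1.9 share time point 1.
            time_points.append(math.trunc(t))
    return kept_rows, time_points


# Toy data: 4 cells x 4 genes (hypothetical values).
rows = [
    [1.0, 2.0, 3.0, 0.0],  # 3 genes expressed -> kept
    [0.0, 0.0, 1.0, 0.0],  # 1 gene expressed  -> dropped
    [5.0, 1.0, 1.0, 1.0],  # 4 genes expressed -> kept
    [2.0, 0.0, 1.0, 4.0],  # 3 genes expressed -> kept
]
pseudotime = [0.2, 1.7, 1.9, 1.1]

kept, tps = filter_and_bin(rows, pseudotime)
print(tps)  # [0, 1, 1]
```

Truncation, rather than rounding, means that every cell with pseudotime in the half-open interval [k, k+1) is treated as a replicate observation at time point k.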
For each edge of the inferred network, we also calculated the bootstrap probability, that is, the probability that the edge appears in the networks inferred from the data sets generated by bootstrap resampling, separately from the score of the proposed method. The results obtained with the bootstrap probabilities and with SINCERITIES were compared with those of the proposed method. The AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) for each method were computed with the PRROC package [12], and all graphs were plotted in R. Cytoscape was used for network visualization.
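To make the bootstrap probability and the AUROC evaluation concrete, here is a small Python sketch under stated assumptions. It is not the pipeline used in the paper, which relied on SiGN-BN for inference and the R package PRROC for the curves. The callable `infer_edges` is a hypothetical stand-in for any network inference routine, and the AUROC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
import random
from collections import Counter


def bootstrap_edge_probability(cells, infer_edges, n_boot=100, seed=0):
    """Fraction of bootstrap networks in which each edge appears.

    cells       : list of per-cell records (any type)
    infer_edges : hypothetical callable mapping a resampled list of cells
                  to an iterable of (regulator, target) edges
    """
    rng = random.Random(seed)
    counts = Counter()
    n = len(cells)
    for _ in range(n_boot):
        # Resample cells with replacement, keeping the original sample size.
        sample = [cells[rng.randrange(n)] for _ in range(n)]
        counts.update(set(infer_edges(sample)))
    return {edge: c / n_boot for edge, c in counts.items()}


def auroc(scores, labels):
    """AUROC as the probability that a true edge outscores a false one.

    Ties between a true and a false edge count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy check: a dummy inference routine that always reports the same edge.
probs = bootstrap_edge_probability(list(range(10)), lambda s: [("A", "B")], n_boot=20)
print(probs)  # {('A', 'B'): 1.0}

# Edge scores ranked against a hypothetical ground truth (1 = true edge).
print(auroc([0.9, 0.8, 0.1], [1, 1, 0]))  # 1.0
```

Either the bootstrap probabilities or the scores of another method can be passed to `auroc` as `scores`, which is how different edge-ranking schemes can be compared against the same reference network.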