A Novel Method for Gene Regulatory Network Inference with Pseudotime Data Using Information Criterion

International Journal of Bioscience, Biochemistry and Bioinformatics. Pub Date: 2022-07-01. DOI: 10.17706/ijbbb.2022.12.3.43-52

Shuhei Yao, Kaito Uemura, S. Seno, H. Matsuda



Abstract

Trajectory inference has been used to model dynamic cellular processes from single-cell RNA sequencing data. The inference often computes a pseudotime that represents each cell's progression through the process along the trajectory. Several methods have been proposed that infer gene regulatory networks from the gene expression profiles of cells ordered by pseudotime, in order to elucidate the regulatory relationships between genes in a dynamic process. In this paper, we propose a novel method for the inference of such gene regulatory networks. To predict gene regulatory relationships in the network with high accuracy, we introduce an edge-scoring scheme with bootstrap sampling. We demonstrate the accuracy of the proposed method by comparing its results with those of existing methods on synthetic and real single-cell RNA-seq data.

[...] a type in which the regulatory relationships of genes are connected in a long straight line [6]. Each network was generated with five different cell numbers: 100, 200, 500, 2000, and 5000. For each condition, 10 networks were generated and evaluated as described in [6].

For the real data, we extracted the expression data of 10 genes, taken from the network of transcription factors experimentally confirmed in [11], from the expression data of cells in the differentiation lineage from pluripotent progenitor cells to monocytes, as described above. We used 315 cells in which at least 3 of the 10 genes were expressed. Each cell's time point was defined as its pseudotime truncated at the decimal point, so cells sharing the same integer part belong to the same time point. Using these data as input, network inference was performed with the dynamic Bayesian network (DBN) approach of SiGN-BN, and the edge gain was calculated as described above. We performed 100 bootstrap resamplings for the synthetic data and 1000 for the real data.
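
The preprocessing of the real data described above (keeping cells that express at least 3 of the selected genes, then grouping cells by truncated pseudotime) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the function names, the toy values, and the reading of "expressed" as a strictly positive value are all hypothetical.

```python
import math
from collections import defaultdict


def filter_cells(expression, min_expressed=3):
    """Keep cells in which at least `min_expressed` genes are expressed.

    Assumption: a gene counts as expressed when its value is > 0.
    """
    return {
        cell: values
        for cell, values in expression.items()
        if sum(1 for v in values if v > 0) >= min_expressed
    }


def group_by_time_point(pseudotime):
    """Group cell IDs by the integer part of their pseudotime."""
    groups = defaultdict(list)
    for cell, t in pseudotime.items():
        # math.trunc drops everything after the decimal point.
        groups[math.trunc(t)].append(cell)
    return dict(groups)


# Toy data (hypothetical): expression of 4 genes in 3 cells.
expression = {
    "cell_a": [1.2, 0.0, 3.1, 0.4],
    "cell_b": [0.0, 0.0, 2.0, 0.0],
    "cell_c": [0.5, 1.1, 0.0, 2.2],
}
pseudotime = {"cell_a": 0.8, "cell_b": 1.3, "cell_c": 1.9}

kept = filter_cells(expression, min_expressed=3)
time_points = group_by_time_point({c: pseudotime[c] for c in kept})
```

In this toy example `cell_b` is dropped because only one gene is expressed, and the two remaining cells fall into time points 0 and 1.
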
For each edge of the inferred network, we also calculated the bootstrap probability, that is, the probability that the edge appears in the networks inferred from the multiple data sets generated by bootstrap resampling, separately from the proposed method. The results obtained with the bootstrap probabilities and with SINCERITIES were compared with those of the proposed method. The AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) for each method were computed with the PRROC package [12], and all graphs were plotted in R. Cytoscape was used for network visualization.
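
As a rough illustration of the bootstrap probability and the AUROC described above, the following Python sketch resamples observations with replacement, re-runs an inference step on each replicate, and reports how often each edge appears. Several things here are assumptions rather than the authors' method: `toy_infer_edges` is a hypothetical correlation-threshold stand-in for the SiGN-BN DBN inference, the gene names and data are made up, and `auroc` is a plain pairwise computation rather than the PRROC package used in the paper.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_edge_probability(samples, infer_edges, n_boot=100, seed=0):
    """Fraction of bootstrap replicates in which each edge is inferred."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample observations with replacement, keeping the sample size.
        replicate = [rng.choice(samples) for _ in samples]
        counts.update(set(infer_edges(replicate)))
    return {edge: c / n_boot for edge, c in counts.items()}


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


GENES = ["A", "B", "C"]  # hypothetical gene names


def toy_infer_edges(replicate, threshold=0.8):
    """Hypothetical stand-in for network inference: report a gene pair
    as an edge when the absolute correlation exceeds `threshold`."""
    columns = list(zip(*replicate))
    return [
        (GENES[i], GENES[j])
        for i, j in combinations(range(len(GENES)), 2)
        if abs(pearson(columns[i], columns[j])) > threshold
    ]


def auroc(scores, labels):
    """AUROC as the probability that a true edge outscores a false one,
    with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy observations (A, B, C): B is exactly 2 * A, C is only weakly related.
samples = [(1, 2, 5), (2, 4, 1), (3, 6, 4), (4, 8, 2),
           (5, 10, 8), (6, 12, 3), (7, 14, 7), (8, 16, 6)]
edge_prob = bootstrap_edge_probability(samples, toy_infer_edges, n_boot=100)
```

The resulting `edge_prob` maps each edge to a value between 0 and 1. Those values can be passed to `auroc` as scores, together with 0/1 labels marking which candidate edges are in a reference network.
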

