Pseudotime Analysis Imputes the Missing Liver NAFLD Status in Public RNA-Seq Cohorts

Tongyang Wang, Xiangmei Dou
Proceedings of the 6th International Conference on Biomedical Engineering and Applications. Published 2022-05-13. DOI: 10.1145/3543081.3543094 (https://doi.org/10.1145/3543081.3543094)

Abstract

Existing gene expression studies based on microarrays or RNA sequencing have not resolved the complex mechanisms of non-alcoholic fatty liver disease (NAFLD) progression, owing to insufficient accuracy and a lack of phenotypic data. In particular, incomplete phenotypic data in public liver gene expression cohorts have hampered many studies of NAFLD progression. To address this issue, we adopt pseudotime analysis to estimate liver health status from human liver gene expression data. Differential expression (DE) analysis identifies a set of 25 genes that are differentially expressed between healthy controls and NAFLD samples. These DE genes separate NAFLD patients from healthy controls in hierarchical clustering, and their associated biological pathways are highly relevant to liver signaling and injury, which implies a close relationship between DE gene expression and NAFLD. Furthermore, our pseudotime analysis models the deterioration of NAFLD by using liver fat percentage to represent NAFLD severity and aligning the candidate samples along the estimated trajectory according to their gene expression and covariates; we verified the pseudotime model using another microarray cohort. The verified model is then applied to an RNA-Seq cohort (GTEx) to estimate the liver health status of samples that lack phenotypic details. The model recapitulates the timeline of NAFLD progression and supports the potential key roles of DE gene expression in this process. In conclusion, the expression of these genes and its changes across distinct groups of samples are chronologically consistent with the progression of NAFLD severity. The pseudotime model can be used to impute missing NAFLD phenotypes in public liver gene expression cohorts.
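
The imputation idea in the abstract can be illustrated with a small sketch. The abstract does not name the pseudotime algorithm, so the code below is an assumption throughout: it swaps in the simplest possible stand-in, a linear axis (the first principal component of the DE genes) oriented to increase with liver fat percentage, and then projects an unlabeled cohort onto that axis. The function names, the synthetic data, and the linear fat-versus-pseudotime fit are all hypothetical and are not the authors' implementation; real pseudotime tools typically fit curves or graphs, and applying a microarray-trained model to RNA-Seq data would need cross-platform harmonization that is omitted here.

```python
import numpy as np


def fit_pseudotime_axis(expr, severity):
    """Fit a linear pseudotime axis on a labeled reference cohort.

    expr     : (samples x genes) array of log-scale expression for the DE genes
    severity : per-sample liver fat percentage (proxy for NAFLD severity)
    """
    mean = expr.mean(axis=0)
    std = expr.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant genes
    z = (expr - mean) / std
    # First principal component of the standardized genes, via SVD
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    axis = vt[0]
    scores = z @ axis
    # Orient the axis so that pseudotime increases with severity
    if np.corrcoef(scores, severity)[0, 1] < 0:
        axis, scores = -axis, -scores
    # Simple linear map from pseudotime back to liver fat percentage
    slope, intercept = np.polyfit(scores, severity, deg=1)
    return {"mean": mean, "std": std, "axis": axis,
            "slope": slope, "intercept": intercept}


def project(model, expr):
    """Place new samples on the fitted axis and impute their severity."""
    z = (expr - model["mean"]) / model["std"]
    pseudotime = z @ model["axis"]
    imputed = model["slope"] * pseudotime + model["intercept"]
    return pseudotime, imputed


# Synthetic demonstration only: 25 hypothetical DE genes driven by liver fat.
rng = np.random.default_rng(0)
n_genes = 25
loadings = rng.normal(size=n_genes)


def simulate(n_samples):
    fat = rng.uniform(0, 40, size=n_samples)
    noise = rng.normal(scale=0.5, size=(n_samples, n_genes))
    return 0.1 * np.outer(fat, loadings) + noise, fat


expr_ref, fat_ref = simulate(60)     # cohort with known liver fat
expr_new, fat_new = simulate(40)     # cohort treated as missing phenotypes

model = fit_pseudotime_axis(expr_ref, fat_ref)
pt_new, fat_hat = project(model, expr_new)
print("correlation of imputed vs. held-out liver fat:",
      round(float(np.corrcoef(fat_hat, fat_new)[0, 1]), 3))
```

In this toy setting the held-out liver fat values play the role of the verification cohort: they are never shown to the model and are used only to check that the imputed ordering tracks true severity.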