HC-UAP: Outliers detection method based-on hierarchical clustering for universally aligned time-series RNA-Seq profiles

Impact factor 1.4, JCR Q3 (Operations Research & Management Science)
A. Alkhateeb
Journal: Decision Science Letters
Publication date: 2023-01-01
Publication type: Journal Article
DOI: https://doi.org/10.5267/j.dsl.2022.10.004
Citations: 0

Abstract

Tracking the quantification of abundant gene transcripts across successive stages of cancer progression may reveal the mechanism of disease advancement. In this work, we profile transcript quantification over the stages using a time-series approach in which the stages and sub-stages of the disease are the time points and the quantification measurements are the values. The values at the time points are interpolated with a cubic spline function to model how each transcript's abundance changes as the disease progresses. The transcript profiles are then universally aligned and clustered using a time-series profile hierarchical clustering method based on the area between each pair of aligned profiles; we name the method HC-UAP. We compare the proposed method with a hierarchical clustering method based on Euclidean distance (HC-ED). Both methods were applied to two next-generation sequencing (NGS) prostate cancer datasets, the first from a Chinese population and the second from a North American population. HC-ED clusters each dataset to find patterns, whereas in both datasets HC-UAP was able to single out outliers that trend differently. While finding patterns in gene expression that trend over stages is the standard approach for analyzing time-series models, identifying outlier transcripts that grow differently from other transcripts can provide more detail about the contribution of mRNA transcriptional activity to the disease. Such transcripts may also serve as potential biomarkers of disease progression.
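
The pipeline described in the abstract (spline interpolation, alignment, area-based distance, hierarchical clustering) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. Several choices are assumptions made here: alignment is done by shifting each interpolated profile so its mean height over the time span is zero, the distance is the absolute area between two aligned curves, and the tree is built with average linkage. The function names (`trapezoid_area`, `aligned_profiles`, `area_distance_matrix`, `cluster_profiles`) and the toy values are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def trapezoid_area(y, x):
    """Trapezoidal-rule integral of y over x along the last axis."""
    y = np.asarray(y, dtype=float)
    return np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x) / 2.0, axis=-1)


def aligned_profiles(time_points, values, n_grid=200):
    """Fit a cubic spline to each row, then shift each curve to zero mean.

    The zero-mean shift is an assumed stand-in for universal alignment.
    """
    time_points = np.asarray(time_points, dtype=float)
    grid = np.linspace(time_points[0], time_points[-1], n_grid)
    curves = np.array([CubicSpline(time_points, row)(grid) for row in values])
    span = grid[-1] - grid[0]
    means = trapezoid_area(curves, grid) / span
    return grid, curves - means[:, None]


def area_distance_matrix(grid, curves):
    """Pairwise absolute area between aligned curves (assumed distance)."""
    n = len(curves)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            gap = np.abs(curves[i] - curves[j])
            dist[i, j] = dist[j, i] = trapezoid_area(gap, grid)
    return dist


def cluster_profiles(dist, n_clusters):
    """Average-linkage hierarchical clustering on a square distance matrix."""
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (made up): four transcripts share one rising shape at different
# baseline levels; the fifth falls across the stages.
stages = [0.0, 1.0, 2.0, 3.0]
rising = np.array([1.0, 2.0, 4.0, 8.0])
toy_values = np.array([rising, rising + 5, rising + 10, rising + 15,
                       [9.0, 7.0, 4.0, 0.0]])

grid, curves = aligned_profiles(stages, toy_values)
labels = cluster_profiles(area_distance_matrix(grid, curves), n_clusters=2)
```

Because alignment removes each profile's baseline level, the four rising transcripts in the toy data collapse onto the same curve and the falling transcript ends up alone in its own cluster. That is the kind of singled-out outlier the abstract attributes to HC-UAP.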

Source journal: Decision Science Letters (Decision Sciences: Decision Sciences (all))
CiteScore: 3.40
Self-citation rate: 5.30%
Articles published: 49
Review time: 20 weeks