HC-UAP: Outliers detection method based-on hierarchical clustering for universally aligned time-series RNA-Seq profiles

IF 1.1 Q3 OPERATIONS RESEARCH & MANAGEMENT SCIENCE

Decision Science Letters Pub Date : 2023-01-01 DOI:10.5267/j.dsl.2022.10.004

A. Alkhateeb


Citations: 0

Abstract

Tracking the quantification of abundant gene transcripts across successive stages of cancer progression may reveal the mechanism of disease advancement. In this work, we profile transcript quantification across the stages using a time-series approach, in which the stages and sub-stages of the disease are the time points and the quantification measurements are the values. A cubic spline function interpolates the values between time points to model how each transcript changes as the disease progresses. The transcript profiles are then universally aligned and clustered with a time-series profile hierarchical clustering method based on the area between each pair of aligned profiles; we name the method HC-UAP. We compare the proposed method with hierarchical clustering based on Euclidean distance (HC-ED). Both methods were applied to two next-generation sequencing (NGS) prostate cancer datasets, the first from a Chinese population and the second from a North American population. HC-ED clustered each dataset to find patterns, whereas HC-UAP singled out outlier transcripts that trend differently in both datasets. Finding gene expression patterns that trend across stages is the standard approach for analyzing time-series models, but identifying outlier transcripts that grow differently from the other transcripts can provide more detail about the contribution of mRNA transcriptional activity to the disease. Such outliers may also be potential biomarkers of disease progression.
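The pipeline described in the abstract (spline interpolation, alignment, area-based distance, hierarchical clustering) can be sketched roughly as follows. This is a minimal illustration, not the author's implementation. The abstract does not define the alignment or the linkage, so the sketch assumes that "universal alignment" means shifting each profile vertically by its mean value over the time interval, that the area is the integral of the absolute difference (approximated with the trapezoidal rule), that average linkage is used, and that a transcript left alone in its own cluster counts as an outlier. The function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y."""
    return np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x) / 2.0, axis=-1)


def aligned_area_distances(time_points, profiles, n_grid=200):
    """Pairwise area between vertically aligned cubic-spline profiles."""
    t = np.asarray(time_points, dtype=float)
    grid = np.linspace(t[0], t[-1], n_grid)
    # Interpolate every transcript profile on a dense grid of time points.
    curves = np.array([CubicSpline(t, p)(grid) for p in profiles])
    # ASSUMPTION: align by subtracting each curve's mean value over the interval.
    means = trapezoid(curves, grid) / (grid[-1] - grid[0])
    aligned = curves - means[:, None]
    # ASSUMPTION: distance = area of the absolute difference between two curves.
    diff = np.abs(aligned[:, None, :] - aligned[None, :, :])
    return trapezoid(diff, grid)


def flag_singleton_outliers(dist, n_clusters=2):
    """Cluster with average linkage; return indices left alone in a cluster."""
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    counts = np.bincount(labels)
    return [i for i, lab in enumerate(labels) if counts[lab] == 1]


# Toy data (made up): four stages, four rising transcripts at different
# baseline levels, and one transcript that declines instead.
stages = [0, 1, 2, 3]
profiles = [
    [1.0, 2.0, 4.0, 7.0],
    [11.0, 12.2, 14.0, 17.0],
    [6.0, 7.0, 8.8, 12.0],
    [21.0, 22.0, 24.3, 27.0],
    [9.0, 6.0, 3.0, 1.0],
]
dist = aligned_area_distances(stages, profiles)
outliers = flag_singleton_outliers(dist)
print(outliers)
```

Because the alignment removes each profile's baseline level, the four rising transcripts end up close to each other despite their different magnitudes, and the declining transcript is the one separated at the top of the hierarchy. A Euclidean distance on the raw values would instead be dominated by the baseline offsets.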


Source journal