Mining Biological Information from 3D Medulloblastoma Cancerous Gene Expression Data Using TimesVector Triclustering Method

2020 4th International Conference on Informatics and Computational Sciences (ICICoS). Pub Date: 2020-11-10. DOI: 10.1109/ICICoS51170.2020.9299108

Ika Marta Sari, S. Soemartojo, T. Siswantining, Devvi Sarwinda



Abstract

Triclustering extends clustering and biclustering to three-dimensional data. Its purpose is to group all three dimensions of a data set, typically observations, attributes, and contexts, simultaneously. TimesVector is a triclustering method based on sample patterns: it aims to find groups of data matrices that show the same or different patterns across three-dimensional data. The method first reduces the three-dimensional data matrix to a two-dimensional matrix to lower the complexity of grouping, and then clusters the reduced matrix with the Spherical K-means algorithm. The next step identifies the pattern of each group produced by Spherical K-means. There are three pattern types: DEP (Differentially Expressed Patterns), ODEP (One Differentially Expressed Patterns), and SEP (Similarly Expressed Patterns). In this study, the TimesVector method was applied to medulloblastoma gene expression data in six scenarios. Every scenario uses the same number of clusters but a different threshold value. The results of the six scenarios are validated using the coverage value and the tricluster diffusion (TD) value. A threshold of 1.5 gives the best results because it yields a high coverage value and a low TD value. A high coverage value indicates the method's ability to extract data, and a low TD value suggests that the resulting tricluster has a large volume and high coherence. The best tricluster results can be used by medical experts to guide further action for medulloblastoma patients.
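The first two steps described in the abstract (reducing the three-dimensional data to two dimensions, then clustering with Spherical K-means) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the gene x time x condition axis order, the deterministic farthest-first seeding, and the cell-based definition of coverage are all assumptions made for illustration; the abstract does not give the paper's exact formulas.

```python
import numpy as np


def flatten_3d(data):
    """Reduce a (genes, times, conditions) array to a 2D (genes, times*conditions) matrix.

    Assumed layout: each gene's time-by-condition slice is unrolled into one row.
    """
    n_genes = data.shape[0]
    return data.reshape(n_genes, -1)


def farthest_first_init(unit_rows, k):
    """Hypothetical seeding: start at row 0, then repeatedly pick the row that is
    least similar (by cosine) to every centroid chosen so far."""
    chosen = [0]
    best_sim = unit_rows @ unit_rows[0]
    for _ in range(1, k):
        nxt = int(np.argmin(best_sim))
        chosen.append(nxt)
        best_sim = np.maximum(best_sim, unit_rows @ unit_rows[nxt])
    return unit_rows[chosen].copy()


def spherical_kmeans(matrix, k, n_iter=50):
    """Cluster rows by cosine similarity; returns (labels, unit-length centroids)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Project rows onto the unit sphere so a dot product equals cosine similarity.
    unit_rows = matrix / np.where(norms == 0, 1.0, norms)
    centroids = farthest_first_init(unit_rows, k)
    for _ in range(n_iter):
        labels = np.argmax(unit_rows @ centroids.T, axis=1)
        for j in range(k):
            members = unit_rows[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            total = members.sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:
                centroids[j] = total / length  # re-normalise the mean direction
    labels = np.argmax(unit_rows @ centroids.T, axis=1)
    return labels, centroids


def coverage(shape, triclusters):
    """Assumed definition: fraction of cells in the 3D array that belong to at
    least one tricluster, each given as (gene_idx, time_idx, condition_idx)."""
    covered = np.zeros(shape, dtype=bool)
    for genes, times, conditions in triclusters:
        covered[np.ix_(genes, times, conditions)] = True
    return covered.mean()
```

Calling `spherical_kmeans(flatten_3d(data), k)` on a gene x time x condition array groups genes whose flattened expression profiles point in similar directions, regardless of magnitude. The pattern-identification step (DEP, ODEP, SEP) and the TD measure are not sketched here because the abstract does not specify how the threshold or TD is computed.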
