Mining Biological Information from 3D Medulloblastoma Cancerous Gene Expression Data Using TimesVector Triclustering Method
Ika Marta Sari, S. Soemartojo, T. Siswantining, Devvi Sarwinda
2020 4th International Conference on Informatics and Computational Sciences (ICICoS), 2020-11-10
DOI: https://doi.org/10.1109/ICICoS51170.2020.9299108
Citations: 1
Abstract
Triclustering analysis extends clustering and biclustering analysis. Its purpose is to group three-dimensional data along all three dimensions simultaneously; the dimensions can be observations, attributes, and contexts. One approach to triclustering, based on sample patterns, is the TimesVector method. TimesVector aims to find groups of data that show the same or different patterns across the three-dimensional data. The method begins by reducing the three-dimensional data matrix to a two-dimensional data matrix to lower the complexity of the grouping. The reduced matrix is then clustered with the Spherical K-means algorithm. The next step is to identify the pattern of each group produced by Spherical K-means. There are three pattern types: DEP (Differentially Expressed Pattern), ODEP (One Differentially Expressed Pattern), and SEP (Similarly Expressed Pattern). The TimesVector method was applied to medulloblastoma gene expression data in six scenarios. Each scenario uses the same number of clusters but a different threshold value. The results of the six scenarios are validated using the coverage value and the tricluster diffusion (TD) value. A threshold of 1.5 gives the best results, because it yields a high coverage value and a low TD value. A high coverage value indicates the method’s ability to extract data, and a low TD value suggests that the resulting triclusters have a large volume and high coherence. The best tricluster results can be used by medical experts to guide further action for medulloblastoma patients.
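The first two steps named in the abstract (reducing the three-dimensional matrix to two dimensions, then clustering with Spherical K-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The axis layout (genes × time points × conditions), the function names `flatten_to_2d` and `spherical_kmeans`, and the deterministic farthest-first initialisation are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def flatten_to_2d(data):
    """Reshape a (genes, timepoints, conditions) array to (genes, timepoints * conditions).

    The axis order is an assumption made for this sketch.
    """
    return data.reshape(data.shape[0], -1)


def spherical_kmeans(X, k, n_iter=50):
    """Cluster the rows of X by cosine similarity.

    Returns (labels, centroids), where every centroid has unit length.
    """
    # Project every row onto the unit sphere; leave all-zero rows untouched.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms == 0, 1.0, norms)

    # Assumed initialisation: start from row 0, then repeatedly add the row
    # that is least similar to the centroids chosen so far (farthest-first).
    chosen = [U[0]]
    for _ in range(1, k):
        sims = np.max(U @ np.array(chosen).T, axis=1)
        chosen.append(U[np.argmin(sims)])
    centroids = np.array(chosen, dtype=float)

    labels = np.full(len(U), -1)
    for _ in range(n_iter):
        # Assign each row to the centroid with the highest cosine similarity.
        new_labels = np.argmax(U @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Recompute each centroid as the normalised sum of its members.
        for j in range(k):
            members = U[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            total = members.sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:
                centroids[j] = total / length
    return labels, centroids
```

Because rows are normalised before clustering, two genes whose expression profiles differ only in magnitude fall into the same cluster. That is the usual reason for choosing the spherical variant over ordinary K-means on expression data. The later steps (detecting DEP, ODEP, and SEP patterns, and computing coverage and TD) are not sketched here because the abstract does not define them.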