Determination of the minimum sample size in microarray experiments to cluster genes using k-means clustering
Fang-Xiang Wu, W. Zhang, A. Kusalik
Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. Published 2003-03-10.
DOI: 10.1109/BIBE.2003.1188979 (https://doi.org/10.1109/BIBE.2003.1188979)
Citations: 11
Abstract
Gene expression profiles obtained from time-series microarray experiments can reveal important information about biological processes. However, conducting such experiments is costly and time-consuming, and both the cost and the time required grow linearly with sample size. It is therefore worthwhile to provide a way to determine the minimum number of samples, or trials, required in a microarray experiment. One use of microarray hybridization experiments is to group genes with similar expression patterns using clustering techniques. In this paper, the k-means clustering technique is used. The basic idea of our approach is an incremental process in which testing, analysis, and evaluation are integrated and iterated. The process terminates when the evaluation shows that the results of two consecutive experiments are sufficiently close. Two measures of "closeness" are proposed, and two real microarray datasets are used to validate the approach. The results show that the sample size required to cluster genes in these two datasets can be reduced; i.e., the same results can be achieved at lower cost. The approach can be used with other clustering techniques as well.
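The incremental process described in the abstract can be illustrated with a minimal sketch. The abstract does not say which two "closeness" measures the authors propose, so the Rand index used here is a stand-in assumption, not the paper's measure. The function names (`kmeans_labels`, `rand_index`, `minimum_sample_size`), the 0.95 threshold, the farthest-first initialisation, and the toy data are likewise hypothetical choices made only for this illustration.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: deterministic farthest-first initialisation, chosen here
    only so that the example is reproducible.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree.

    Assumption: a stand-in closeness measure, not one taken from the paper.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def minimum_sample_size(profiles, k, threshold=0.95, start=2):
    """Add one sample at a time; stop when two consecutive clusterings
    are sufficiently close. Returns (samples_used, labels)."""
    total = len(profiles[0])
    previous = kmeans_labels([g[:start] for g in profiles], k)
    for m in range(start + 1, total + 1):
        current = kmeans_labels([g[:m] for g in profiles], k)
        if rand_index(previous, current) >= threshold:
            return m, current
        previous = current
    return total, previous


# Toy data: six "genes" measured at five time points, three rising and
# three falling. Values are invented for illustration only.
profiles = [
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [0.1, 1.1, 2.1, 3.1, 4.1],
    [0.2, 0.9, 2.2, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0, 0.0],
    [4.1, 3.1, 2.1, 1.1, 0.1],
    [3.9, 2.9, 1.9, 0.9, 0.2],
]

samples_used, labels = minimum_sample_size(profiles, k=2)
print(samples_used, labels)
```

On this toy data the clusterings from the first two and the first three time points are identical, so the loop stops after three of the five samples. Real expression data is noisier, and both the stopping threshold and the choice of closeness measure would need to be tuned to it.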