Clustering of Human Intestine Microbiomes with K-Means
Wesam Sami Taie, Yasser Omar, A. Badr
2018 21st Saudi Computer Society National Computer Conference (NCC), published 2018-04-01
DOI: 10.1109/NCG.2018.8593154 (https://doi.org/10.1109/NCG.2018.8593154)
Most studies report that microbiota make up 1–3% of human body mass. The human gut and intestine host several types of microorganisms that are important for human health and disease. Understanding the behavior of the gut and intestinal microbiomes therefore increases the chance of detecting and predicting disease early enough to take precautions for treatment. Because time is an important factor in collecting information about the gut and intestinal microbiota, the proposed work uses the 16S rRNA metagenomic approach, a well-suited, knowledge-based way to understand the human microbiota much faster. A nucleotide database of bacterial 16S rRNA gene sequences isolated from human intestinal and fecal samples was used to develop a microbiota microarray that includes the human intestinal microbiomes, their protein information, and the weight of each protein in the dataset, calculated using two techniques: the K-Means clustering algorithm and the Needleman-Wunsch algorithm. The main contribution of this work is avoiding the time cost of running the Needleman-Wunsch sequence alignment algorithm to assign weights across such a large set of 56,117 proteins. In the validation experiments, the microarray correctly identified genomic DNA from all 18 bacterial species used. The analytical study on the dataset shows that calculating the alignment distance for a large number of sequences becomes more efficient and faster once features are extracted and used to cluster the dataset into 8 clusters, which reduces the runtime for the full dataset from 2 years to 3 days. These microbiota microarrays will be clustered using a genetic algorithm that takes into account the protein weights assigned by the Needleman-Wunsch algorithm, grouping the proteins of the human intestinal microbiomes into k clusters in order to identify proteins of unknown structure and to obtain the interactions among all proteins using a protein-protein interaction model.
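The idea of clustering first and aligning only within clusters can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring values (match +1, mismatch -1, gap -1), the 2-mer frequency features, the plain Lloyd's k-means, and the toy sequences are all assumptions chosen for illustration, since the abstract does not say which features or parameters were used.

```python
import random
from itertools import product


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with a linear gap penalty (assumed scoring values)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps only
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a aligned against gaps only
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def kmer_profile(seq, alphabet="ACGT", k=2):
    """Normalized k-mer frequencies: a cheap, alignment-free feature vector (assumed)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        piece = seq[i:i + k]
        if piece in counts:
            counts[piece] += 1
    return [counts[km] / windows for km in kmers]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=25, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def within_cluster_pairs(labels):
    """Number of sequence pairs that share a cluster, i.e. alignments still needed."""
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    return sum(n * (n - 1) // 2 for n in sizes.values())


# Toy data (hypothetical sequences, not taken from the paper's dataset).
seqs = ["ACGTACGTAC", "ACGTACGTTC", "GGGGCCCCGG",
        "GGGCCCCGGG", "ATATATATAT", "ATATATTTAT"]
labels = kmeans([kmer_profile(s) for s in seqs], k=3)
all_pairs = len(seqs) * (len(seqs) - 1) // 2

print("alignments if every pair is compared:", all_pairs)
print("alignments restricted to clusters:  ", within_cluster_pairs(labels))
print("example score:", needleman_wunsch_score(seqs[0], seqs[1]))
```

Each Needleman-Wunsch call costs time proportional to the product of the two sequence lengths, so the saving comes from reducing how many pairs are aligned at all. For n sequences an all-pairs comparison needs n(n-1)/2 alignments, while restricting alignments to pairs inside the same cluster needs only the sum of the per-cluster pair counts.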