Penentuan Kekerabatan Hewan Berdasarkan Struktur Protein IGF2 Menggunakan Metode K-Means dan N-Gram (Determining Animal Kinship Based on IGF2 Protein Structure Using the K-Means and N-Gram Methods)
Authors: Ruth Ema Febrita, Maghfirotul Amaniyah
Journal: Proxies Jurnal Informatika
Published: 2022-10-02
DOI: 10.31294/inf.v9i2.13808 (https://doi.org/10.31294/inf.v9i2.13808)
Citation count: 0
Abstract
In biology, there are various ways to determine how closely two individuals are related, such as observing similarities in physical morphology and building a dendrogram, or constructing a phylogenetic tree to trace kinship through evolutionary history. However, these approaches are very difficult to apply when the animal whose relatives are to be determined is no longer alive, because its physical characteristics are hard to observe. This study offers a different approach to determining animal kinship: using a clustering algorithm to group IGF2 protein structures. Kinship is determined with the K-Means clustering method. Because the sequences vary in length, the N-gram technique is used to break each sequence into subsequences of equal length. K-Means clustering gave its best results with seven clusters, with an average silhouette coefficient of 0.331, a purity index of 0.735, and a precision of 0.823, which indicates that the clustering process is quite effective.
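The pipeline described in the abstract (N-gram decomposition, K-Means clustering, purity scoring) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the overlapping sliding window, the relative-frequency vectors, the seeding of centroids with the first k points, and all function names are assumptions, and the four short sequences are made-up toy strings, not real IGF2 data.

```python
from collections import Counter


def ngrams(sequence, n):
    """Return all overlapping substrings of length n (sliding window, step 1)."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def ngram_vector(sequence, n, vocabulary):
    """Map a sequence to relative n-gram frequencies over a fixed vocabulary."""
    counts = Counter(ngrams(sequence, n))
    total = sum(counts.values()) or 1  # guard against sequences shorter than n
    return [counts[gram] / total for gram in vocabulary]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, classes):
    """Fraction of items that belong to the majority class of their cluster."""
    majority_total = 0
    for cluster in set(labels):
        members = [cls for label, cls in zip(labels, classes) if label == cluster]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(labels)


# Toy data (hypothetical): two groups of made-up sequences with different lengths.
sequences = ["AAAAAA", "CCCCCC", "AAAAAAAB", "CCCCD"]
true_groups = ["group_a", "group_c", "group_a", "group_c"]

n = 2
vocabulary = sorted({gram for seq in sequences for gram in ngrams(seq, n)})
vectors = [ngram_vector(seq, n, vocabulary) for seq in sequences]
cluster_labels = kmeans(vectors, k=2)

print(cluster_labels)
print(purity(cluster_labels, true_groups))
```

Because every sequence is mapped onto the same n-gram vocabulary, sequences of different lengths end up as vectors of equal dimension, which is what K-Means requires. In practice the number of clusters and the n-gram length would be tuned, for example by comparing silhouette coefficients across settings.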