Clustering Datasets in Cloud Computing Environment for User Identification
Shallaw Mohammed Ali, G. Kecskeméti
2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), published 2022-03-01
DOI: 10.1109/pdp55904.2022.00033 (https://doi.org/10.1109/pdp55904.2022.00033)
Citations: 1
Abstract
User behaviour has a noticeable impact on cloud computing resources. Behaviour prediction models could foster usage awareness among cloud users, but training such models requires datasets that provide user information. Unfortunately, this information is excluded from many relevant datasets. Therefore, in this work, we investigate whether user identities can be extracted via clustering methods. We do this by categorising workload datasets according to whether their attributes include user information. Then, we focus on the attributes shared between datasets that disclose user information and those that do not. Finally, we evaluate several clustering approaches on the datasets that disclose user information. Our results show that user identities can be extracted with relatively high accuracy using clustering. They also show that the highest clustering precision is mostly obtained from attributes representing request components that strongly relate to the user’s application.
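The evaluation idea in the abstract (cluster jobs on attributes shared with non-disclosing datasets, then check the clusters against the user IDs a disclosing dataset provides) can be illustrated with a minimal sketch. This is not the authors' method. It assumes two hypothetical shared request attributes (requested cores and requested memory in GB), uses plain k-means with deterministic farthest-first seeding and k set to the known number of users, and scores the result with cluster purity. The job values and user names are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with farthest-first seeding; returns a cluster index per point."""
    # Farthest-first seeding: start from the first point, then repeatedly add
    # the point whose distance to its nearest chosen centroid is largest.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(nxt)

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each job to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no job changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member jobs.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def purity(cluster_labels: Sequence[int], user_ids: Sequence[str]) -> float:
    """Fraction of jobs whose cluster's majority user is their own user."""
    by_cluster: Dict[int, Counter] = {}
    for c, u in zip(cluster_labels, user_ids):
        by_cluster.setdefault(c, Counter())[u] += 1
    majority_total = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority_total / len(user_ids)


# Hypothetical jobs: ((requested cores, requested memory in GB), user id).
# The user id stands in for the label a disclosing dataset would provide;
# it is used only for evaluation, never as a clustering feature.
jobs = [
    ((8.0, 16.0), "alice"), ((8.0, 17.0), "alice"), ((9.0, 16.0), "alice"),
    ((32.0, 128.0), "bob"), ((32.0, 126.0), "bob"), ((30.0, 128.0), "bob"),
    ((1.0, 1.0), "carol"), ((1.0, 2.0), "carol"),
]
features = [f for f, _ in jobs]
users = [u for _, u in jobs]
labels = kmeans(features, k=3)
print(purity(labels, users))  # prints 1.0 for this well-separated toy data
```

Purity only rewards clusters dominated by a single user; it does not penalise splitting one user across several clusters, so a fuller evaluation would pair it with a metric such as the adjusted Rand index.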