Title: Investigating cluster stability when analyzing transaction logs
Authors: D. Grech, Paul D. Clough
Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)
Published: 2016-06-19
DOI: 10.1145/2910896.2910923 (https://doi.org/10.1145/2910896.2910923)
Citation count: 5
Abstract
Data-driven approaches have become increasingly popular for analyzing transaction logs from web search engines and digital libraries, for example by using cluster analysis to identify common patterns of search and navigation behavior. However, steps must be taken to ensure that results are reliable and repeatable. Although clustering of user interaction behavior has been explored previously, one aspect that has received less attention is cluster stability, which can be used to aid cluster validation. In this paper we compute stability based on the Jaccard coefficient to investigate how stable clusters are when different subsets of transaction log data from WorldCat.org are used. Results provide insights into different types of search behavior and show that the clustering process yields clusters of varying degrees of stability. However, we also show that additional investigation beyond cluster stability results is required to fully validate the resulting clusters.
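To make the idea of Jaccard-based stability concrete, the following is a minimal sketch and not the authors' procedure, which the abstract does not specify. It assumes a common resampling scheme: each reference cluster is restricted to the items present in a data subset, matched to the most similar cluster found on that subset, and the best Jaccard values are averaged over all subsets. The function names `jaccard` and `cluster_stability`, the integer session IDs, and the toy clusters are all hypothetical.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; taken as 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cluster_stability(
    reference: List[Set[int]],
    resamples: List[Tuple[Set[int], List[Set[int]]]],
) -> List[float]:
    """Mean best-match Jaccard score for each reference cluster.

    reference: clusters found on the full data set (sets of item IDs).
    resamples: one (subset, clusters) pair per run, where `subset` holds the
               item IDs used in that run and `clusters` is the clustering
               obtained on that subset.
    Assumption: stability is the average, over runs, of the highest Jaccard
    value between the reference cluster (restricted to the subset) and any
    cluster found in that run.
    """
    scores = []
    for ref in reference:
        per_run = []
        for subset, clusters in resamples:
            restricted = ref & subset  # compare only items present in this run
            best = max((jaccard(restricted, c) for c in clusters), default=0.0)
            per_run.append(best)
        scores.append(sum(per_run) / len(per_run) if per_run else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: eight sessions (IDs 0-7) in two reference clusters.
    reference = [{0, 1, 2, 3}, {4, 5, 6, 7}]
    resamples = [
        ({0, 1, 2, 4, 5, 6, 7}, [{0, 1, 2}, {4, 5, 6, 7}]),
        ({0, 1, 3, 4, 5, 6}, [{0, 1}, {3, 4, 5, 6}]),
    ]
    for i, score in enumerate(cluster_stability(reference, resamples)):
        print(f"cluster {i}: stability {score:.3f}")
```

In this toy data the first cluster scores about 0.83 and the second 0.875, because session 3 is assigned to a different cluster in the second run. A high score only says that a cluster is recovered consistently across subsets; in line with the abstract's conclusion, it does not by itself establish that the cluster is meaningful.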