Unreliability of clustering results in sensory studies and a strategy to address the issue
Authors: Rajesh Kumar, Edgar Chambers
Journal: Frontiers in Food Science and Technology
Publication date: 2024-04-10
DOI: 10.3389/frfst.2024.1271193 (https://doi.org/10.3389/frfst.2024.1271193)
Citations: 0
Abstract
Researchers commonly use hierarchical clustering (HC) or k-means (KM) to group products, attributes, or consumers. However, the results of these approaches can differ widely depending on the specific method used or on the initial “seed” (the starting cluster centroid) chosen for clustering. Although recommendations for various clustering techniques have been made, in practice objects can, and do, change clusters, which can affect interpretation of the data. Researchers usually do not run a clustering algorithm multiple times to assess stability, nor do they often run multiple clustering methods, although that has been recommended previously. This study applied hierarchical agglomerative clustering (HAC), KM, and fuzzy clustering (FC) to a large descriptive sensory data set and compared the attribute clusters produced by the methods, including multiple iterations of the same methods. Sensory attributes (objects) shuffled among clusters in varying ways, which could lead to different interpretations of the data. The frequency of that shuffling was captured in the KM output and used to form the “best possible” clusters via manual clustering (MC). The HAC and FC results were studied and compared with the KM results. Attribute correlation coefficients were also compared with the clustering information. Results from a single clustering approach may not be reliable and should be confirmed using other clustering approaches. A strategy that combines multiple clustering approaches, including an MC process, is suggested for determining consistent clusters in sensory data sets.
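
The idea of rerunning KM with different seeds and recording how often attributes end up together can be sketched as follows. This is a minimal illustration and not the authors' procedure: the attribute names, their scores, the choice of two clusters, and the helper names `kmeans` and `co_assignment` are all assumptions made up for the example, and the k-means shown is a plain textbook version written with the Python standard library only.

```python
import random
from itertools import combinations


def kmeans(points, k, seed, n_iter=50):
    """Plain Lloyd's k-means; the seed decides which points start as centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_assignment(points, k, seeds):
    """Fraction of runs in which each pair of objects shares a cluster."""
    counts = {pair: 0 for pair in combinations(range(len(points)), 2)}
    for seed in seeds:
        labels = kmeans(points, k, seed)
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: n / len(seeds) for pair, n in counts.items()}


# Hypothetical attribute profiles: mean intensity of each attribute on 4 products.
attributes = {
    "sweet": [8.0, 7.5, 2.0, 1.5],
    "fruity": [7.5, 8.0, 2.5, 1.0],
    "bitter": [1.0, 1.5, 8.0, 7.5],
    "astringent": [1.5, 1.0, 7.5, 8.0],
    "sour": [4.0, 5.0, 5.0, 4.0],
}
names = list(attributes)
points = list(attributes.values())

freq = co_assignment(points, k=2, seeds=list(range(50)))
for (i, j), f in sorted(freq.items(), key=lambda kv: kv[1]):
    print(f"{names[i]:>10} / {names[j]:<10} together in {f:.0%} of runs")
```

Pairs whose frequency is close to 1 or close to 0 behave consistently across seeds, while pairs with intermediate values are the ones that shuffle between clusters. A table of this kind could then be set beside HAC or FC results and the attribute correlations when deciding on final groups by hand.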