The analysis of association rules: Latent class analysis

IF 2.1, Zone 4 (Mathematics), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Statistical Analysis and Data Mining Pub Date : 2024-05-01 DOI:10.1002/sam.11686

Ron S. Kenett, Chris Gotwalt


Citations: 0

Abstract

Association rules are used to extract information from transactional databases containing a collection of items, also called “tokens” or “words.” The aim of association rule analysis is to indicate which items go with which other items, and how, in a set of transactions called “documents.” This approach is used in the analysis of text records, blogs in social media, and shopping baskets. We present an approach to analyzing documents using latent class analysis (LCA) clustering of document term matrices. A document term matrix (DTM) has rows corresponding to documents and columns corresponding to items. With binary weights, “1” indicates the presence of a term in a document and “0” its absence. Clustering similar documents provides stratified data sets that enhance the interpretability of measures of interest such as lift, odds ratios, and relative linkage disequilibrium. The article demonstrates the approach with two case studies. The first consists of comments recorded in a survey aimed at pet owners. The second, much larger, example is based on online reviews of Crocs sandals. Association rules describe combinations of terms in the pet survey and Crocs reviews. We first introduce the case studies to motivate the methods proposed here. Section 2 defines association rule measures of interest, and Section 3 computes them for the case studies. In Section 4, we provide a new approach that enhances the interpretation of measures such as lift by comparing them across clusters derived from an LCA of the DTM. A key result is the application of clustered data in analyzing observational data, which enhances the generalizability and interpretability of findings from text analytics. The article concludes with a discussion in Section 5.
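As a rough illustration of the quantities named in the abstract, the sketch below builds a binary DTM from a handful of toy documents and computes support, confidence, lift, and the odds ratio for a single rule, first overall and then within strata. It is not the authors' implementation. The documents, the cluster labels (which stand in for LCA class assignments that would normally be estimated), and the function names `build_dtm`, `rule_measures`, and `stratified_measures` are all hypothetical.

```python
from collections import defaultdict

def build_dtm(documents):
    """Return (vocab, rows); rows[i][j] is 1 if term j occurs in document i, else 0."""
    vocab = sorted({term for doc in documents for term in doc})
    rows = [[1 if term in doc else 0 for term in vocab] for doc in documents]
    return vocab, rows

def rule_measures(rows, vocab, antecedent, consequent):
    """Support, confidence, lift and odds ratio for the rule antecedent -> consequent."""
    a, b = vocab.index(antecedent), vocab.index(consequent)
    n = len(rows)
    # Cells of the 2x2 table of antecedent presence by consequent presence.
    n11 = sum(1 for r in rows if r[a] and r[b])
    n10 = sum(1 for r in rows if r[a] and not r[b])
    n01 = sum(1 for r in rows if not r[a] and r[b])
    n00 = n - n11 - n10 - n01
    nan = float("nan")
    support = n11 / n
    p_a = (n11 + n10) / n
    p_b = (n11 + n01) / n
    confidence = support / p_a if p_a else nan
    # Lift compares the joint frequency with what independence would predict.
    lift = support / (p_a * p_b) if p_a and p_b else nan
    # The odds ratio is left undefined here when an off-diagonal cell is empty.
    odds_ratio = (n11 * n00) / (n10 * n01) if n10 * n01 else nan
    return {"support": support, "confidence": confidence,
            "lift": lift, "odds_ratio": odds_ratio}

def stratified_measures(rows, labels, vocab, antecedent, consequent):
    """Compute the same measures separately within each cluster label."""
    strata = defaultdict(list)
    for row, label in zip(rows, labels):
        strata[label].append(row)
    return {label: rule_measures(stratum, vocab, antecedent, consequent)
            for label, stratum in sorted(strata.items())}

# Toy documents (hypothetical), each represented as a set of terms.
docs = [
    {"dog", "walk", "park"},
    {"dog", "walk"},
    {"dog", "food"},
    {"cat", "food"},
    {"cat", "litter"},
    {"cat", "food", "litter"},
    {"dog", "park"},
    {"cat", "walk"},
]
# Hypothetical cluster labels standing in for estimated LCA classes.
labels = [0, 0, 0, 1, 1, 1, 0, 1]

vocab, dtm = build_dtm(docs)
overall = rule_measures(dtm, vocab, "dog", "walk")
by_cluster = stratified_measures(dtm, labels, vocab, "dog", "walk")

print("overall:", overall)
for label, measures in by_cluster.items():
    print("cluster", label, measures)
```

In this toy data the rule "dog" -> "walk" has a lift above 1 overall but a lift of exactly 1 within cluster 0, where every document mentions "dog". In cluster 1 the antecedent never occurs, so the measures are undefined. This is the kind of contrast between pooled and stratified measures that comparing lift across clusters is meant to expose.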


Source journal

Statistical Analysis and Data Mining (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)

CiteScore: 3.20

Self-citation rate: 7.70%

Journal description: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in a way that is accessible and beneficial to domain experts across science, engineering, and commerce. The journal focuses on papers that satisfy one or more of the following criteria: solve data analysis problems associated with massive, complex datasets; develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, and operations research; formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models; provide surveys of prominent research topics.