Finite-Mixture Structural Equation Models for Response-Based Segmentation and Unobserved Heterogeneity
K. Jedidi, Harsharanjeet S. Jagpal, W. DeSarbo
Marketing Science 16(1), published 1997-02-01
DOI: 10.1287/MKSC.16.1.39
Citations: 425
Abstract
Two endemic problems face researchers in the social sciences (e.g., Marketing, Economics, Psychology, and Finance): unobserved heterogeneity and measurement error in data. Structural equation modeling is a powerful tool for dealing with these difficulties using a simultaneous equation framework with unobserved constructs and manifest indicators which are error-prone. When estimating structural equation models, however, researchers frequently treat the data as if they were collected from a single population (Muthén, Bengt O. 1989. Latent variable modeling in heterogeneous populations. Psychometrika 54 557–585). This assumption of homogeneity is often unrealistic. For example, in multidimensional expectancy value models, consumers from different market segments can have different belief structures (Bagozzi, Richard P. 1982. A field investigation of causal relations among cognitions, affect, intentions, and behavior. J. Marketing Res. 19 562–584). Research in satisfaction suggests that consumer decision processes vary across segments (Day, Ralph L. 1977. Extending the concept of consumer satisfaction. W. D. Perreault, ed. Advances in Consumer Research, Vol. 4. Association for Consumer Research, Atlanta, 149–154).
This paper shows that aggregate analysis that ignores heterogeneity in structural equation models produces misleading results and that traditional fit statistics are not useful for detecting unobserved heterogeneity in the data. Furthermore, sequential analyses that first form groups using cluster analysis and then apply multigroup structural equation modeling are not satisfactory.
We develop a general finite mixture structural equation model that simultaneously treats heterogeneity and forms market segments in the context of a specified model structure where all the observed variables are measured with error. The model is considerably more general than cluster analysis, multigroup confirmatory factor analysis, and multigroup structural equation modeling. In particular, the model subsumes several specialized models including finite mixture simultaneous equation models, finite mixture confirmatory factor analysis, and finite mixture second-order factor analysis.
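A common way to write such a model, sketched here under stated assumptions rather than copied from the paper, combines segment-specific LISREL-type equations with a multivariate normal mixture. Assume K segments with mixing proportions π_k, endogenous constructs η, exogenous constructs ξ, and an observed indicator vector z_i = (y_i', x_i')' for respondent i. Within segment k, z_i is taken to be normal with mean μ_k and the covariance Σ_k(θ_k) implied by that segment's structural (B_k, Γ_k, Φ_k, Ψ_k) and measurement (Λ_{y,k}, Λ_{x,k}, Θ_{ε,k}, Θ_{δ,k}) parameters:

```latex
\begin{aligned}
\eta &= B_k \eta + \Gamma_k \xi + \zeta, \qquad
y = \Lambda_{y,k}\, \eta + \varepsilon, \qquad
x = \Lambda_{x,k}\, \xi + \delta, \\[4pt]
f(z_i) &= \sum_{k=1}^{K} \pi_k\, \phi\!\left(z_i;\, \mu_k,\, \Sigma_k(\theta_k)\right),
\qquad \sum_{k=1}^{K} \pi_k = 1,\quad \pi_k \ge 0, \\[4pt]
\Sigma_k(\theta_k) &=
\begin{pmatrix}
\Lambda_{y,k} A_k \left(\Gamma_k \Phi_k \Gamma_k^{\top} + \Psi_k\right) A_k^{\top} \Lambda_{y,k}^{\top} + \Theta_{\varepsilon,k} &
\Lambda_{y,k} A_k \Gamma_k \Phi_k \Lambda_{x,k}^{\top} \\
\Lambda_{x,k} \Phi_k \Gamma_k^{\top} A_k^{\top} \Lambda_{y,k}^{\top} &
\Lambda_{x,k} \Phi_k \Lambda_{x,k}^{\top} + \Theta_{\delta,k}
\end{pmatrix},
\qquad A_k = (I - B_k)^{-1}.
\end{aligned}
```

Setting K = 1 gives an ordinary single-group SEM, and fixing each respondent's segment in advance gives a multigroup SEM, which illustrates why a mixture formulation generalizes those models. Parameters of models of this form are typically estimated by maximum likelihood, for example with an EM algorithm; the symbols follow standard LISREL notation and are not taken from the source.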
The finite mixture structural equation model should be of interest to academics in a wide range of disciplines (e.g., Consumer Behavior, Marketing, Economics, Finance, Psychology, and Sociology), where unobserved heterogeneity and measurement error are problematic. In addition, the model should be of interest to market researchers and product managers for two reasons. First, the model allows the manager to perform response-based segmentation using a consumer decision process model, while explicitly allowing for both measurement and structural error. Second, the model allows managers to detect unobserved moderating factors which account for heterogeneity. Once managers have identified the moderating factors, they can link segment membership to observable individual-level characteristics (e.g., socioeconomic and demographic variables) and improve marketing policy.
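In a mixture model, response-based segmentation usually rests on each respondent's posterior probability of belonging to each segment. The Python sketch below shows that step (the E-step of an EM algorithm) for a multivariate normal mixture. It assumes the mixing proportions, segment means, and covariances have already been estimated; the function names `log_mvn_density` and `posterior_membership` are hypothetical. In a finite mixture SEM, each covariance would be the one implied by that segment's structural and measurement parameters rather than an unrestricted matrix as in this toy example.

```python
import numpy as np


def log_mvn_density(z, mean, cov):
    """Log multivariate normal density evaluated at each row of z (n x d)."""
    d = z.shape[1]
    diff = z - mean
    # Cholesky factor L with cov = L L^T; fails if cov is not positive definite
    chol = np.linalg.cholesky(cov)
    # Solve L u = diff^T; the squared norm of each column of u is a Mahalanobis distance
    u = np.linalg.solve(chol, diff.T)
    maha = np.sum(u ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def posterior_membership(z, weights, means, covs):
    """Posterior probability that each row of z belongs to each segment (n x K)."""
    # log pi_k + log phi(z_i; mu_k, Sigma_k) for every respondent i and segment k
    log_joint = np.column_stack([
        np.log(w) + log_mvn_density(z, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    # Subtract the row maximum before exponentiating to avoid underflow
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


# Usage: two hypothetical segments with identity covariances
z = np.array([[0.0, 0.0], [5.0, 5.0], [2.5, 2.5]])
post = posterior_membership(
    z,
    weights=[0.5, 0.5],
    means=[np.zeros(2), np.full(2, 5.0)],
    covs=[np.eye(2), np.eye(2)],
)
segment = post.argmax(axis=1)  # hard assignment to the most probable segment
```

The third respondent sits exactly between the two segment means, so its probabilities are split evenly; keeping the full probability vector rather than only the hard assignment preserves that uncertainty when membership is later related to observable characteristics.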
We applied the finite mixture structural equation model to a direct marketing study of customer satisfaction and estimated a large model with 8 unobserved constructs and 23 manifest indicators. The results show that there are three consumer segments that vary considerably in terms of the importance they attach to the various dimensions of satisfaction. In contrast, aggregate analysis is misleading because it incorrectly suggests that, except for price, all dimensions of satisfaction are significant for all consumers. Methodologically, the finite mixture model is robust; that is, the parameter estimates are stable under double cross-validation, and the method can be used to test large models. Furthermore, the double cross-validation results show that the finite mixture model is superior to sequential data analysis strategies in terms of goodness-of-fit and interpretability.
We performed four simulation experiments to test the robustness of the algorithm using both recursive and nonrecursive model specifications. Specifically, we examined the robustness of different model selection criteria (e.g., CAIC, BIC, and GFI) in choosing the correct number of clusters for exactly identified and overidentified models, assuming that the distributional form is correctly specified. We also examined the effect of distributional misspecification (i.e., departures from multivariate normality) on model performance. The results show that when the data are heterogeneous, the standard goodness-of-fit statistics for the aggregate model are not useful for detecting heterogeneity. Furthermore, parameter recovery for the aggregate model is poor. For the finite mixture model, however, the BIC and CAIC criteria perform well in detecting heterogeneity and in identifying the true number of segments. In particular, parameter recovery for both the measurement and structural models is highly satisfactory. The finite mixture method is robust to distributional misspecification; in addition, the method significantly outperforms aggregate and sequential data analysis methods when the form of heterogeneity is misspecified (i.e., the true model has random coefficients).
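Information criteria such as BIC and CAIC trade the maximized log-likelihood against model size. A minimal Python sketch, assuming the standard definitions BIC = -2 ln L + p ln N and CAIC = -2 ln L + p(ln N + 1), where p counts all free parameters (including the K - 1 free mixing proportions) and N is the sample size. The function names are hypothetical, and the fit values in the usage lines are invented for illustration; they are not results from the study.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + p ln N (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def caic(log_lik, n_params, n_obs):
    """Consistent AIC (Bozdogan): -2 ln L + p (ln N + 1) (smaller is better)."""
    return -2.0 * log_lik + n_params * (math.log(n_obs) + 1.0)


def choose_segments(fits, n_obs, criterion=bic):
    """Return the number of segments K minimizing `criterion`, plus all scores.

    `fits` maps K -> (maximized log-likelihood, number of free parameters).
    """
    scores = {k: criterion(ll, p, n_obs) for k, (ll, p) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative (made-up) fits: 60 parameters per segment plus K - 1 mixing proportions
fits = {1: (-5200.0, 60), 2: (-4900.0, 121), 3: (-4880.0, 182)}
best_bic, bic_scores = choose_segments(fits, n_obs=1000)
best_caic, caic_scores = choose_segments(fits, n_obs=1000, criterion=caic)
```

With these invented numbers both criteria select K = 2: the third segment raises the log-likelihood only slightly while adding 61 parameters. Because CAIC charges one extra unit per parameter, it can favor fewer segments than BIC when likelihood gains are small.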
Researchers and practitioners should use the mixture methodology only when substantive theory supports the structural equation model, a priori segmentation is infeasible, and theory suggests that the data are heterogeneous and belong to a finite number of unobserved groups. We expect these conditions to hold in many social science applications and, in particular, in market segmentation studies.
Future research should focus on large-scale simulation studies to test the structural equation mixture model using a wide range of models and statistical distributions. Theoretical research should extend the model by allowing the mixing proportions to depend on prior information and/or subject-specific variables. Finally, in order to provide a fuller treatment of heterogeneity, we need to develop a general random coefficient structural equation model. Such a model is presently unavailable in the statistical and psychometric literatures.