Deep segmentation of retail customers based on improved DEC and multimodal semantic representation

Impact Factor 6.8; Region 2 (Engineering and Technology); JCR Q1 (ENGINEERING, MULTIDISCIPLINARY)

Alexandria Engineering Journal. Pub Date: 2025-09-08. DOI: 10.1016/j.aej.2025.09.012

Menglu Wang, Tong Meng, Xiaoyan Gu, Dandan Wang, Rong Wang, Rui Zhao

Journal Article. Volume 130, Pages 1-10. Source: https://www.sciencedirect.com/science/article/pii/S1110016825009755

Citations: 0

Abstract

With the advancement of digital transformation, the retail industry has accumulated a vast amount of customer data, particularly customer review data, which provides valuable insights into customer behavior and sentiment. Traditional customer segmentation methods rely mainly on market research and manual analysis. However, as the volume and complexity of data continue to grow, these traditional approaches struggle to meet the demands of precise segmentation and personalized marketing. As a result, machine learning-based customer segmentation methods have become a research focus. In particular, clustering algorithms can identify potential customer groups in large-scale datasets and provide a scientific basis for personalized marketing and product recommendations. With recent advances in natural language processing, especially the application of Bidirectional Encoder Representations from Transformers (BERT) models to text data, research on customer segmentation based on review data has gained increasing attention. Most current studies still focus on traditional clustering algorithms such as K-means and hierarchical clustering. However, these methods face limitations when dealing with high-dimensional sparse data and complex textual information. In addition, existing analyses of review texts often rely on traditional bag-of-words or Term Frequency–Inverse Document Frequency (TF-IDF) methods, which fail to fully capture the deep semantic information within the reviews. To address these challenges, this paper proposes an improved Deep Embedded Clustering (DEC) algorithm that incorporates BERT and Latent Dirichlet Allocation (LDA) models for vectorized representation and clustering analysis of review texts. This approach effectively overcomes the limitations of existing methods and enhances the accuracy and practicality of customer segmentation.
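The abstract does not say how the DEC algorithm is modified or how the BERT and LDA representations are combined. As a rough illustration only, the Python sketch below computes the standard DEC quantities (soft assignment with a Student's t kernel, the sharpened target distribution, and the KL clustering loss) on toy vectors formed by concatenating a stand-in semantic embedding with a stand-in topic distribution. The `fuse` helper, the `topic_weight` parameter, and all numbers are assumptions made for this example and are not taken from the paper. In DEC proper, these quantities are computed on autoencoder embeddings and the loss is minimized by gradient descent, which this sketch omits.

```python
import math


def fuse(semantic_vec, topic_vec, topic_weight=1.0):
    """Hypothetical fusion: concatenate a semantic embedding with a
    weighted topic distribution (assumption, not the paper's method)."""
    return list(semantic_vec) + [topic_weight * t for t in topic_vec]


def soft_assign(points, centroids, alpha=1.0):
    """Standard DEC soft assignment q_ij using a Student's t kernel."""
    q = []
    for z in points:
        row = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Standard DEC target p_ij: square q_ij, divide by the soft cluster
    frequency f_j, then renormalize each row."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weighted = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weighted)
        p.append([w / total for w in weighted])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over all points and clusters."""
    loss = 0.0
    for p_row, q_row in zip(p, q):
        for p_ij, q_ij in zip(p_row, q_row):
            if p_ij > 0:
                loss += p_ij * math.log(p_ij / q_ij)
    return loss


# Toy data: 2-d stand-ins for semantic embeddings plus 2-topic distributions.
points = [
    fuse([0.0, 0.1], [0.9, 0.1]),
    fuse([0.1, 0.0], [0.8, 0.2]),
    fuse([1.0, 0.9], [0.1, 0.9]),
    fuse([0.9, 1.0], [0.2, 0.8]),
]
centroids = [points[0], points[2]]  # naive initialization for the example

q = soft_assign(points, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

The target distribution `p` puts more weight on each point's most likely cluster than `q` does, which is what drives DEC toward confident assignments during training.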


Source journal

Alexandria Engineering Journal (Engineering: General Engineering)

CiteScore: 11.20
Self-citation rate: 4.40%
Articles published: 1015
Review time: 43 days

Journal description: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering