Menglu Wang, Tong Meng, Xiaoyan Gu, Dandan Wang, Rong Wang, Rui Zhao
Alexandria Engineering Journal, Volume 130, Pages 1-10. Published 2025-09-08. DOI: 10.1016/j.aej.2025.09.012. URL: https://www.sciencedirect.com/science/article/pii/S1110016825009755. Impact factor: 6.8; JCR: Q1 (Engineering, Multidisciplinary).
Deep segmentation of retail customers based on improved DEC and multimodal semantic representation
With the advancement of digital transformation, the retail industry has accumulated a vast amount of customer data, particularly customer review data, which provides valuable insights into customer behavior and sentiment. Traditional customer segmentation methods mainly rely on market research and manual analysis. However, as the volume and complexity of data continue to grow, these traditional approaches struggle to meet the demands of precise segmentation and personalized marketing. As a result, machine learning-based customer segmentation methods have become a research focus. In particular, clustering algorithms are capable of identifying potential customer groups from large-scale datasets and providing a scientific basis for personalized marketing and product recommendations. With recent advances in natural language processing, especially the application of Bidirectional Encoder Representations from Transformers (BERT) models in text data processing, research on customer segmentation based on review data has gained increasing attention. Most current studies still focus on traditional clustering algorithms such as K-means and hierarchical clustering. However, these methods face limitations when dealing with high-dimensional sparse data and complex textual information. In addition, existing analyses of review texts often rely on traditional bag-of-words or Term Frequency–Inverse Document Frequency (TF-IDF) methods, which fail to fully capture the deep semantic information within the reviews. To address these challenges, this paper proposes an improved Deep Embedded Clustering (DEC) algorithm, incorporating BERT and Latent Dirichlet Allocation (LDA) models for vectorized representation and clustering analysis of review texts. This approach effectively overcomes the limitations of existing methods and enhances the accuracy and practicality of customer segmentation.
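As a rough illustration of the clustering step named in the abstract, the sketch below implements the soft assignment and target distribution from the original DEC formulation (Xie, Girshick and Farhadi, 2016). It is not the paper's improved variant, whose details the abstract does not give. The `fuse` helper, which simply concatenates a text embedding with an LDA topic distribution, is an assumption about how the two representations might be combined, and the toy vectors stand in for real BERT and LDA outputs.

```python
import math


def fuse(embedding, topic_dist):
    """Concatenate a text embedding with a topic distribution.

    Assumption: plain concatenation; the abstract does not state the fusion rule.
    """
    return list(embedding) + list(topic_dist)


def soft_assign(points, centroids, alpha=1.0):
    """Student's t soft assignment q_ij between points and cluster centroids."""
    q = []
    for z in points:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Sharpened targets p_ij: square q_ij, divide by the soft cluster size
    f_j = sum_i q_ij, then renormalize each row."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def kl_divergence(p, q):
    """KL(P || Q), the clustering loss minimized in standard DEC."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0.0
    )


if __name__ == "__main__":
    # Hypothetical 2-d "embeddings" and 2-topic distributions for four reviews.
    reviews = [
        fuse([0.0, 0.1], [0.9, 0.1]),
        fuse([0.1, 0.0], [0.8, 0.2]),
        fuse([2.0, 2.1], [0.1, 0.9]),
        fuse([2.1, 2.0], [0.2, 0.8]),
    ]
    # Hypothetical initial centroids (in practice, e.g. from k-means).
    centroids = [[0.0, 0.0, 0.85, 0.15], [2.0, 2.0, 0.15, 0.85]]
    q = soft_assign(reviews, centroids)
    p = target_distribution(q)
    print("KL(P || Q) =", kl_divergence(p, q))
```

In full DEC training, the encoder weights and centroids would be updated by gradient descent on this KL loss, with the target distribution recomputed periodically; the sketch only evaluates the quantities once.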
About the journal:
Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in engineering and applied science. It is indexed in Engineering Information Services (EIS) and Chemical Abstracts (CA). Published papers are grouped into five sections:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering