Title: Product return prediction in live streaming e-commerce with cross-modal contrastive transformer
Authors: Wen Zhang, Rui Xie, Pei Quan, Zhenzhong Ma
Journal: Decision Support Systems, volume 194, Article 114470
Publication date: 2025-05-05
DOI: 10.1016/j.dss.2025.114470
URL: https://www.sciencedirect.com/science/article/pii/S0167923625000715
Impact factor: 6.7 (JCR Q1, Computer Science, Artificial Intelligence)
Open access: no
Citations: 0
Abstract
The live-streaming e-commerce industry is suffering heavy economic losses from high product return rates, which lead to rising logistics costs, greater inventory pressure, and unsatisfactory consumer experiences. Accurate product return prediction would let vendors optimize their business operations in advance and reduce return-related costs. This paper proposes a novel approach, called Contraformer (Contrastive transformer), to predict product returns in live streaming e-commerce by leveraging fine-grained streamer behavior features extracted from three modalities (i.e., visual, acoustic, and language). The primary contribution is the combination of an encoder-decoder Transformer with a novel class-supervised contrastive learning (CSCL) scheme that fuses streamer behavior features, aligning the multimodal representations and characterizing inter-modal interactions. Using a real-world dataset of 2584 product streamers and 864 items collected from the TikTok China live streaming platform, we demonstrate that the proposed Contraformer approach outperforms the baseline methods in predicting product return rate, reducing mean absolute error by 25%. This study offers managerial implications for vendors managing their live streaming commerce practice.
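As a rough illustration of the evaluation metric, the sketch below shows how a relative reduction in mean absolute error (MAE) is commonly computed. It assumes the reported 25% means (baseline MAE minus model MAE) divided by baseline MAE; the abstract does not spell out the formula. The function names and toy numbers are invented for illustration and are not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and predicted return rates."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_mae: float, model_mae: float) -> float:
    """Fraction by which model_mae is lower than baseline_mae (assumed definition)."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - model_mae) / baseline_mae


# Toy return rates, hypothetical and NOT from the paper's dataset.
observed = [0.25, 0.5, 0.75, 1.0]
baseline_pred = [0.5, 0.75, 0.5, 0.75]  # every absolute error is 0.25
model_pred = [0.25, 0.25, 0.75, 0.5]    # absolute errors: 0, 0.25, 0, 0.5

baseline_mae = mean_absolute_error(observed, baseline_pred)
model_mae = mean_absolute_error(observed, model_pred)
reduction = relative_reduction(baseline_mae, model_mae)

print(f"baseline MAE={baseline_mae}, model MAE={model_mae}, reduction={reduction:.0%}")
```

With these toy values the baseline MAE is 0.25 and the model MAE is 0.1875, so the relative reduction is 25%. The same percentage can correspond to very different absolute errors depending on the baseline.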
About the journal:
The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs).