ROSI: A hybrid solution for omni-channel feature integration in E-commerce
Authors: Luyi Ma, Shengwei Tang, Anjana Ganesh, Jiao Chen, Aashika Padmanabhan, Malay Patel, Jianpeng Xu, Jason Cho, Evren Korpeoglu, Sushant Kumar, Kannan Achan
Journal: Data & Knowledge Engineering, volume 160, Article 102465
Publication date: 2025-05-31
Publication type: Journal Article
DOI: 10.1016/j.datak.2025.102465
URL: https://www.sciencedirect.com/science/article/pii/S0169023X25000606
Journal impact factor: 2.7 (JCR Q3, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Efficient integration of customer behavior data across multiple channels, including online and in-store interactions, is essential for developing recommendation systems that enhance customer experiences and maintain a competitive edge in e-commerce. However, the integration process faces several challenges, including data synchronization and discrepancies in data schemas. In this study, we introduce a hybrid data pipeline, ROSI (Retail Online-Store Integration), designed to integrate real-time streaming data from online platforms with batch data from in-store interactions. ROSI employs scalable, fault-tolerant streaming systems for online data and periodic batch processing for offline data, ensuring effective synchronization despite variations in data volume, update frequency, and schema. Our approach incorporates in-memory storage, sliding time windows, and feature registries to support applications such as machine learning model training and real-time inference in recommendation systems. Experimental results on real-world retail data demonstrate that ROSI is highly robust: the growth rate of overall latency decreases as data size increases linearly. Additionally, sequential recommendation systems built on the integrated dataset show a 6.25% improvement in ranking metrics. Overall, the proposed hybrid pipeline facilitates more personalized, omnichannel customer experiences while enhancing operational efficiency.
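The abstract does not describe ROSI's implementation, so the following is only a minimal sketch of the general idea it names: streamed online events and periodically batched in-store events are merged into one in-memory, per-customer view bounded by a sliding time window. The names `Event`, `SlidingWindowFeatures`, `ingest_stream`, `ingest_batch`, and `recent_items` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One customer interaction (hypothetical schema, not ROSI's)."""
    customer_id: str
    item_id: str
    timestamp: float  # seconds since some epoch
    channel: str      # "online" or "store"


class SlidingWindowFeatures:
    """In-memory, per-customer event store bounded by a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # customer_id -> time-ordered events

    def ingest_stream(self, event: Event) -> None:
        """Add a single online event as it arrives."""
        self._insert(event)

    def ingest_batch(self, events: list) -> None:
        """Add a periodic batch of in-store events, which may arrive late."""
        for event in sorted(events, key=lambda e: e.timestamp):
            self._insert(event)

    def _insert(self, event: Event) -> None:
        queue = self._events[event.customer_id]
        queue.append(event)
        # Batch rows can be older than already-streamed rows, so restore
        # time order whenever the new event lands out of sequence.
        if len(queue) > 1 and queue[-2].timestamp > event.timestamp:
            self._events[event.customer_id] = deque(
                sorted(queue, key=lambda e: e.timestamp)
            )

    def recent_items(self, customer_id: str, now: float) -> list:
        """Return (item_id, channel) pairs inside the window, oldest first."""
        queue = self._events[customer_id]
        cutoff = now - self.window_seconds
        while queue and queue[0].timestamp < cutoff:
            queue.popleft()  # evict events that fell out of the window
        return [(e.item_id, e.channel) for e in queue]


if __name__ == "__main__":
    store = SlidingWindowFeatures(window_seconds=3600)
    store.ingest_stream(Event("c1", "milk", 1000.0, "online"))
    store.ingest_stream(Event("c1", "bread", 4000.0, "online"))
    store.ingest_batch([Event("c1", "eggs", 2500.0, "store")])
    print(store.recent_items("c1", now=5000.0))
```

In this sketch the late in-store batch is re-sorted into the online history, so a downstream sequential recommender would see one time-ordered, cross-channel sequence per customer. A production pipeline would replace the in-process dictionary with a streaming system and a shared in-memory store, which this example does not attempt to model.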
About the journal:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between the two related fields of data engineering and knowledge engineering. DKE reaches a worldwide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.