A Peruvian Machine learning model for E-commerce product matching in South America, Spain and Portugal
Authors: B. Arriaga, A. Gómez, A. Palacios, W. Aliaga
Journal: Journal of Open Innovation: Technology, Market, and Complexity, vol. 11, no. 3, Article 100561
Published: 2025-06-14
DOI: 10.1016/j.joitmc.2025.100561
URL: https://www.sciencedirect.com/science/article/pii/S2199853125000964
The rapid growth of e-commerce in Latin America, driven by rising digital adoption among younger generations and accelerated by the COVID-19 pandemic, has reshaped how businesses engage with consumers. In Peru alone, the number of online shoppers increased by 131% between 2019 and 2021. However, the lack of a standardized global product identifier continues to hinder product comparison across platforms, weakening the Zero Moment of Truth (ZMOT) and reducing consumers’ ability to make informed purchasing decisions. To address this challenge, this study proposes a multimodal product classification model that combines natural language processing and image analysis to identify and match similar products across online retail stores. The model leverages textual embeddings and visual features to overcome inconsistencies in product descriptions and naming conventions, particularly within the Peruvian market. A dataset of local product listings was compiled and used to train and evaluate multiple classifiers, with the XGBoost model achieving 92.7% precision and a 93.6% F1 score. Beyond local performance, the model was tested in additional South American markets, including Argentina, Brazil, Chile, and Colombia, demonstrating robustness to linguistic and cultural differences. The proposed system enables more accurate product discovery, price comparison, and competitor monitoring, offering practical benefits for both consumers and businesses. Ultimately, this work contributes to the advancement of e-commerce infrastructure in emerging markets and supports more informed and efficient decision-making across diverse retail ecosystems.
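
As a rough illustration of the multimodal matching idea described in the abstract, the following Python sketch builds a two-element feature vector for a pair of listings (one text similarity, one image similarity) and computes precision, recall, and F1 for a set of match predictions. It is not the authors' implementation. The abstract does not specify the embeddings, the pair features, or the training setup, so everything here is an assumption made for illustration: character trigram counts stand in for textual embeddings, the `image_vec` field is a hypothetical precomputed visual feature vector, `predict_match` is a hypothetical threshold rule standing in for the trained XGBoost classifier, and the listings are invented.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of a lower-cased, whitespace-normalised title.

    Assumption: a simple stand-in for the textual embeddings in the abstract.
    """
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(max(len(cleaned) - n + 1, 1)))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(listing_a, listing_b):
    """Return [text_similarity, image_similarity] for one candidate pair."""
    text_sim = cosine(char_ngrams(listing_a["title"]), char_ngrams(listing_b["title"]))
    # "image_vec" is a hypothetical field holding precomputed visual features.
    image_a = dict(enumerate(listing_a["image_vec"]))
    image_b = dict(enumerate(listing_b["image_vec"]))
    return [text_sim, cosine(image_a, image_b)]


def predict_match(features, threshold=0.5):
    """Hypothetical rule: predict a match when the mean similarity is high.

    The study trains classifiers such as XGBoost on its own features instead.
    """
    return 1 if sum(features) / len(features) >= threshold else 0


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary match labels (1 = match)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented listings, for illustration only.
    milk_a = {"title": "Leche evaporada entera 400 g", "image_vec": [0.9, 0.1, 0.3]}
    milk_b = {"title": "LECHE EVAPORADA entera lata 400g", "image_vec": [0.8, 0.2, 0.3]}
    rice = {"title": "Arroz extra bolsa 5 kg", "image_vec": [0.1, 0.9, 0.2]}

    pairs = [(milk_a, milk_b, 1), (milk_a, rice, 0), (milk_b, rice, 0)]
    y_true = [label for _, _, label in pairs]
    y_pred = [predict_match(pair_features(a, b)) for a, b, _ in pairs]
    print("features:", [pair_features(a, b) for a, b, _ in pairs])
    print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

In a fuller pipeline, the pair features would typically come from learned text and image encoders and be passed to a trained classifier. The metric function shows how precision and F1 scores like those reported in the abstract are derived from match predictions.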