Extended ProMap datasets for product mapping

IF 3.7 | Zone 4 (Management) | JCR Q2 (BUSINESS)
Kateřina Macková, Martin Pilát
Journal: Electronic Commerce Research
Publication date: 2024-08-22
Publication type: Journal Article
DOI: 10.1007/s10660-024-09892-9 (https://doi.org/10.1007/s10660-024-09892-9)
Citation count: 0

Abstract

Product mapping, also called product matching, is the research field concerned with identifying which product listings (including names, descriptions, specifications, images, and other information) from different e-shops refer to the same product. It is an important data integration task, because it processes data that originate from different sources and have different structures. In our previous work, we created the basic ProMapEn and ProMapCz datasets for product mapping in English and Czech. The main advantage of the ProMap datasets over existing product mapping datasets is that they contain different types of non-matches, distinguished by how similar the two products are. In this paper, we extend these two datasets into a new collection of datasets for generalised product mapping in Czech and English. We publish the datasets freely for other researchers working on product mapping in e-commerce. The main contributions are the extension of the ProMap datasets with a new class of non-matching products, the introduction of new ProMapMulti datasets of product pairs from multiple English e-shops, and the introduction of ProMapTransl datasets, obtained by translating the Czech datasets to English and vice versa. Moreover, we provide a detailed analysis of these datasets, with several experiments based on neural network techniques that compare different text preprocessing methods and similarity computation methods. We also compare several product categories and evaluate state-of-the-art product mapping methods on these datasets. In addition, we include generalised entity matching techniques and compare their behaviour on product mapping datasets, which belong to this area. Finally, an appendix presents a number of other basic experiments, such as an analysis of feature importances.
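To make the task concrete, here is a minimal sketch of a text-similarity baseline for a pair of product names. It is not the authors' method: the abstract only says that neural network techniques, text preprocessing methods, and similarity computation methods are compared. The function names, the example listings, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens.

    `\\w+` is Unicode-aware in Python 3, so Czech diacritics are kept.
    """
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(name_a: str, name_b: str) -> float:
    """Jaccard similarity of the token sets of two product names."""
    tokens_a, tokens_b = tokenize(name_a), tokenize(name_b)
    if not tokens_a and not tokens_b:
        return 0.0  # two empty names carry no evidence of a match
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(name_a: str, name_b: str, threshold: float = 0.5) -> bool:
    """Predict a match when similarity reaches the (hypothetical) threshold."""
    return jaccard_similarity(name_a, name_b) >= threshold


# Hypothetical listings, not taken from the ProMap datasets.
same_product = ("Apple iPhone 13 128GB Blue", "apple iphone 13 (128gb) - blue")
close_non_match = ("Apple iPhone 13 128GB Blue", "Apple iPhone 13 256GB Blue")

print(jaccard_similarity(*same_product))
print(jaccard_similarity(*close_non_match))
print(is_match(*close_non_match))
```

The second pair differs only in storage capacity, yet it shares four of six tokens and is wrongly predicted as a match at this threshold. Pairs like this suggest why a dataset that separates non-matches by how similar the two products are is useful for evaluating matchers.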

Source journal
CiteScore: 7.50
Self-citation rate: 12.80%
Articles published: 99
About the journal: The Internet and the World Wide Web have brought a fundamental change in the way that individuals access data, information and services. Individuals have access to vast amounts of data, and to experts and services that are not limited in time or space. This has forced businesses to change the way in which they conduct their commercial transactions with their end customers and with other businesses, resulting in the development of a global market through the Internet. The emergence of the Internet and electronic commerce raises many new research issues. The Electronic Commerce Research journal will serve as a forum for stimulating and disseminating research into all facets of electronic commerce, from research into core enabling technologies to work on assessing and understanding the implications of these technologies for societies, economies, businesses and individuals. The journal concentrates on theoretical as well as empirical research that leads to a better understanding of electronic commerce and its implications.

Topics covered by the journal include, but are not restricted to, the following subjects as they relate to the Internet and electronic commerce: Dissemination of services through the Internet; Intelligent agents technologies and their impact; The global impact of electronic commerce; The economics of electronic commerce; Fraud reduction on the Internet; Mobile electronic commerce; Virtual electronic commerce systems; Application of computer and communication technologies to electronic commerce; Electronic market mechanisms and their impact; Auctioning over the Internet; Business models of Internet based companies; Service creation and provisioning; The job market created by the Internet and electronic commerce; Security, privacy, authorization and authentication of users and transactions on the Internet; Electronic data interchange over the Internet; Electronic payment systems and electronic funds transfer; The impact of electronic commerce on organizational structures and processes; Supply chain management through the Internet; Marketing on the Internet; User adaptive advertisement; Standards in electronic commerce and their analysis; Metrics, measurement and prediction of user activity; On-line stock markets and financial trading; User devices for accessing the Internet and conducting electronic transactions; Efficient search techniques and engines on the WWW; Web based languages (e.g., HTML, XML, VRML, Java); Multimedia storage and distribution; Internet; Collaborative learning, gaming and work; Presentation page design techniques and tools; Virtual reality on the net and 3D visualization; Browsers and user interfaces; Web site management techniques and tools; Managing middleware to support electronic commerce; Web based education and training; Electronic journals and publishing on the Internet; Legal issues, taxation and property rights; Modeling and design of networks to support Internet applications; Modeling, design and sizing of web site servers; Reliability of intensive on-line applications; Pervasive devices and pervasive computing in electronic commerce; Workflow for electronic commerce applications; Coordination technologies for electronic commerce; Personalization and mass customization technologies; Marketing and customer relationship management in electronic commerce.

Audience: Academics and professionals involved in electronic commerce research and the application and use of the Internet. Managers, consultants, decision-makers and developers who value the use of electronic commerce research results.

Special Issues: Electronic Commerce Research publishes from time to time a special issue devoted to a single subject area. If interested in serving as a guest editor for a special issue, please contact the Editor-in-Chief J. Christopher Westland at westland@uic.edu with a proposal for the special issue.

Officially cited as: Electron Commer Res