Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5): Latest Papers

Improving Specificity in Review Response Generation with Data-Driven Data Filtering
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.15
Tannon Kew, M. Volk
Abstract: Responding to online customer reviews has become an essential part of successfully managing and growing a business, both in e-commerce and in the hospitality and tourism sectors. Recently, neural text generation methods intended to assist authors in composing responses have been shown to deliver highly fluent and natural-looking texts. However, they also tend to learn a strong, undesirable bias towards generating overly generic, one-size-fits-all outputs for a wide range of inputs. While this often results in ‘safe’, high-probability responses, there are many practical settings in which greater specificity is preferable. In this work we examine the task of generating more specific responses for online reviews in the hospitality domain by identifying generic responses in the training data, filtering them out and fine-tuning the generation model. We experiment with a range of data-driven filtering methods and show through automatic and human evaluation that, despite a 60% reduction in the amount of training data, filtering helps to derive models that are capable of generating more specific, useful responses.
Citations: 0
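The abstract does not spell out the filtering methods, but a minimal sketch of one plausible data-driven filter is shown below: score each training response by its similarity to the corpus centroid (near-duplicate “thank you” boilerplate clusters tightly) and keep only the least generic portion before fine-tuning. The centroid scoring and the keep ratio are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def filter_generic(responses, keep_ratio=0.4):
    """Keep the responses least similar to the corpus-wide centroid."""
    tfidf = TfidfVectorizer().fit_transform(responses)
    centroid = np.asarray(tfidf.mean(axis=0))
    scores = cosine_similarity(tfidf, centroid).ravel()  # high = generic
    keep = int(len(responses) * keep_ratio)
    kept_idx = np.argsort(scores)[:keep]  # most specific responses first
    return [responses[i] for i in sorted(kept_idx)]

corpus = [
    "Thank you for your review. We hope to see you again soon.",
    "Thank you for your review. We hope to welcome you back soon.",
    "Glad the rooftop bar and the late checkout worked out for you!",
]
print(filter_generic(corpus, keep_ratio=0.34))
```

A keep ratio around 0.4 would mirror the roughly 60% data reduction reported in the abstract.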
Clause Topic Classification in German and English Standard Form Contracts
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.23
Daniel Braun, F. Matthes
Abstract: So-called standard form contracts, i.e. contracts that are drafted unilaterally by one party, like the terms and conditions of online shops or the terms of service of social networks, are cornerstones of our modern economy. Their processing is, therefore, of significant practical value. Often, the sheer size of these contracts allows the drafting party to hide unfavourable terms from the other party. In this paper, we compare different approaches for automatically classifying the topics of clauses in standard form contracts, based on a dataset of more than 6,000 clauses from more than 170 contracts, which we collected from German and English online shops and annotated based on a taxonomy of clause topics that we developed together with legal experts. We show that, in our comparison of seven approaches, from simple keyword matching to transformer language models, BERT performed best with an F1-score of up to 0.91; however, much simpler and computationally cheaper models like logistic regression also achieved similarly good results of up to 0.87.
Citations: 2
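The cheapest competitive model named above pairs simple features with a linear classifier; a minimal sketch of that style of pipeline is below. The toy clauses and topic labels are invented for illustration (the real taxonomy was built with legal experts over more than 6,000 clauses).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clauses = [
    "Delivery takes place within 5 working days after receipt of payment.",
    "You may withdraw from this contract within 14 days without giving reasons.",
    "All prices include the statutory value added tax.",
    "Returned goods must be shipped within 14 days of cancellation.",
]
topics = ["delivery", "withdrawal", "pricing", "withdrawal"]

# TF-IDF unigrams/bigrams + logistic regression: the cheap baseline family
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(clauses, topics)
print(clf.predict(["Prices are quoted in euros including VAT."]))
```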
Data Quality Estimation Framework for Faster Tax Code Classification
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.4
R. Kondadadi, Allen Williams, Nicolas Nicolov
Abstract: This paper describes a novel framework to estimate the data quality of a collection of product descriptions and to identify the relevant information required for accurate product listing classification for tax-code assignment. Our Data Quality Estimation (DQE) framework consists of a Question Answering (QA) based attribute value extraction model to identify missing attributes and a classification model to identify bad-quality records. We show that our framework can accurately predict the quality of product descriptions. In addition to identifying low-quality product listings, our framework can also generate a detailed report at the category level showing missing product information, resulting in a better customer experience.
Citations: 0
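A sketch of the QA-based attribute check described above: pose one question per required attribute to an extractive QA model and treat low-confidence answers as evidence that the attribute is missing. The model choice, question templates, and threshold are assumptions for illustration, not the framework's actual components.

```python
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

REQUIRED = {
    "material": "What material is the product made of?",
    "size": "What size is the product?",
}

def missing_attributes(description, threshold=0.3):
    """Flag required attributes the QA model cannot confidently extract."""
    missing = []
    for attr, question in REQUIRED.items():
        result = qa(question=question, context=description)
        if result["score"] < threshold:
            missing.append(attr)
    return missing

print(missing_attributes("Stainless steel water bottle, 750 ml."))
```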
Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions?
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.27
Fajri Koto, Jey Han Lau, Timothy Baldwin
Abstract: For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales. While not all sellers are capable of providing such interesting descriptions, a language generation system can be a source of such descriptions at scale and can potentially assist sellers to improve their product descriptions. Most previous work has addressed this task with statistical approaches (Wang et al., 2017), limited attributes such as titles (Chen et al., 2019; Chan et al., 2020), and a focus on only one product type (Wang et al., 2017; Munigala et al., 2018; Hong et al., 2021). In this paper, we jointly train on image features and 10 text attributes across 23 diverse product types, with two target text types with different writing styles: bullet points and paragraph descriptions. Our findings suggest that multimodal training with modern pretrained language models can generate fluent and persuasive advertisements, but the outputs are less faithful and informative, especially out of domain.
Citations: 7
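A text-only sketch of the input/output framing such a system might use: flatten product attributes into a source string and decode ad text with a pretrained seq2seq model. The paper additionally injects image features and fine-tunes across 23 product types; the untuned t5-small backbone and prompt format here are assumptions that only illustrate the conditioning setup.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Serialize the structured attributes into one source sequence
attrs = {"brand": "Acme", "type": "espresso machine",
         "pressure": "15 bar", "color": "black"}
source = "describe: " + " | ".join(f"{k}: {v}" for k, v in attrs.items())

ids = tok(source, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=60, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))
```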
Structured Extraction of Terms and Conditions from German and English Online Shops
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.21
Tobias Schamel, Daniel Braun, F. Matthes
Abstract: The automated analysis of terms and conditions has gained attention in recent years, mainly due to its relevance to consumer protection. Well-structured datasets are the basis for every analysis. While content extraction in general is a well-researched field and many open-source libraries are available, our evaluation shows that existing solutions cannot extract terms and conditions in sufficient quality, mainly because of their special structure. In this paper, we present an approach to extract the content and hierarchy of terms and conditions from German and English online shops. Our evaluation shows that the approach outperforms the current state of the art. A Python implementation of the approach is made available under an open license.
Citations: 1
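A heavily simplified sketch in the spirit of the approach: walk nested HTML lists and preserve clause nesting as a tree. Real shop pages are far messier than this fragment, and this is not the authors' published implementation.

```python
from bs4 import BeautifulSoup

html = """
<ol>
  <li>Scope
    <ol><li>These terms apply to all orders.</li></ol>
  </li>
  <li>Delivery
    <ol><li>Delivery within 5 working days.</li></ol>
  </li>
</ol>
"""

def extract(ol):
    """Recursively turn nested <ol>/<li> structure into a clause tree."""
    items = []
    for li in ol.find_all("li", recursive=False):
        text = li.find(string=True, recursive=False)   # the clause's own text
        child = li.find("ol", recursive=False)         # nested sub-clauses
        items.append({"clause": text.strip() if text else "",
                      "children": extract(child) if child else []})
    return items

tree = BeautifulSoup(html, "html.parser").find("ol")
print(extract(tree))
```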
Product Titles-to-Attributes As a Text-to-Text Task
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.12
Gilad Fuchs, Yoni Acriche
Abstract: Online marketplaces use attribute-value pairs, such as brand, size, size type, color, etc., to help define important and relevant facts about a listing. These help buyers to curate their search results using attribute filtering and overall create a richer experience. Despite their critical importance for listings’ discoverability, getting sellers to input tens of different attribute-value pairs per listing is costly and often results in missing information. This can later translate to the unnecessary removal of relevant listings from the search results when buyers are filtering by attribute values. In this paper we demonstrate the use of a text-to-text hierarchical multi-label ranking model framework to predict the most relevant attributes per listing, along with their expected values, using historic user behavioral data. This solution helps sellers by allowing them to focus on verifying information for attributes that are likely to be used by buyers, and thus increase the expected recall for their listings. Specifically for eBay’s case we show that using this model can improve the relevancy of the attribute extraction process by 33.2% compared to the current highly-optimized production system. Apart from the empirical contribution, the highly generalized nature of the framework presented in this paper makes it relevant for many high-volume search-driven websites.
Citations: 1
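A sketch of the text-to-text framing: the raw title is the source sequence and a serialized list of attribute-value pairs is the target, trained with the standard seq2seq objective. The serialization format and the t5-small backbone are assumptions; the production framework described above adds hierarchical multi-label ranking on top.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

title = "Nike Air Zoom Pegasus 38 Men's Running Shoes Size 10 Black"
target = "brand: Nike | size: 10 | color: Black | size type: Regular"

# One training pair: title in, serialized attribute-value pairs out
batch = tok(title, text_target=target, return_tensors="pt")
loss = model(input_ids=batch.input_ids,
             attention_mask=batch.attention_mask,
             labels=batch.labels).loss  # standard fine-tuning objective
print(float(loss))
```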
Utilizing Cross-Modal Contrastive Learning to Improve Item Categorization BERT Model
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.25
L. Chen, Houwei Chou
Abstract: Item categorization (IC) is a core natural language processing (NLP) task in e-commerce. As a special text classification task, fine-tuning pre-trained models, e.g., BERT, has become a mainstream solution. To further improve IC performance, other product metadata, e.g., product images, have been used. Although multimodal IC (MIC) systems show higher performance, expanding from processing text to more resource-demanding images has large engineering impacts and hinders the deployment of such dual-input MIC systems. In this paper, we propose a new way of using product images to improve a text-only IC model: leveraging cross-modal signals between products’ titles and associated images to adapt BERT models in a self-supervised learning (SSL) way. Our experiments on three genres in the public Amazon product dataset show that the proposed method yields better prediction accuracy and macro-F1 values than simply using the original BERT. Moreover, the proposed method can keep using an existing text-only IC inference implementation and shows a resource advantage over the deployment of a dual-input MIC system.
Citations: 1
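A sketch of the cross-modal signal the paper leverages: a symmetric InfoNCE loss that pulls each title embedding toward its own product-image embedding and away from mismatched pairs within the batch. Random tensors stand in for projected BERT and image-encoder outputs; the temperature is a conventional default, not necessarily the paper's setting.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(text_emb, image_emb, temperature=0.07):
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(len(text_emb))            # i-th title <-> i-th image
    # symmetric InfoNCE: title->image and image->title directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

text = torch.randn(8, 256)   # e.g. projected BERT [CLS] vectors
image = torch.randn(8, 256)  # e.g. projected image-encoder features
print(cross_modal_contrastive_loss(text, image))
```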
Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.13
Xiaoyu Shen, Gianni Barlacchi, Marco Del Tredici, Weiwei Cheng, B. Byrne, A. Gispert
Abstract: It is of great value to answer product questions based on heterogeneous information sources available on web product pages, e.g., semi-structured attributes, text descriptions, user-provided contents, etc. However, these sources have different structures and writing styles, which poses challenges for (1) evidence ranking, (2) source selection, and (3) answer generation. In this paper, we build a benchmark with annotations for both evidence selection and answer generation covering 6 information sources. Based on this benchmark, we conduct a comprehensive study and present a set of best practices. We show that all sources are important and contribute to answering questions. Handling all sources within one single model can produce comparable confidence scores across sources, and combining multiple sources for training always helps, even for sources with totally different structures. We further propose a novel data augmentation method to iteratively create training samples for answer generation, which achieves close-to-human performance with only a few thousand annotations. Finally, we perform an in-depth error analysis of model predictions and highlight the challenges for future research.
Citations: 10
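One best practice above is handling all sources within a single model; a minimal sketch of how heterogeneous evidence might be serialized into one generator input is below. The source tags and ordering are assumptions for illustration, not the benchmark's actual preprocessing.

```python
def build_input(question, evidence):
    """evidence: list of (source_type, text) pairs, pre-ranked by relevance."""
    parts = [f"question: {question}"]
    # Tag each snippet so one model can compare evidence across source types
    parts += [f"<{src}> {text}" for src, text in evidence]
    return " ".join(parts)

evidence = [
    ("attribute", "battery capacity: 5000 mAh"),
    ("review", "The battery easily lasts two days for me."),
    ("description", "All-day battery with fast charging."),
]
print(build_input("How long does the battery last?", evidence))
```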
OpenBrand: Open Brand Value Extraction from Product Descriptions
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.19
Kassem Sabeh, Mouna Kacimi, J. Gamper
Abstract: Extracting attribute-value information from unstructured product descriptions continues to be of vital importance in e-commerce applications. One of the most important product attributes is the brand, which highly influences customers’ purchasing behaviour. Thus, it is crucial to accurately extract brand information, the main challenge being the discovery of new brand names. Under the open-world assumption, several approaches have adopted deep learning models to extract attribute values using the sequence tagging paradigm. However, they did not employ finer-grained data representations, such as character-level embeddings, which improve generalizability. In this paper, we introduce OpenBrand, a novel approach for discovering brand names. OpenBrand is a BiLSTM-CRF-Attention model with embeddings at different granularities. Such embeddings are learned using CNN and LSTM architectures to provide more accurate representations. We further propose a new dataset for brand value extraction, with a very challenging zero-shot extraction task. We have tested our approach through extensive experiments and shown that it outperforms state-of-the-art models in brand name discovery.
Citations: 1
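A condensed sketch of the sequence-tagging core: word embeddings concatenated with CNN-derived character features feed a BiLSTM that scores BIO tags for brand spans. The published model adds a CRF output layer and attention, and also learns character features with an LSTM; all dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class BrandTagger(nn.Module):
    def __init__(self, vocab=1000, chars=100, tags=3):  # tags: B / I / O
        super().__init__()
        self.word_emb = nn.Embedding(vocab, 64)
        self.char_emb = nn.Embedding(chars, 16)
        self.char_cnn = nn.Conv1d(16, 32, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(64 + 32, 64, bidirectional=True, batch_first=True)
        self.out = nn.Linear(128, tags)

    def forward(self, words, chars):
        # chars: (batch, seq_len, max_word_len) character ids per token
        b, s, w = chars.shape
        c = self.char_emb(chars.view(b * s, w)).transpose(1, 2)
        c = self.char_cnn(c).max(dim=2).values.view(b, s, -1)  # per-word char feature
        x = torch.cat([self.word_emb(words), c], dim=-1)       # word + char views
        h, _ = self.lstm(x)
        return self.out(h)  # unnormalized BIO scores per token

model = BrandTagger()
scores = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 100, (2, 7, 12)))
print(scores.shape)  # (2, 7, 3)
```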
Domain-specific knowledge distillation yields smaller and better models for conversational commerce
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) · Pub Date: 2022 · DOI: 10.18653/v1/2022.ecnlp-1.18
Kristen Howell, Jian Wang, Akshay Hazare, Joe Bradley, Chris Brew, Xi Chen, Matthew Dunn, Beth-Ann Hockey, Andrew Maurer, D. Widdows
Abstract: We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average a 2.3% improvement in F1 score relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.
Citations: 2
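A sketch of the core distillation signal in the Sanh et al. (2019) recipe the paper follows: the student matches the teacher's temperature-softened output distribution, here computed on stand-in logits for in-domain text. The full recipe combines this with hard-label and cosine-embedding terms, omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # scale by T^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T

teacher = torch.randn(4, 30522)  # e.g. mBERT MLM logits on an in-domain sentence
student = torch.randn(4, 30522, requires_grad=True)
print(distillation_loss(student, teacher))
```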