Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP: Latest Publications

Towards Stronger Adversarial Baselines Through Human-AI Collaboration

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.nlppower-1.2

Wencong You, Daniel Lowd

Abstract: Natural language processing (NLP) systems are often used for adversarial tasks such as detecting spam, abuse, hate speech, and fake news. Properly evaluating such systems requires dynamic evaluation that searches for weaknesses in the model, rather than a static test set. Prior work has evaluated such models on both manually and automatically generated examples, but both approaches have limitations: manually constructed examples are time-consuming to create and are limited by the imagination and intuition of the creators, while automatically constructed examples are often ungrammatical or labeled inconsistently. We propose to combine human and AI expertise in generating adversarial examples, benefiting from humans’ expertise in language and automated attacks’ ability to probe the target system more quickly and thoroughly. We present a system that facilitates attack construction, combining human judgment with automated attacks to create better attacks more efficiently. Preliminary results from our own experimentation suggest that human-AI hybrid attacks are more effective than either human-only or AI-only attacks. A complete user study to validate these hypotheses is still pending.

Citations: 1

Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.nlppower-1.11

Giuseppe Attanasio, Debora Nozza, Eliana Pastor, Dirk Hovy

Citations: 11

Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.nlppower-1.5

Amr Keleg, Matthias Lindemann, Danyang Liu, Wanqiu Long, B. Webber

Citations: 0

Raison d’être of the benchmark dataset: A Survey of Current Practices of Benchmark Dataset Sharing Platforms

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.nlppower-1.1

Jaihyun Park, Sullam Jeoung

Abstract: This paper critically examines the current practices of benchmark dataset sharing in NLP and suggests a better way to inform reusers of the benchmark dataset. As the dataset sharing platform plays a key role not only in distributing the dataset but also in informing potential reusers about the dataset, we believe data-sharing platforms should provide a comprehensive context of the datasets. We survey four benchmark dataset sharing platforms (HuggingFace, PaperswithCode, Tensorflow, and Pytorch) to diagnose the current practices of how the dataset is shared and which metadata is shared or omitted. Specifically, drawing on the concept of data curation, which considers future reuse when the data is made public, we advance the direction that benchmark dataset sharing platforms should take into consideration. We identify that the four benchmark platforms have different practices of using metadata and that there is a lack of consensus on what social impact metadata is. We believe the problem of missing a discussion around social impact in the dataset sharing platforms has to do with the failed agreement on who should be in charge. We propose that the benchmark dataset should develop social impact metadata and that data curators should take a role in managing the social impact metadata.

Citations: 2