Findings (Sydney (N.S.W.)): Latest Articles, Page 10

Context Generation Improves Open Domain Question Answering

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-12 DOI: 10.48550/arXiv.2210.06349

Dan Su, M. Patwary, Shrimai Prabhumoye, Peng Xu, R. Prenger, M. Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

Abstract: Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge. However, they do not fully exploit the parameterized knowledge. To address this inefficiency, we propose a two-stage, closed-book QA framework which employs a coarse-to-fine approach to extract the relevant knowledge and answer a question. We first generate a related context for a given question by prompting a pretrained LM. We then prompt the same LM to generate an answer using the generated context and the question. Additionally, we marginalize over the generated contexts to improve the accuracies and reduce context uncertainty. Experimental results on three QA benchmarks show that our method significantly outperforms previous closed-book QA methods. For example on TriviaQA, our method improves exact match accuracy from 55.3% to 68.6%, and is on par with open-book QA methods (68.6% vs. 68.0%). Our results show that our new methodology is able to better exploit the stored knowledge in pretrained LMs without adding extra learnable parameters or needing finetuning, and paves the way for hybrid models that integrate pretrained LMs with external knowledge.

Pages: 781-796
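The marginalization step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so this assumes the common form p(answer | question) ≈ Σ over contexts of p(context | question) · p(answer | context, question). The function name `marginalize_answers`, the answer normalization, and all probabilities below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def marginalize_answers(candidates):
    """Sum context_prob * answer_prob over contexts for each distinct answer.

    candidates: list of (context_prob, answer, answer_prob) tuples, one per
    generated context. This is an assumed scoring rule, not the paper's code.
    """
    scores = defaultdict(float)
    for context_prob, answer, answer_prob in candidates:
        # Normalize surface form so "Paris" and "paris" pool their mass.
        key = answer.strip().lower()
        scores[key] += context_prob * answer_prob
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical toy numbers: three generated contexts for one question.
candidates = [
    (0.5, "Paris", 0.9),
    (0.3, "Lyon", 0.8),
    (0.2, "paris", 0.6),
]
best, scores = marginalize_answers(candidates)
print(best)
```

Pooling probability mass across contexts means an answer supported by several moderately likely contexts can outrank one supported by a single context, which is one plausible reading of how marginalization reduces context uncertainty.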

Citations: 3

Zero-Shot On-the-Fly Event Schema Induction

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-12 DOI: 10.48550/arXiv.2210.06254

Rotem Dror, Haoyu Wang, D. Roth

Citations: 8

PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-12 DOI: 10.48550/arXiv.2210.06408

Ishan Jindal, Alexandre Rademaker, Khoi-Nguyen Tran, Huaiyu Zhu, H. Kanayama, Marina Danilevsky, Yunyao Li

Citations: 0

Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-11 DOI: 10.48550/arXiv.2210.05404

Zifan Jiang, Amit Moryossef, Mathias Muller, Sarah Ebling

Citations: 8

ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-11 DOI: 10.48550/arXiv.2210.05556

Terry Yue Zhuo, Yaqing Liao, Yuecheng Lei, Lizhen Qu, Gerard de Melo, Xiaojun Chang, Yazhou Ren, Zenglin Xu

Citations: 0

Hierarchical3D Adapters for Long Video-to-text Summarization

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-10 DOI: 10.48550/arXiv.2210.04829

Pinelopi Papalampidi, Mirella Lapata

Citations: 2

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-10 DOI: 10.48550/arXiv.2210.05038

Pedro Rodriguez, Mahmoud Azab, Becka Silvert, Renato Sanchez, Linzy Labson, Hardik Shah, Seungwhan Moon

Abstract: Searching troves of videos with textual descriptions is a core multimodal retrieval task. Owing to the lack of a purpose-built dataset for text-to-video retrieval, video captioning datasets have been re-purposed to evaluate models by (1) treating captions as positive matches to their respective videos and (2) assuming all other videos to be negatives. However, this methodology leads to a fundamental flaw during evaluation: since captions are marked as relevant only to their original video, many alternate videos also match the caption, which introduces false-negative caption-video pairs. We show that when these false negatives are corrected, a recent state-of-the-art model gains 25% recall points, a difference that threatens the validity of the benchmark itself. To diagnose and mitigate this issue, we annotate and release 683K additional caption-video pairs. Using these, we recompute effectiveness scores for three models on two standard benchmarks (MSR-VTT and MSVD). We find that (1) the recomputed metrics are up to 25% recall points higher for the best models, (2) these benchmarks are nearing saturation for Recall@10, (3) caption length (generality) is related to the number of positives, and (4) annotation costs can be mitigated through sampling. We recommend retiring these benchmarks in their current form, and we make recommendations for future text-to-video retrieval benchmarks.

Pages: 47-68
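The false-negative effect described in the abstract can be shown with a toy recall computation. This is a minimal sketch, assuming Recall@k counts a caption as a hit when any video labeled relevant appears in the top k results; the abstract does not specify how multiple positives are scored. The function names, captions, and video ids below are hypothetical and are not from the released annotations.

```python
def recall_at_k(ranked_ids, positives, k):
    """Return 1.0 if any relevant video is in the top-k results, else 0.0."""
    return float(any(video in positives for video in ranked_ids[:k]))


def mean_recall_at_k(rankings, positives_by_caption, k):
    """Average the per-caption hit indicator over all captions."""
    hits = [
        recall_at_k(ranked, positives_by_caption[caption], k)
        for caption, ranked in rankings.items()
    ]
    return sum(hits) / len(hits)


# Hypothetical model output: each caption maps to a ranked list of video ids.
rankings = {
    "a dog runs": ["v3", "v1", "v2"],
    "a man cooks": ["v5", "v4", "v6"],
}

# Original labels: only the source video of each caption counts as relevant.
original = {"a dog runs": {"v1"}, "a man cooks": {"v4"}}

# Corrected labels: v3 is additionally annotated as matching "a dog runs".
corrected = {"a dog runs": {"v1", "v3"}, "a man cooks": {"v4"}}

recall_original = mean_recall_at_k(rankings, original, k=1)
recall_corrected = mean_recall_at_k(rankings, corrected, k=1)
print(recall_original, recall_corrected)
```

The model's ranking is identical in both runs; only the relevance labels change, which is why added positives can raise measured recall without any change to the model.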

Citations: 0

Evaluating Rules for Aggregating Satisfaction with Activity-travel Episodes to a Day-level Satisfaction Measure

Findings (Sydney (N.S.W.)) Pub Date: 2022-10-03 DOI: 10.32866/001c.38543

Wenbo Guo, T. Schwanen, C. Brand, Y. Chai

Citations: 0

Examining Pre- and Post-Pandemic Cross-Border Trips Using Crowdsourced Data at the Second-Busiest US-Mexico Border Community

Findings (Sydney (N.S.W.)) Pub Date: 2022-09-27 DOI: 10.32866/001c.38429

Erik Vargas, Okan Gurbuz, I. Sener, R. Aldrete

Citations: 0

An Interrupted Time Series Analysis of the Sociodemographics of Crash Victims during the Illinois Stay at Home Order

Findings (Sydney (N.S.W.)) Pub Date: 2022-09-23 DOI: 10.32866/001c.38490

Mickey Edwards

Citations: 0