WEmbSim: A Simple yet Effective Metric for Image Captioning

2020 Digital Image Computing: Techniques and Applications (DICTA). Pub Date: 2020-11-29. DOI: 10.1109/DICTA51227.2020.9363392

Naeha Sharif, Lyndon White, Bennamoun, Wei Liu, Syed Afaq Ali Shah


Citations: 1

Abstract




The area of automatic image caption evaluation is still undergoing intensive research to address the needs of generating captions which can meet adequacy and fluency requirements. Based on our past attempts at developing highly sophisticated learning-based metrics, we have discovered that a simple cosine similarity measure using the Mean of Word Embeddings (MOWE) of captions can actually achieve a surprisingly high performance on unsupervised caption evaluation. This inspires our proposed work on an effective metric WEmbSim, which beats complex measures such as SPICE, CIDEr and WMD at system-level correlation with human judgments. Moreover, it also achieves the best accuracy at matching human consensus scores for caption pairs, against commonly used unsupervised methods. Therefore, we believe that WEmbSim sets a new baseline for any complex metric to be justified.
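The core idea in the abstract, cosine similarity between the Mean of Word Embeddings (MOWE) of two captions, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional embedding table, the function names, whitespace tokenization, and the choice to skip out-of-vocabulary words are all assumptions made for illustration. A real setup would presumably load pretrained word vectors, and the abstract does not say how multiple reference captions are combined, so only a single candidate/reference pair is scored here.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# They are NOT the pretrained vectors used in the paper.
TOY_EMBEDDINGS = {
    "a": [0.1, 0.0, 0.1],
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.1, 0.9, 0.0],
    "sprints": [0.2, 0.8, 0.1],
    "car": [0.0, 0.1, 0.9],
    "parked": [0.1, 0.0, 0.8],
}


def mean_of_word_embeddings(caption, embeddings):
    """Average the vectors of all in-vocabulary words of a caption.

    Assumption: lowercase + whitespace tokenization, and unknown words
    are skipped. Returns None if no word of the caption is known.
    """
    vectors = [embeddings[w] for w in caption.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mowe_similarity(candidate, reference, embeddings=TOY_EMBEDDINGS):
    """Score a candidate caption against one reference caption.

    Hypothetical helper: returns 0.0 when either caption has no
    in-vocabulary words (an assumption, not taken from the paper).
    """
    u = mean_of_word_embeddings(candidate, embeddings)
    v = mean_of_word_embeddings(reference, embeddings)
    if u is None or v is None:
        return 0.0
    return cosine_similarity(u, v)


if __name__ == "__main__":
    reference = "a dog runs"
    for candidate in ("a puppy sprints", "a car parked"):
        print(candidate, "->", round(mowe_similarity(candidate, reference), 3))
```

With these toy vectors, the paraphrase "a puppy sprints" scores much closer to the reference "a dog runs" than the unrelated "a car parked" does, even though neither candidate shares a content word with the reference. That is the property an embedding-based measure offers over exact n-gram overlap.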
