Tan Yu, Xiaokang Li, Jianwen Xie, Ruiyang Yin, Qing Xu, Ping Li
{"title":"MixBERT在广告中的形象广告相关性评分","authors":"Tan Yu, Xiaokang Li, Jianwen Xie, Ruiyang Yin, Qing Xu, Ping Li","doi":"10.1145/3459637.3482143","DOIUrl":null,"url":null,"abstract":"For a good advertising effect, images in the ad should be highly relevant with the ad title. The images in an ad are normally selected from the gallery based on their relevance scores with the ad's title. To ensure the selected images are relevant with the title, a reliable text-image matching model is necessary. The state-of-the-art text- image matching model, cross-modal BERT, only understands the visual content in the image, which is sub-optimal when the image description is available. In this work, we present MixBERT, an adimage relevance scoring model. It models the ad-image relevance by matching the ad title with the image description and visual content. MixBERT adopts a two-stream architecture. It adaptively selects the useful information from noisy image description and suppresses the noise impeding effective matching. To effectively describe the details in visual content of the image, a set of local convolutional features is used as the initial representation of the image. Moreover, to enhance the perceptual capability of our model in key entities which are important to advertising, we upgrade masked language modeling in vanilla BERT to masked key entity modeling. Offline and online experiments demonstrate its effectiveness.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"MixBERT for Image-Ad Relevance Scoring in Advertising\",\"authors\":\"Tan Yu, Xiaokang Li, Jianwen Xie, Ruiyang Yin, Qing Xu, Ping Li\",\"doi\":\"10.1145/3459637.3482143\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For a good advertising effect, images in the ad should be highly relevant with the ad title. The images in an ad are normally selected from the gallery based on their relevance scores with the ad's title. To ensure the selected images are relevant with the title, a reliable text-image matching model is necessary. The state-of-the-art text- image matching model, cross-modal BERT, only understands the visual content in the image, which is sub-optimal when the image description is available. In this work, we present MixBERT, an adimage relevance scoring model. It models the ad-image relevance by matching the ad title with the image description and visual content. MixBERT adopts a two-stream architecture. It adaptively selects the useful information from noisy image description and suppresses the noise impeding effective matching. To effectively describe the details in visual content of the image, a set of local convolutional features is used as the initial representation of the image. Moreover, to enhance the perceptual capability of our model in key entities which are important to advertising, we upgrade masked language modeling in vanilla BERT to masked key entity modeling. Offline and online experiments demonstrate its effectiveness.\",\"PeriodicalId\":405296,\"journal\":{\"name\":\"Proceedings of the 30th ACM International Conference on Information & Knowledge Management\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 30th ACM International Conference on Information & Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3459637.3482143\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459637.3482143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MixBERT for Image-Ad Relevance Scoring in Advertising
For a good advertising effect, images in the ad should be highly relevant with the ad title. The images in an ad are normally selected from the gallery based on their relevance scores with the ad's title. To ensure the selected images are relevant with the title, a reliable text-image matching model is necessary. The state-of-the-art text- image matching model, cross-modal BERT, only understands the visual content in the image, which is sub-optimal when the image description is available. In this work, we present MixBERT, an adimage relevance scoring model. It models the ad-image relevance by matching the ad title with the image description and visual content. MixBERT adopts a two-stream architecture. It adaptively selects the useful information from noisy image description and suppresses the noise impeding effective matching. To effectively describe the details in visual content of the image, a set of local convolutional features is used as the initial representation of the image. Moreover, to enhance the perceptual capability of our model in key entities which are important to advertising, we upgrade masked language modeling in vanilla BERT to masked key entity modeling. Offline and online experiments demonstrate its effectiveness.