Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy
Xuming Hu. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, published 2023-07-18. DOI: 10.1145/3539618.3591790 (https://doi.org/10.1145/3539618.3591790)
Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) are information retrieval tasks that aim to recognize entities and extract the relations among them using information from multiple modalities, such as text and images. Although current methods have explored a variety of modality fusion approaches to enrich the textual information, they have not exploited the large amount of readily available data that can be retrieved from the internet. We therefore retrieve real-world text related to the images, to the objects they contain, and to the entire sentences, and use this retrieved text as input for cross-modal fusion to improve entity and relation extraction from the text.
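The abstract does not specify how the retrieved text is fed to the model. One common way to use retrieved text as extra input is to append it to the sentence behind separator tokens. A mask then marks the original sentence tokens, so entity and relation predictions are made only over the sentence while the retrieved text serves as context. The sketch below shows that idea and is not the paper's implementation. RetrievedContext, build_augmented_input, the "[SEP]" separator, and the token budget are all hypothetical. Whitespace splitting stands in for a real tokenizer, and the internet retrieval step is assumed to have already run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RetrievedContext:
    """Hypothetical container for texts already retrieved for one example."""
    image_texts: list[str] = field(default_factory=list)     # text retrieved for the whole image
    object_texts: list[str] = field(default_factory=list)    # text retrieved for objects in the image
    sentence_texts: list[str] = field(default_factory=list)  # text retrieved for the full sentence


def build_augmented_input(
    sentence_tokens: list[str],
    ctx: RetrievedContext,
    sep_token: str = "[SEP]",           # hypothetical separator token
    max_retrieved_tokens: int = 64,     # hypothetical budget for retrieved words
) -> tuple[list[str], list[bool]]:
    """Append retrieved text to the sentence for a text encoder.

    Returns the combined token list and a mask that is True only for tokens
    of the original sentence, so labels are predicted there and retrieved
    tokens act purely as context.
    """
    tokens = list(sentence_tokens)
    is_sentence = [True] * len(tokens)
    budget = max_retrieved_tokens  # counts retrieved words only, not separators
    for snippet in ctx.image_texts + ctx.object_texts + ctx.sentence_texts:
        if budget <= 0:
            break
        # Whitespace tokenization stands in for a real subword tokenizer.
        piece = snippet.split()[:budget]
        if not piece:
            continue
        tokens.append(sep_token)
        is_sentence.append(False)
        tokens.extend(piece)
        is_sentence.extend([False] * len(piece))
        budget -= len(piece)
    return tokens, is_sentence


if __name__ == "__main__":
    ctx = RetrievedContext(
        image_texts=["Crowd at a stadium"],
        object_texts=["Lionel Messi, Argentine footballer"],
    )
    tokens, mask = build_augmented_input(
        ["Messi", "scores", "again"], ctx, max_retrieved_tokens=5
    )
    print(tokens)     # ['Messi', 'scores', 'again', '[SEP]', 'Crowd', 'at', 'a', 'stadium', '[SEP]', 'Lionel']
    print(sum(mask))  # 3
```

Keeping the sentence tokens first and unmodified means that gold entity spans and relation arguments keep their original positions. Truncating only the retrieved part keeps the input within an encoder's length limit without ever cutting the sentence itself.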