{"title":"基于多嵌入模型的葡萄牙语句子相似度识别","authors":"Ana Carolina Rodrigues, R. Marcacini","doi":"10.1109/ICMLA55696.2022.00029","DOIUrl":null,"url":null,"abstract":"Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features due to differences in algorithm design and characteristics of the datasets employed in the pre-trained process. The perspective of benefiting from different encoded features to generate more suitable representations motivated the assembly of multiple embedding models, so-called meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformers architecture-based systems have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network to combine contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple sentence pre-trained embedding models outperforms single models and can be a promising alternative to improve performance sentence similarity. Moreover, we also discuss the results considering our simple extension of a model explainability method to the meta-embedding context, allowing the visual identification of the impact of each token on the sentence similarity score.","PeriodicalId":128160,"journal":{"name":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Sentence Similarity Recognition in Portuguese from Multiple Embedding Models\",\"authors\":\"Ana Carolina Rodrigues, R. Marcacini\",\"doi\":\"10.1109/ICMLA55696.2022.00029\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features due to differences in algorithm design and characteristics of the datasets employed in the pre-trained process. The perspective of benefiting from different encoded features to generate more suitable representations motivated the assembly of multiple embedding models, so-called meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformers architecture-based systems have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network to combine contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple sentence pre-trained embedding models outperforms single models and can be a promising alternative to improve performance sentence similarity. 
Moreover, we also discuss the results considering our simple extension of a model explainability method to the meta-embedding context, allowing the visual identification of the impact of each token on the sentence similarity score.\",\"PeriodicalId\":128160,\"journal\":{\"name\":\"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA55696.2022.00029\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA55696.2022.00029","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Distinct pre-trained embedding models perform differently on sentence similarity recognition tasks. The current assumption is that they encode different features because of differences in algorithm design and in the characteristics of the datasets used during pre-training. The prospect of benefiting from these different encoded features to build more suitable representations has motivated combining multiple embedding models, so-called meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, pre-trained language representations derived from Transformer-based architectures have proved effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network that combines contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple pre-trained sentence embedding models outperforms single models and is a promising alternative for improving sentence similarity performance. Moreover, we discuss the results in light of our simple extension of a model explainability method to the meta-embedding context, which allows visual identification of the impact of each token on the sentence similarity score.
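The abstract describes the approach only at a high level, but the core idea is simple to sketch: encode each sentence with several pre-trained models, project and combine the resulting embeddings into a single meta-embedding, and train a small supervised head to predict a similarity score for a sentence pair. Below is a minimal, hypothetical PyTorch sketch of that idea; the projection-plus-averaging combination, the product/absolute-difference pair features, and all dimensions and names are illustrative assumptions, not the architecture or code reported in the paper.

```python
# Hypothetical sketch of a supervised meta-embedding similarity model
# (not the authors' released code). It assumes sentence embeddings from
# each pre-trained model have already been computed; sizes are placeholders.
import torch
import torch.nn as nn

class MetaEmbeddingSimilarity(nn.Module):
    def __init__(self, source_dims, hidden_dim=256):
        super().__init__()
        # One projection per source embedding model, mapping each source
        # into a shared space before combination.
        self.projections = nn.ModuleList(
            nn.Linear(d, hidden_dim) for d in source_dims
        )
        # Regression head over pair features built from the two meta-embeddings.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def combine(self, embeddings):
        # embeddings: list with one (batch, dim_i) tensor per source model.
        projected = [proj(e) for proj, e in zip(self.projections, embeddings)]
        # Simple combination choice: average the projected sources.
        return torch.stack(projected, dim=0).mean(dim=0)

    def forward(self, sent_a, sent_b):
        a = self.combine(sent_a)
        b = self.combine(sent_b)
        # Element-wise product and absolute difference as pair features.
        pair = torch.cat([a * b, (a - b).abs()], dim=-1)
        return self.head(pair).squeeze(-1)  # predicted similarity score

# Toy usage with random stand-ins for embeddings from two pre-trained
# encoders (e.g. 768- and 1024-dimensional sentence representations).
model = MetaEmbeddingSimilarity(source_dims=[768, 1024])
sent_a = [torch.randn(4, 768), torch.randn(4, 1024)]
sent_b = [torch.randn(4, 768), torch.randn(4, 1024)]
gold = torch.rand(4) * 5.0  # STS-style 0-5 similarity labels
loss = nn.MSELoss()(model(sent_a, sent_b), gold)
loss.backward()
```

Averaging after projection is only one plausible combination strategy under this supervised setup; concatenating the projected sources or weighting them with a learned attention mechanism would fit the same overall scheme.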