Tsegaye Misikir Tashu, Sara Fattouh, Peter Kiss, Tomáš Horváth
Published in: 2022 IEEE 2nd Conference on Information Technology and Data Science (CITDS), 2022-05-16. DOI: 10.1109/CITDS54976.2022.9914136 (https://doi.org/10.1109/CITDS54976.2022.9914136)
Multimodal E-Commerce Product Classification Using Hierarchical Fusion
In this work, we present a multimodal model for commercial product classification that combines features extracted by multiple neural network models from textual data (CamemBERT and FlauBERT) and visual data (SE-ResNeXt-50) using simple fusion techniques. The proposed method significantly outperformed the unimodal models, as well as the reported performance of similar models on the same task. We experimented with multiple fusion techniques and found that the best-performing way to combine the individual embeddings of the unimodal networks is a combination of concatenating and averaging the feature vectors. Each modality compensated for the shortcomings of the others, demonstrating that increasing the number of modalities can be an effective way to improve performance on multi-label, multimodal classification problems.
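The combination of averaging and concatenation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the order of the fusion steps, the embedding sizes, or the classifier, so everything below is an assumption for illustration: the two text embeddings are assumed to share one dimension and are averaged first, and the result is then concatenated with the image embedding. The function names (`average`, `concatenate`, `hierarchical_fuse`) and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share one dimension")
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def concatenate(vectors: List[Vector]) -> Vector:
    """Join vectors end to end into one longer vector."""
    fused: Vector = []
    for v in vectors:
        fused.extend(v)
    return fused


def hierarchical_fuse(text_a: Vector, text_b: Vector, image: Vector) -> Vector:
    # Stage 1 (assumed): average the two text embeddings into one text vector.
    text = average([text_a, text_b])
    # Stage 2 (assumed): concatenate the text vector with the image vector.
    return concatenate([text, image])


# Toy stand-ins for CamemBERT, FlauBERT, and SE-ResNeXt-50 embeddings.
# Real embeddings would have hundreds or thousands of dimensions.
camembert_vec = [0.5, 1.0, 1.5]
flaubert_vec = [1.5, 0.0, 0.5]
image_vec = [2.0, 4.0]

fused = hierarchical_fuse(camembert_vec, flaubert_vec, image_vec)
print(fused)
```

In a full pipeline, a vector like `fused` would be passed to a classification layer. Averaging keeps the text representation at a fixed size regardless of how many text encoders are used, while concatenation preserves the image features as separate dimensions.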