Multi-granularity Feature Fusion Algorithm for Short Chinese Texts Based on Hierarchical Attention Networks
Zhifeng Lu, Hao-dong Xia, Wenxing Hong
Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence. Published 2022-12-23.
DOI: https://doi.org/10.1145/3579654.3579715
Citations: 0
Abstract
Short Chinese texts contain few words and many ambiguities, which makes it challenging to extract semantic information from them. The mainstream approach to extracting semantic features from short Chinese texts combines character and word granularity, but it loses part of the semantic features during extraction. To address this issue, this study proposes a multi-granularity feature fusion technique that combines character, word, pinyin, and radical granularity. In addition, to handle misspelled words in short Chinese texts, we introduce Hierarchical Attention Networks into the model to assign greater attention weights to the correct words. Experiments show that our model (MGCHA) improves semantic matching performance for short Chinese texts on the LCQMC and BQ datasets.
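The idea of fusing several granularities with attention weights can be illustrated with a minimal sketch. This is not the MGCHA architecture, whose details the abstract does not give. It assumes each granularity (character, word, pinyin, radical) has already been encoded into a fixed-length vector, and it uses plain dot-product attention against a hypothetical `query` vector as a stand-in for the Hierarchical Attention Networks. The function names and the toy values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse same-length feature vectors with dot-product attention.

    features: dict mapping a granularity name to its vector
              (hypothetical stand-ins for learned encoder outputs).
    query:    context vector used to score each granularity (an assumption).
    Returns (fused_vector, weights_by_name).
    """
    names = list(features)
    # Score each granularity by its dot product with the query.
    scores = [sum(f * q for f, q in zip(features[n], query)) for n in names]
    weights = softmax(scores)
    dim = len(query)
    # The fused vector is the attention-weighted sum of the granularity vectors.
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy 3-dimensional vectors, invented for illustration only.
features = {
    "character": [0.2, 0.1, 0.4],
    "word": [0.5, 0.3, 0.1],
    "pinyin": [0.1, 0.4, 0.2],
    "radical": [0.3, 0.2, 0.2],
}
query = [1.0, 0.0, 0.5]
fused, weights = attention_fuse(features, query)
```

In this toy setup, the granularity whose vector aligns best with the query receives the largest weight, which mirrors the abstract's aim of giving more attention to the more reliable signal. A trained model would learn both the encoders and the query rather than fix them by hand.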