Social Media Text Classification Method Based on Character-Word Feature Self-attention Learning
Authors: 王晓莉, 叶东毅
Journal: 模式识别与人工智能 (JCR Q4, Computer Science)
Publication type: Journal Article
Publication date: 2020-04-01
DOI: https://doi.org/10.16451/J.CNKI.ISSN1003-6059.202004001
Citation count: 0
Abstract
The long-tail effect and an excess of out-of-vocabulary (OOV) words in social media texts cause severe feature sparsity and reduce classification accuracy. To address this problem, a social media text classification method based on character-word feature self-attention learning is proposed. Global features are constructed at the character level to learn the attention weight distribution, and the existing multi-head attention mechanism is improved to reduce the parameter scale and computational complexity. To further analyze character-word feature fusion, an OOV sensitivity index is proposed to measure the impact of OOV words on different types of features. Experiments on several social media text classification tasks indicate that the proposed method fuses word features and character features effectively and clearly improves classification accuracy. Moreover, the quantitative results of the OOV sensitivity index verify the feasibility and effectiveness of the proposed method.
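
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of scoring character embeddings against a global character-level feature and fusing the pooled result with word features. It is not the authors' model: the mean-pooled global feature, the scaled dot-product scoring, the concatenation-based fusion, the function names `attention_pool` and `fuse`, and the random embeddings are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_pool(char_embs):
    """Pool character embeddings using weights scored against a global feature.

    char_embs: array of shape (num_chars, dim).
    Returns the pooled vector (dim,) and the attention weights (num_chars,).
    """
    # Assumption: the global feature is the mean of all character embeddings.
    global_feat = char_embs.mean(axis=0)
    # Scaled dot-product score of each character against the global feature.
    scores = char_embs @ global_feat / np.sqrt(char_embs.shape[1])
    weights = softmax(scores)
    return weights @ char_embs, weights


def fuse(char_embs, word_embs):
    """Concatenate the attention-pooled character vector with a mean word vector.

    Assumption: fusion by simple concatenation; the paper may fuse differently.
    """
    char_vec, weights = attention_pool(char_embs)
    word_vec = word_embs.mean(axis=0)
    return np.concatenate([char_vec, word_vec]), weights


# Hypothetical inputs: 12 characters and 5 words, each embedded in 8 dimensions.
rng = np.random.default_rng(0)
char_embs = rng.normal(size=(12, 8))
word_embs = rng.normal(size=(5, 8))

fused, weights = fuse(char_embs, word_embs)
print(fused.shape, weights.shape)
```

In a sketch like this, a text whose words are mostly out of vocabulary still yields a usable character-level vector, which is one plausible reason for combining the two feature types; the fused vector would then be passed to a classifier.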