Contextual Word2Vec Model for Understanding Chinese Out of Vocabularies on Online Social Media

IF 4.1 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal on Semantic Web and Information Systems | Pub Date: 2022-01-01 | DOI: 10.4018/ijswis.309428

Jiakai Gu, Gen Li, Nam D. Vo, Jason J. Jung

Volume: 36 1 | Pages: 1-14 | Publication type: Journal Article

Citations: 5


Contextual Word2Vec Model for Understanding Chinese Out of Vocabularies on Online Social Media

In this chapter, the authors propose using a contextual Word2Vec model to understand out-of-vocabulary (OOV) words. OOV words are extracted using left-right entropy and point information entropy. Word2Vec is used to construct the word vector space, and CBOW (continuous bag of words) is used to obtain the contextual information of each word. If another word has contextual information similar to that of an OOV word, that word can be used to understand the OOV word. The experiments use the Weibo corpus as the dataset. The results show that the proposed model achieves 97.10% accuracy, which is better than Skip-Gram by 8.53%.
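The abstract names left-right entropy as one of the signals for extracting OOV candidates but gives no formula. The sketch below assumes the common formulation, in which the entropy of the characters immediately to the left and to the right of a candidate string is computed over a corpus; a candidate with high entropy on both sides appears in varied contexts and is more likely to be a free-standing word. The function names and the toy sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def neighbor_entropy(neighbors):
    """Shannon entropy (in nats) of a list of neighboring characters."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log(1/p) summed over the distinct neighbors.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def left_right_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) of `candidate` over `corpus`.

    `corpus` is an iterable of strings; every occurrence of `candidate`
    contributes the character before it and the character after it.
    """
    left, right = [], []
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            end = start + len(candidate)
            if start > 0:
                left.append(sentence[start - 1])
            if end < len(sentence):
                right.append(sentence[end])
            start = sentence.find(candidate, start + 1)
    return neighbor_entropy(left), neighbor_entropy(right)


# Toy, made-up sentences (an assumption; not the Weibo corpus from the paper).
corpus = ["我喜欢奶茶", "他喜欢咖啡", "她喜欢奶茶", "你喜欢电影"]

# "喜欢" has varied neighbors on both sides, so both entropies are positive.
print(left_right_entropy(corpus, "喜欢"))
# "喜" is always followed by "欢", so its right entropy is zero,
# which suggests it is only a fragment of a longer word.
print(left_right_entropy(corpus, "喜"))
```

In practice a threshold on the smaller of the two entropies is one way to filter candidates; the threshold value and how this signal is combined with the second entropy measure mentioned in the abstract are not specified there.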


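Source journal

International Journal on Semantic Web and Information Systems (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.20

Self-citation rate: 12.50%

Articles published:

Review time: 20 months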

About the journal: The International Journal on Semantic Web and Information Systems (IJSWIS) promotes a knowledge transfer channel where academics, practitioners, and researchers can discuss, analyze, criticize, synthesize, communicate, elaborate, and simplify the more-than-promising technology of the semantic Web in the context of information systems. The journal aims to establish value-adding knowledge transfer and personal development channels in three distinctive areas: academia, industry, and government.