Q. T. Tho. 2018 5th NAFOSTED Conference on Information and Computer Science (NICS). Published 2018-11-01. DOI: 10.1109/NICS.2018.8606865 (https://doi.org/10.1109/NICS.2018.8606865)
Invited Talk #2 Vietnamese Neural Language Model for NLP Tasks With Limited Resources
A statistical language model is a probability distribution over sequences of words. Language modeling is used in a variety of computing tasks, including speech recognition, machine translation, optical character and handwriting recognition, and information retrieval. Whereas the n-gram model is regarded as the traditional language model, neural language models have recently emerged as a means of approximating the probability of a sentence using neural networks and word embeddings. An advantage of a neural language model is that it can be further applied to other NLP tasks for which training datasets may be limited. In this talk, we realize this idea by introducing a Vietnamese neural language model trained on a large corpus of social media data. When this neural language model is further applied to other NLP tasks with relatively small training datasets, including entity recognition, spam detection, and topic modeling, we observe improved performance compared to existing deep learning approaches that use typical word embedding techniques.
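To make the notion of "a probability distribution over sequences of words" concrete, here is a minimal sketch of the traditional n-gram baseline the abstract mentions, with n = 2. It is not the neural model presented in the talk. The toy corpus, the whitespace tokenization, the add-one smoothing, and the names `train_bigram` and `log_prob` are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous, current) pairs
    # Predictable tokens: all words plus the end marker (never the start marker).
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return contexts, bigrams, vocab

def log_prob(sentence, contexts, bigrams, vocab):
    """log P(sentence) by the chain rule, with add-one (Laplace) smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
    return total

# Hypothetical toy corpus, invented for this sketch (not data from the talk).
corpus = ["tôi thích phở", "tôi thích cà phê", "bạn thích phở"]
contexts, bigrams, vocab = train_bigram(corpus)

seen = log_prob("tôi thích phở", contexts, bigrams, vocab)
scrambled = log_prob("phở thích tôi", contexts, bigrams, vocab)
print(seen > scrambled)  # the word order seen in the corpus scores higher
```

A neural language model replaces the count-based estimate inside `log_prob` with a network that predicts the next word from learned embeddings of the preceding words. Those learned representations are what can then be reused for downstream tasks with small training sets.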