Ha Nguyen Thi Thu, Tinh Dao Thanh, T. Hai, Vinh Ho Ngoc
2015 Fifth International Conference on Communication Systems and Network Technologies. Published 2015-04-04. DOI: 10.1109/CSNT.2015.22 (https://doi.org/10.1109/CSNT.2015.22)
Building Vietnamese Topic Modeling Based on Core Terms and Applying in Text Classification
In natural language, the words that occur in a text indicate the meaning of its content. Generative models for text, such as topic models, have the potential to make important contributions to the statistical analysis of large document collections and to the development of a deeper understanding of human language learning and processing. In this paper, we propose a novel method for building a Vietnamese topic model based on core terms and conditional probability. This approach reduces the time needed to build the corpus. We then apply the resulting corpus to Vietnamese text classification, and the experiments show that it makes the classification system more effective than traditional methods, with higher accuracy and less complex data processing.
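The abstract does not give the formulas behind the method, so the following is only a minimal sketch of one way "core terms" and conditional probability could be combined, not the authors' algorithm. It assumes that documents are already word-segmented (multi-syllable Vietnamese words joined with underscores), that a core term for a topic is a term whose estimated P(topic | term) exceeds a threshold, and that a document is assigned to the topic whose core terms contribute the highest total probability. The function names, the thresholds `min_prob` and `min_count`, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def build_core_terms(labeled_docs, min_prob=0.6, min_count=2):
    """Return {topic: {term: P(topic | term)}} for terms that pass both thresholds.

    labeled_docs is an iterable of (topic, tokens) pairs.
    P(topic | term) is estimated as the share of documents containing the term
    that carry the given topic label (an assumption of this sketch).
    """
    # term -> topic -> number of documents of that topic containing the term
    term_topic = defaultdict(Counter)
    for topic, tokens in labeled_docs:
        for term in set(tokens):
            term_topic[term][topic] += 1

    core = defaultdict(dict)
    for term, counts in term_topic.items():
        total = sum(counts.values())
        if total < min_count:
            continue  # too rare to give a stable estimate
        for topic, n in counts.items():
            prob = n / total
            if prob >= min_prob:
                core[topic][term] = prob
    return dict(core)


def classify(tokens, core_terms):
    """Return the topic with the highest summed core-term probability, or None."""
    present = set(tokens)
    scores = {
        topic: sum(p for term, p in terms.items() if term in present)
        for topic, terms in core_terms.items()
    }
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None  # no core term matched the document
    return best


# Hypothetical toy corpus: (topic label, pre-segmented tokens).
docs = [
    ("sports", ["bóng_đá", "cầu_thủ", "trận_đấu"]),
    ("sports", ["cầu_thủ", "huấn_luyện_viên", "trận_đấu"]),
    ("economy", ["ngân_hàng", "lãi_suất", "thị_trường"]),
    ("economy", ["thị_trường", "cổ_phiếu", "ngân_hàng"]),
]

core = build_core_terms(docs)
predicted = classify(["cầu_thủ", "ghi_bàn"], core)
```

Because only the core terms are stored and scored, classification in this sketch reduces to a few dictionary lookups per document, which is in the spirit of the reduced processing cost the abstract reports; whether the paper achieves it this way is not stated in the source.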