Authors: Xin Li, W. Jin
Published in: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016-12-01
DOI: 10.1109/ICMLA.2016.0026 (https://doi.org/10.1109/ICMLA.2016.0026)
Cross-Document Knowledge Discovery Using Semantic Concept Topic Model
Topic models employ the Bag-of-Words (BOW) representation, which breaks terms into their constituent words and treats those words as surface strings, without assuming any predefined knowledge about word meaning. In this paper, we propose approaches based on Semantic Concept Latent Dirichlet Allocation (SCLDA) and the Semantic Concept Hierarchical Dirichlet Process (SCHDP), which represent text as meaningful concepts rather than words, using a new model known as Bag-of-Concepts (BOC). We also propose new algorithms that apply SCLDA and SCHDP to the Concept Chain Queries (CCQ) problem. These algorithms focus on discovering new semantic relationships between two concepts across documents, where each relationship found reveals a semantic path linking the two concepts across multiple text units. Experiments show that search quality improves substantially compared with other LDA- or HDP-based approaches.
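The difference between the BOW and BOC representations described above can be illustrated with a minimal sketch. It assumes a small hand-written concept vocabulary and a greedy longest-match rule for grouping tokens; the abstract does not say how concepts are identified, so `CONCEPTS`, `bag_of_words`, and `bag_of_concepts` are hypothetical names for illustration, not the authors' method.

```python
from collections import Counter

# Hypothetical concept vocabulary (an assumption for illustration only).
CONCEPTS = {
    ("latent", "dirichlet", "allocation"),
    ("topic", "model"),
    ("machine", "learning"),
}
MAX_LEN = max(len(c) for c in CONCEPTS)


def bag_of_words(text):
    """Count individual lowercase tokens (the BOW view)."""
    return Counter(text.lower().split())


def bag_of_concepts(text):
    """Count concepts by greedy longest match (an assumed rule).

    Tokens that match no multi-word concept are kept as single-word units.
    """
    tokens = text.lower().split()
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink down to one token.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if n == 1 or span in CONCEPTS:
                counts[" ".join(span)] += 1
                i += n
                break
    return counts


doc = "latent dirichlet allocation is a topic model used in machine learning"
bow = bag_of_words(doc)
boc = bag_of_concepts(doc)
print(sorted(bow))
print(sorted(boc))
```

In this sketch, BOW splits "latent dirichlet allocation" into three unrelated tokens, while BOC keeps it as a single unit. Counts of this kind could then be passed to a topic model in place of word counts.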