{"title":"一种基于NMF的文档表示的多样化隐藏单元方法","authors":"X. Jiang, H. Zhang, R. Liu, Y. Zuo","doi":"10.1109/ICKEA.2016.7803001","DOIUrl":null,"url":null,"abstract":"Document modeling with hidden units as known as topics are very popular. Non-negative matrix factorization(NMF) is one of the most important techniques in document representation, which decomposes a document-term matrix into a document-topic matrix and a topic-term matrix. Since orthogonal constraint would limit terms occur only in one topic, we abandon this strong constraint. Furthermore, in order to represent documents in a certain number of topics with more semantic information, we add diversifying regularization and sparse constraint into NMF, which shows a great improvement in text classification and clustering. In the end, we draw the figure of topics similarities and display the top 20 weighted words in each topic to reveal that diversifying regularization can efficiently reduce the overlapping terms.","PeriodicalId":241850,"journal":{"name":"2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"A diversifying hidden units method based on NMF for document representation\",\"authors\":\"X. Jiang, H. Zhang, R. Liu, Y. Zuo\",\"doi\":\"10.1109/ICKEA.2016.7803001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Document modeling with hidden units as known as topics are very popular. Non-negative matrix factorization(NMF) is one of the most important techniques in document representation, which decomposes a document-term matrix into a document-topic matrix and a topic-term matrix. Since orthogonal constraint would limit terms occur only in one topic, we abandon this strong constraint. Furthermore, in order to represent documents in a certain number of topics with more semantic information, we add diversifying regularization and sparse constraint into NMF, which shows a great improvement in text classification and clustering. 
In the end, we draw the figure of topics similarities and display the top 20 weighted words in each topic to reveal that diversifying regularization can efficiently reduce the overlapping terms.\",\"PeriodicalId\":241850,\"journal\":{\"name\":\"2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA)\",\"volume\":\"60 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICKEA.2016.7803001\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICKEA.2016.7803001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Document modeling with hidden units, also known as topics, is very popular. Non-negative matrix factorization (NMF) is one of the most important techniques for document representation: it decomposes a document-term matrix into a document-topic matrix and a topic-term matrix. Since an orthogonality constraint would restrict each term to occurring in only one topic, we abandon this strong constraint. Furthermore, in order to represent documents with a fixed number of topics that carry more semantic information, we add a diversifying regularization and a sparsity constraint to NMF, which yields a clear improvement in text classification and clustering. Finally, we plot the pairwise topic similarities and display the top 20 weighted words in each topic to show that the diversifying regularization efficiently reduces overlapping terms across topics.
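As a point of reference, the sketch below shows the baseline NMF decomposition the abstract describes: a document-term matrix X is factorized into a document-topic matrix W and a topic-term matrix H (X ≈ WH), and the top 20 weighted words per topic are listed. The corpus, vectorizer settings, and number of topics are illustrative assumptions, and the paper's diversifying regularization and sparsity constraint are not implemented here, since scikit-learn's standard NMF does not include them.

```python
# Minimal sketch of plain NMF for document representation (not the paper's
# diversified variant). Corpus, vocabulary size, and topic count are
# illustrative assumptions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data

# Build the document-term matrix (TF-IDF weighted)
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(docs)

# Factorize: X ≈ W (documents x topics) @ H (topics x terms)
model = NMF(n_components=20, init="nndsvd", random_state=0, max_iter=300)
W = model.fit_transform(X)   # document-topic matrix
H = model.components_        # topic-term matrix

# Top 20 weighted words per topic, as in the paper's qualitative analysis
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(H):
    top = topic.argsort()[::-1][:20]
    print(f"Topic {k}: " + ", ".join(terms[i] for i in top))
```

In this baseline, rows of H are free to share heavily weighted terms; the paper's diversifying regularization penalizes such overlap so that each topic keeps a more distinct vocabulary.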