{"title":"评估统计短语和潜在语义索引在文本分类中的效用","authors":"H. Wu, D. Gunopulos","doi":"10.1109/ICDM.2002.1184036","DOIUrl":null,"url":null,"abstract":"The term-based vector space model is a prominent technique for retrieving textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We discover that the positive effect of adding phrase terms is very limited, if we have already achieved good performance using single-word terms, even when SVD/LSI is used as the dimensionality reduction method.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Evaluating the utility of statistical phrases and latent semantic indexing for text classification\",\"authors\":\"H. Wu, D. Gunopulos\",\"doi\":\"10.1109/ICDM.2002.1184036\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The term-based vector space model is a prominent technique for retrieving textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We discover that the positive effect of adding phrase terms is very limited, if we have already achieved good performance using single-word terms, even when SVD/LSI is used as the dimensionality reduction method.\",\"PeriodicalId\":405340,\"journal\":{\"name\":\"2002 IEEE International Conference on Data Mining, 2002. Proceedings.\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2002 IEEE International Conference on Data Mining, 2002. Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2002.1184036\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2002.1184036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluating the utility of statistical phrases and latent semantic indexing for text classification
The term-based vector space model is a prominent technique for retrieving textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We discover that the positive effect of adding phrase terms is very limited, if we have already achieved good performance using single-word terms, even when SVD/LSI is used as the dimensionality reduction method.