Sérgio D. Canuto, Thiago Salles, Thierson Couto, Marcos André Gonçalves
{"title":"基于相似度的文本分类元特征生成合成文档表示","authors":"Sérgio D. Canuto, Thiago Salles, Thierson Couto, Marcos André Gonçalves","doi":"10.1145/3331184.3331239","DOIUrl":null,"url":null,"abstract":"We propose new solutions that enhance and extend the already very successful application of meta-features to text classification. Our newly proposed meta-features are capable of: (1) improving the correlation of small pieces of evidence shared by neighbors with labeled categories by means of synthetic document representations and (local and global) hyperplane distances; and (2) estimating the level of error introduced by these newly proposed and the existing meta-features in the literature, specially for hard-to-classify regions of the feature space. Our experiments with large and representative number of datasets show that our new solutions produce the best results in all tested scenarios, achieving gains of up to 12% over the strongest meta-feature proposal of the literature.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Similarity-Based Synthetic Document Representations for Meta-Feature Generation in Text Classification\",\"authors\":\"Sérgio D. Canuto, Thiago Salles, Thierson Couto, Marcos André Gonçalves\",\"doi\":\"10.1145/3331184.3331239\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose new solutions that enhance and extend the already very successful application of meta-features to text classification. Our newly proposed meta-features are capable of: (1) improving the correlation of small pieces of evidence shared by neighbors with labeled categories by means of synthetic document representations and (local and global) hyperplane distances; and (2) estimating the level of error introduced by these newly proposed and the existing meta-features in the literature, specially for hard-to-classify regions of the feature space. Our experiments with large and representative number of datasets show that our new solutions produce the best results in all tested scenarios, achieving gains of up to 12% over the strongest meta-feature proposal of the literature.\",\"PeriodicalId\":20700,\"journal\":{\"name\":\"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3331184.3331239\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3331184.3331239","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Similarity-Based Synthetic Document Representations for Meta-Feature Generation in Text Classification
We propose new solutions that enhance and extend the already very successful application of meta-features to text classification. Our newly proposed meta-features are capable of: (1) improving the correlation of small pieces of evidence shared by neighbors with labeled categories by means of synthetic document representations and (local and global) hyperplane distances; and (2) estimating the level of error introduced by these newly proposed and the existing meta-features in the literature, specially for hard-to-classify regions of the feature space. Our experiments with large and representative number of datasets show that our new solutions produce the best results in all tested scenarios, achieving gains of up to 12% over the strongest meta-feature proposal of the literature.