Kernel based Learning Suitable for Text Categorization

T. Jo, Malrey Lee
Published in: 5th ACIS International Conference on Software Engineering Research, Management & Applications (SERA 2007)
Publication date: 2007-08-20
DOI: 10.1109/SERA.2007.97 (https://doi.org/10.1109/SERA.2007.97)
Citations: 0

Abstract

This research proposes a new strategy in which documents are encoded into string vectors for text categorization, together with modified versions of the SVM that can operate on string vectors. Traditionally, when supervised machine learning algorithms are used for pattern classification, raw data must first be encoded into numerical vectors. This encoding can be difficult, depending on the application area. In text categorization, for example, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we instead encode full texts into string vectors and apply the SVM to those string vectors for text categorization.
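
The contrast between the two encodings can be illustrated with a small sketch. The abstract does not specify how string vectors are built or which kernel the modified SVM uses, so everything below is an assumption made for illustration only: the string vector is taken to be the `d` most frequent non-stop words of a document, and `overlap_kernel` is a hypothetical similarity that counts shared words. The names `to_numerical_vector`, `to_string_vector`, `overlap_kernel`, and `STOP_WORDS` are not from the paper.

```python
from collections import Counter

# Assumed, tiny stop-word list for this toy example only.
STOP_WORDS = {"the", "and", "as"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def to_numerical_vector(text, vocabulary):
    """Bag-of-words counts: one dimension per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def to_string_vector(text, d=3):
    """Hypothetical string-vector encoding: the d most frequent words.

    Ties are broken alphabetically; short documents are padded with "".
    """
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return (ranked + [""] * d)[:d]


def overlap_kernel(u, v):
    """Illustrative kernel: number of distinct non-empty words shared."""
    return len((set(u) & set(v)) - {""})


docs = [
    "Stock prices fell and the stock market closed lower; "
    "stock traders blamed market fears.",
    "The market rallied as stock prices rose, "
    "lifting the market and stock funds.",
    "The striker scored twice and the team won the match; "
    "the team praised the striker.",
]

# Numerical encoding: dimensionality grows with the vocabulary,
# and most entries of each vector are zero.
vocabulary = sorted({w for doc in docs for w in tokenize(doc)})
numerical = [to_numerical_vector(doc, vocabulary) for doc in docs]

# String encoding: every document becomes a fixed-length list of words.
string_vectors = [to_string_vector(doc) for doc in docs]

# Gram matrix of pairwise similarities between string vectors.
gram = [[overlap_kernel(u, v) for v in string_vectors] for u in string_vectors]
```

Even on three short documents the numerical vectors have one dimension per vocabulary word and are mostly zeros, while each string vector stays at length `d`. A Gram matrix of this kind could, in principle, be handed to an SVM implementation that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`); whether that matches the modified SVM described in the paper is not something the abstract confirms.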