Dimension reduction techniques for accessing Chinese readability

Yaw-Huei Chen, Ting-Chia Lin
2014 International Conference on Machine Learning and Cybernetics, 2014-07-13
DOI: 10.1109/ICMLC.2014.7009154
Citations: 6

Abstract

Machine learning-based techniques have been used in recent studies to assess document readability. An important issue in machine learning-based text classification is reducing the dimension of the document vectors. This study compares feature selection methods (mutual information, the chi-square test, and information gain) and feature extraction methods (principal component analysis (PCA) and latent semantic analysis (LSA)) for assessing Chinese readability. We also compare two classification techniques, support vector machine (SVM) and LDA. The experimental results indicate that chi-square feature selection combined with SVM performs well.
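To illustrate what chi-square feature selection does, here is a minimal sketch. It is not the authors' implementation. For each term t and class c it builds the usual 2x2 document contingency table: A = class-c documents containing t, B = other documents containing t, C = class-c documents lacking t, D = other documents lacking t. It then computes chi2(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)) and keeps the k highest-scoring terms as the reduced vocabulary. The sketch makes several assumptions. Documents are already segmented into words, since Chinese word segmentation is not shown. A term's score is its maximum chi-square value over the readability classes. The toy documents and labels are hypothetical, and the function names `chi_square_scores`, `select_top_k`, and `to_vector` are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence


def chi_square_scores(docs: Sequence[List[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Score each term by its maximum chi-square statistic over all classes (assumed aggregation)."""
    n = len(docs)
    class_counts = Counter(labels)                      # documents per readability class
    doc_sets = [set(d) for d in docs]                   # chi-square uses term presence only
    term_df = Counter(t for s in doc_sets for t in s)   # documents containing each term
    # term_class_df[(t, c)]: documents of class c that contain term t
    term_class_df = Counter((t, c) for s, c in zip(doc_sets, labels) for t in s)

    scores: Dict[str, float] = {}
    for term, df in term_df.items():
        best = 0.0
        for cls, n_c in class_counts.items():
            a = term_class_df[(term, cls)]  # in class, contains term
            b = df - a                      # other classes, contains term
            c = n_c - a                     # in class, lacks term
            d = n - n_c - b                 # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # zero when the term occurs in every document (or there is one class)
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k highest-scoring terms; ties are broken by term order for determinism."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def to_vector(doc: List[str], vocab: List[str]) -> List[int]:
    """Term-frequency vector restricted to the selected (reduced) vocabulary."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]


# Hypothetical, pre-segmented toy corpus with two readability levels.
docs = [
    ["我", "喜欢", "小狗"],
    ["我", "喜欢", "苹果"],
    ["我", "认为", "定理", "成立"],
    ["我", "认为", "引理", "成立"],
]
labels = ["easy", "easy", "hard", "hard"]

scores = chi_square_scores(docs, labels)
vocab = select_top_k(scores, 3)
print(vocab)  # → ['喜欢', '成立', '认为']
```

The term "我" occurs in every document, so it cannot discriminate between levels and scores 0. Terms confined to one level score highest. In a full pipeline, the reduced vectors produced by `to_vector` would then be passed to a classifier such as an SVM, which this sketch does not include.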