Arabic text classification using linear discriminant analysis
Fawaz S. Al-Anzi, Dia AbuZeina
2017 International Conference on Engineering & MIS (ICEMIS)
Publication date: 2017-05-01
DOI: 10.1109/ICEMIS.2017.8272958
Citations: 2
Abstract
Linear discriminant analysis (LDA) is a dimensionality reduction technique widely used in pattern recognition applications. LDA aims to generate effective feature vectors by projecting the original data (e.g., a bag-of-words textual representation) into a lower-dimensional space. This makes LDA a convenient method for text classification, a task characterized by very high-dimensional feature vectors. In this paper, we empirically investigated two LDA-based methods for Arabic text classification. The first method computes the generalized eigenvectors of the ratio of between-class to within-class scatter; the second uses linear classification functions that assume equal population covariance matrices (i.e., a pooled sample covariance matrix). We used a textual data collection of 1,750 documents belonging to five categories. The testing set contains 250 documents from the five categories (50 documents per category). The experimental results show that the linear classification functions method outperforms the eigenvalue decomposition method. We emphasize that the goal of this work is to demonstrate how to employ the LDA algorithm for text classification, not to compare its performance with other well-known text classification algorithms.
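The two LDA variants described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are small synthetic Gaussian clusters standing in for bag-of-words vectors, and accuracy is measured on the training data only. Several details are assumptions: the ridge term `reg` (added because the within-class scatter of sparse text features is usually singular), the nearest-centroid rule used to classify after the eigenvector projection, and the log-prior term in the linear classification functions. All function names are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (S_W) and between-class (S_B) scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_W, S_B = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = Xc - mc
        S_W += diff.T @ diff
        dm = (mc - mean_all)[:, None]
        S_B += len(Xc) * (dm @ dm.T)
    return S_W, S_B

def fisher_lda(X, y, reg=1e-3):
    """Method 1: solve S_B w = lambda S_W w; keep the top K-1 eigenvectors."""
    S_W, S_B = scatter_matrices(X, y)
    S_W = S_W + reg * np.eye(X.shape[1])   # assumed ridge term for singular S_W
    L_inv = np.linalg.inv(np.linalg.cholesky(S_W))   # S_W = L L^T
    M = L_inv @ S_B @ L_inv.T              # equivalent symmetric eigenproblem
    _, V = np.linalg.eigh(M)               # eigenvalues in ascending order
    k = len(np.unique(y)) - 1
    V = V[:, ::-1][:, :k]                  # largest K-1 eigenvalues first
    return L_inv.T @ V                     # back-transform: w = L^{-T} v

def fit_projected_centroids(X, y, W):
    """Assumed rule: class centroids in the projected LDA space."""
    classes = np.unique(y)
    Z = X @ W
    return classes, np.array([Z[y == c].mean(axis=0) for c in classes])

def predict_projected(X, W, classes, centroids):
    Z = X @ W
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]

def fit_linear_functions(X, y, reg=1e-3):
    """Method 2: linear classification functions with a pooled covariance."""
    classes = np.unique(y)
    n, d = X.shape
    S_W, _ = scatter_matrices(X, y)
    S_pooled = S_W / (n - len(classes)) + reg * np.eye(d)  # pooled sample covariance
    S_inv = np.linalg.inv(S_pooled)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    coef = means @ S_inv                   # row k: mu_k^T S^{-1}
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, intercept

def predict_linear(X, classes, coef, intercept):
    scores = X @ coef.T + intercept        # one linear score per class
    return classes[np.argmax(scores, axis=1)]

# Synthetic stand-in for document vectors: 3 classes in 10 dimensions.
rng = np.random.default_rng(0)
K, d, n_per = 3, 10, 30
centers = 8.0 * np.eye(K, d)               # class k centered at 8 * e_k
X = np.vstack([rng.normal(centers[k], 1.0, size=(n_per, d)) for k in range(K)])
y = np.repeat(np.arange(K), n_per)

W = fisher_lda(X, y)
classes, centroids = fit_projected_centroids(X, y, W)
acc_eig = np.mean(predict_projected(X, W, classes, centroids) == y)

classes2, coef, intercept = fit_linear_functions(X, y)
acc_lin = np.mean(predict_linear(X, classes2, coef, intercept) == y)
print(W.shape, acc_eig, acc_lin)
```

The Cholesky step turns the generalized problem into an ordinary symmetric one, so `np.linalg.eigh` can be used without SciPy; the resulting projection whitens the within-class scatter, which is why plain Euclidean distance to centroids is a reasonable (assumed) decision rule in the projected space. With five categories, as in the paper, the projection would have at most four dimensions.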