A feature selection approach for automatic e-book classification based on discourse segmentation
Jiunn-Liang Guo, Hei-Chia Wang, Ming-Way Lai
Program-Electronic Library and Information Systems, vol. 49, no. 1, pp. 2-22 (published 2015-01-20). DOI: 10.1108/PROG-12-2012-0071 (https://doi.org/10.1108/PROG-12-2012-0071)
Purpose: The purpose of this paper is to develop a novel feature selection approach for the automatic text classification of large digital documents, namely the e-books of an online library system. The main idea is to automatically identify discourse features in order to improve the feature selection process, rather than to focus on the size of the corpus.

Design/methodology/approach: The proposed framework automatically identifies the discourse segments within e-books, captures the discourse subtopics that are cohesively expressed in those segments, and treats these subtopics as informative and prominent features. The selected feature set is then used to train and perform the e-book classification task with a support vector machine (SVM).

Findings: The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features leads to better performance than two conventional feature selection techniques: TFIDF and mut...
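The abstract does not say how discourse segment boundaries are detected or how subtopic terms are scored, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a TextTiling-style lexical-cohesion test (start a new segment where the vocabulary of neighbouring sentence windows stops overlapping) and a simple per-segment score of term frequency times inverse segment frequency. All names (`segment`, `subtopic_features`, `to_vector`, `STOPWORDS`) and parameters (`window`, `threshold`, `k`) are hypothetical. The SVM training step is left out.

```python
import math
import re
from collections import Counter

# Hypothetical tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "on", "of", "and", "in", "is"}

def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed (simplified preprocessing)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def segment(sentences, window=2, threshold=0.1):
    """TextTiling-style split (assumed): open a new segment at each sentence gap
    where the windows on either side share too little vocabulary."""
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    segments, current = [], [sentences[0]]
    for gap in range(1, len(sentences)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        if cosine(left, right) < threshold:
            segments.append(current)
            current = []
        current.append(sentences[gap])
    segments.append(current)
    return segments

def subtopic_features(segments, k=2):
    """Keep the k top-scoring terms of each segment; a term scores higher when it
    is frequent inside the segment and appears in few other segments."""
    bags = [Counter(t for s in seg for t in tokenize(s)) for seg in segments]
    n = len(bags)
    seg_freq = Counter(t for bag in bags for t in bag)
    features = []
    for bag in bags:
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(bag, key=lambda t: (-bag[t] * math.log(1 + n / seg_freq[t]), t))
        for term in ranked[:k]:
            if term not in features:
                features.append(term)
    return features

def to_vector(text, features):
    """Count-based vector over the selected features (what an SVM would consume)."""
    counts = Counter(tokenize(text))
    return [counts[f] for f in features]

book = [
    "The cat chased the mouse.",
    "A cat likes to chase a mouse.",
    "Stock prices fell on the market.",
    "The market saw falling stock prices.",
]
segs = segment(book)
features = subtopic_features(segs, k=2)
print(features)  # ['cat', 'mouse', 'market', 'prices']
```

On this toy input the cohesion test places one boundary between the animal and market sentences, and each segment contributes its two most characteristic terms. In a full pipeline the resulting vectors would be passed to an SVM implementation (for example scikit-learn's `LinearSVC`), and the same vectors could be compared against TFIDF-selected features as the baseline.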
About the journal:
■ Automation of library and information services
■ Storage and retrieval of all forms of electronic information
■ Delivery of information to end users
■ Database design and management
■ Techniques for storing and distributing information
■ Networking and communications technology
■ The Internet
■ User interface design
■ Procurement of systems
■ User training and support
■ System evaluation