A Deep Learning Co-training Framework for e-book Classification
Authors: Tsui-Ping Chang, Hung-Ming Chen, Jian-Qun Chen
Published in: 2020 International Symposium on Computer, Consumer and Control (IS3C), 2020-11-01
DOI: 10.1109/IS3C50286.2020.00103 (https://doi.org/10.1109/IS3C50286.2020.00103)
Citations: 0
Abstract
Automatic e-book classification is an important research problem because more and more people read and acquire information on mobile devices (e.g., smartphones). Many writers digitize their works so that users can access them on mobile devices, and as a result the number of e-books has grown significantly. An e-book is a digital or electronic book formatted as a file that can be read on a mobile device. E-books reuse some features of books. The important difference, however, is that an e-book contains many images that convey the writer's content or knowledge.

Many studies have proposed e-book classification methods to help users find and then read e-books on their mobile devices. In these methods, an e-book can be categorized by several criteria. One is its type, e.g., novel, reference, or encyclopedia. Another is its topic, e.g., economy, religion, or technology. Topic-based classification systems usually rely on a well-known scheme such as the Dewey Decimal Classification, in which every category is represented by a decimal number. These studies use a Naïve Bayes classifier and focus on automatic thesis classification.

As deep learning proves its usefulness in an ever greater number of applications, demand is rising for faster computational resources to train ever more complex learning-based models. Building on deep learning, some studies have proposed methods for automatic e-book classification. W. A. Wiegand proposed a convolutional neural network (CNN) book label recognition algorithm to find misplaced books. X. Yang et al., on the other hand, describe a four-step approach. First, keywords are extracted from the description data of e-books. Second, the description data is modeled as vectors of keywords. Third, statistical categorization rules are obtained from the meta-information of e-books. Finally, the vectors and the statistical categorization rules are combined to obtain a classification model.

In this paper, a deep learning co-training framework (DLC) is proposed to improve the accuracy of automatic e-book classification. DLC combines text and image features from e-books to co-train an e-book classifier. To increase the variety of the feature sets, the texts in e-books are represented as vectors by Word2Vec. The images in e-books are converted and then combined with these vectors. Furthermore, DLC adopts the softmax regression function to co-train the combined features (i.e., vectors and images) in a CNN to improve the accuracy of e-book classification. The experimental results demonstrate that DLC achieves higher accuracy than other e-book classifiers.
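
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of combining a text vector with image features and scoring the result with softmax regression. It is not the authors' DLC: plain concatenation as the fusion step, the tiny hand-picked weights, the three topic labels, and the helper names `combine_features` and `classify` are all assumptions made for illustration. The text vector stands in for an averaged Word2Vec embedding, and no CNN is involved.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_features(text_vector, image_vector):
    """Fuse the two feature sets by concatenation (an assumed fusion step)."""
    return list(text_vector) + list(image_vector)


def classify(features, weights, biases):
    """Softmax regression: one linear score per class, then softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical inputs: a 3-d text embedding (standing in for an averaged
# Word2Vec vector) and a 2-d image feature vector.
text_vector = [0.2, -0.1, 0.4]
image_vector = [0.7, 0.3]
features = combine_features(text_vector, image_vector)

# Hypothetical, hand-picked parameters for three illustrative topic classes.
# In practice these would be learned from labeled e-books.
labels = ["economy", "religion", "technical"]
weights = [
    [1.0, 0.0, 0.5, 0.2, 0.0],
    [0.0, 1.0, 0.0, 0.1, 0.3],
    [0.3, 0.2, 1.0, 1.0, 0.5],
]
biases = [0.0, 0.0, 0.0]

probabilities = classify(features, weights, biases)
predicted = labels[probabilities.index(max(probabilities))]
print(predicted, [round(p, 3) for p in probabilities])
```

Concatenation is the simplest way to let a single classifier see both modalities at once. A trained system would learn the weights jointly over the combined vector, which is where the text and image features can reinforce each other.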