A Machine Learning Approach to Detecting Start Reading Location of eBooks

S. Bodapati, S. Ramaswamy, G. Narayanan
2018 IEEE International Conference on Big Knowledge (ICBK), published 2018-11-01
DOI: 10.1109/ICBK.2018.00038 (https://doi.org/10.1109/ICBK.2018.00038)
Citations: 2

Abstract

Machine learning and natural language processing (NLP) have aided the development of new and improved user experience features in many applications. We address the problem of automatically identifying the "Start Reading Location" (SRL) of eBooks, i.e., the location of the logical beginning of the main content. This improves the eBook reading experience by taking users directly to the logical start location, without requiring them to flip through front-matter sections such as "Dedication" and "About the Author". Automatic identification of the SRL is complex because many eBooks do not adhere to any well-defined convention for section naming, formatting, or layout. We formulate SRL identification as a classification problem based on detailed rule-based and NLP-based classification schemes. Our models are used in production for Kindle eBooks and have led to a 400% increase in coverage (the number of books with an SRL stamped) compared with what was achieved earlier through an entirely manual process, while maintaining a high accuracy of 95%.
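The classification framing can be illustrated with a minimal rule-based sketch. The abstract does not give the authors' actual rules or features, so everything below is an assumption made for illustration: the pattern list `FRONT_MATTER`, the helper names `is_front_matter` and `start_reading_location`, and the choice to classify on section titles alone. Each section is labeled as front matter or not, and the SRL is taken to be the first section not labeled as front matter.

```python
import re

# Hypothetical front-matter title patterns (not taken from the paper).
FRONT_MATTER = re.compile(
    r"^(title page|copyright|dedication|about the author|"
    r"table of contents|contents|acknowledg(e)?ments?)$",
    re.IGNORECASE,
)


def is_front_matter(title: str) -> bool:
    """Classify a section title as front matter using the assumed patterns."""
    return bool(FRONT_MATTER.match(title.strip()))


def start_reading_location(section_titles):
    """Return the index of the first section not classified as front matter.

    Returns None when every section looks like front matter.
    """
    for index, title in enumerate(section_titles):
        if not is_front_matter(title):
            return index
    return None


sections = [
    "Title Page",
    "Copyright",
    "Dedication",
    "About the Author",
    "Chapter 1",
    "Chapter 2",
]
srl_index = start_reading_location(sections)
print(sections[srl_index])
```

A title-only rule like this fails exactly where the abstract says the problem is hard: books that do not follow a naming convention. That gap is presumably what the NLP-based schemes are meant to cover, although the abstract does not say how.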