Yogalakshmi Jayabal, Chandrashekar Ramanathan, M. Sheth
{"title":"从电子书的TOC条目生成书签的挑战","authors":"Yogalakshmi Jayabal, Chandrashekar Ramanathan, M. Sheth","doi":"10.1145/2361354.2361363","DOIUrl":null,"url":null,"abstract":"ABSTRACT The task of extracting document structures from a digital e-book is difficult and is an active area of research. On the other hand, many e-books already have a table of contents (TOC) at the beginning of the document. This may lead us to believe that adding bookmarks into digital document (e-book) based on the existing TOC would be trivial. In this paper, we highlight the challenges involved in this task of automatically adding bookmarks to an existing e-book based on the TOC that exists within the document. If we are able to reliably identify the specific locations of each TOC entry within the document, the algorithms can be easily extended to identify document structures within e-books that have TOC. We describe a tool we have built called Booky that tries to add automatic PDF bookmarks to existing PDF based e-books as they have TOC as part of the document content. The tool addresses most of the challenges that have been identified while still leaving a few tricky scenarios still open.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"62 1","pages":"37-40"},"PeriodicalIF":0.0000,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Challenges in generating bookmarks from TOC entries in e-books\",\"authors\":\"Yogalakshmi Jayabal, Chandrashekar Ramanathan, M. Sheth\",\"doi\":\"10.1145/2361354.2361363\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT The task of extracting document structures from a digital e-book is difficult and is an active area of research. On the other hand, many e-books already have a table of contents (TOC) at the beginning of the document. This may lead us to believe that adding bookmarks into digital document (e-book) based on the existing TOC would be trivial. In this paper, we highlight the challenges involved in this task of automatically adding bookmarks to an existing e-book based on the TOC that exists within the document. If we are able to reliably identify the specific locations of each TOC entry within the document, the algorithms can be easily extended to identify document structures within e-books that have TOC. We describe a tool we have built called Booky that tries to add automatic PDF bookmarks to existing PDF based e-books as they have TOC as part of the document content. The tool addresses most of the challenges that have been identified while still leaving a few tricky scenarios still open.\",\"PeriodicalId\":91385,\"journal\":{\"name\":\"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering\",\"volume\":\"62 1\",\"pages\":\"37-40\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2361354.2361363\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2361354.2361363","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Challenges in generating bookmarks from TOC entries in e-books
ABSTRACT The task of extracting document structures from a digital e-book is difficult and is an active area of research. On the other hand, many e-books already have a table of contents (TOC) at the beginning of the document. This may lead us to believe that adding bookmarks into digital document (e-book) based on the existing TOC would be trivial. In this paper, we highlight the challenges involved in this task of automatically adding bookmarks to an existing e-book based on the TOC that exists within the document. If we are able to reliably identify the specific locations of each TOC entry within the document, the algorithms can be easily extended to identify document structures within e-books that have TOC. We describe a tool we have built called Booky that tries to add automatic PDF bookmarks to existing PDF based e-books as they have TOC as part of the document content. The tool addresses most of the challenges that have been identified while still leaving a few tricky scenarios still open.