A Comparative Evaluation of PDF-to-HTML Conversion Tools

Pramodya Pathirana, Asini Silva, Thenuka Lawrence, T. Weerasinghe, Roshan Abeyweera
Published in: 2023 International Research Conference on Smart Computing and Systems Engineering (SCSE)
Publication date: 2023-06-29
DOI: 10.1109/SCSE59836.2023.10214989 (https://doi.org/10.1109/SCSE59836.2023.10214989)

Abstract

PDF (Portable Document Format) is a popular file format for sharing and storing documents across different platforms. However, the content of a PDF document sometimes needs to be repurposed for online use, and PDF-to-HTML conversion is a common way to do so. This paper presents a comparative evaluation of existing PDF-to-HTML conversion tools, assessing their suitability for extracting text and images. The tools were tested on Sri Lankan school textbooks, which contain complex text formatting and non-textual elements. The evaluation criteria included the accuracy of the output, the handling of complex text formatting, and the handling of non-textual elements, and the tools were compared on their performance against each criterion. The study offers useful insights for individuals and organizations looking to repurpose PDF content as HTML for online use, particularly in the education sector.
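The abstract does not state how output accuracy was measured. As a purely illustrative sketch, one plausible way to score text-extraction accuracy is to strip the tags from a tool's HTML output and compare the remaining text with a hand-prepared reference transcription. The function names `html_to_text` and `text_similarity` below are hypothetical, and the character-level similarity ratio from Python's standard `difflib` module is an assumed metric, not the one used in the paper.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags. Note that this also
        # captures the contents of <script> and <style> elements, if any.
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of `html` with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Join text nodes with a space so adjacent block elements do not fuse
    # into one word, then collapse all runs of whitespace.
    return " ".join(" ".join(collector.parts).split())


def text_similarity(reference: str, converted_html: str) -> float:
    """Similarity in [0, 1] between a reference text and a tool's HTML output.

    Assumed metric: difflib's ratio, 2 * matches / total characters.
    """
    expected = " ".join(reference.split())
    extracted = html_to_text(converted_html)
    return SequenceMatcher(None, expected, extracted).ratio()


if __name__ == "__main__":
    reference = "Unit 1 Plants and animals"
    tool_output = "<h1>Unit 1</h1><p>Plants and animals</p>"
    print(f"similarity: {text_similarity(reference, tool_output):.2f}")
```

A score like this covers only the text criterion. Formatting fidelity and the handling of images would need separate checks, and joining text nodes with a space can split a word that is partly wrapped in an inline tag such as `<b>`.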