{"title":"原生数字化文档页面布局分析的基础真实性与性能评价","authors":"Xin Tao, Zhi Tang, Canhui Xu, Liangcai Gao","doi":"10.1109/DAS.2014.37","DOIUrl":null,"url":null,"abstract":"In this paper, a new dataset is proposed for page layout analysis of born-digital documents. By extracting uniformly the document contents, an XML based data format is designed in terms of raw data and structure data. Utilizing a self-developed ground-truthing tool, a public dataset is constructed from diverse styles of document resources. With consideration of physical segmentation and logical labeling, automatic performance evaluation methods are adjusted to cope with different scenarios. The applications of the proposed dataset have shown that it is suitable for evaluating various layout analysis tasks.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Ground-Truth and Performance Evaluation for Page Layout Analysis of Born-Digital Documents\",\"authors\":\"Xin Tao, Zhi Tang, Canhui Xu, Liangcai Gao\",\"doi\":\"10.1109/DAS.2014.37\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, a new dataset is proposed for page layout analysis of born-digital documents. By extracting uniformly the document contents, an XML based data format is designed in terms of raw data and structure data. Utilizing a self-developed ground-truthing tool, a public dataset is constructed from diverse styles of document resources. With consideration of physical segmentation and logical labeling, automatic performance evaluation methods are adjusted to cope with different scenarios. The applications of the proposed dataset have shown that it is suitable for evaluating various layout analysis tasks.\",\"PeriodicalId\":220495,\"journal\":{\"name\":\"2014 11th IAPR International Workshop on Document Analysis Systems\",\"volume\":\"67 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 11th IAPR International Workshop on Document Analysis Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DAS.2014.37\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 11th IAPR International Workshop on Document Analysis Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAS.2014.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Ground-Truth and Performance Evaluation for Page Layout Analysis of Born-Digital Documents
In this paper, a new dataset is proposed for page layout analysis of born-digital documents. By extracting uniformly the document contents, an XML based data format is designed in terms of raw data and structure data. Utilizing a self-developed ground-truthing tool, a public dataset is constructed from diverse styles of document resources. With consideration of physical segmentation and logical labeling, automatic performance evaluation methods are adjusted to cope with different scenarios. The applications of the proposed dataset have shown that it is suitable for evaluating various layout analysis tasks.