{"title":"Glyph spotting for mediaeval handwritings by template matching","authors":"Jan-Hendrik Worch, Mathias Lawo, B. Gottfried","doi":"10.1145/2361354.2361401","DOIUrl":"https://doi.org/10.1145/2361354.2361401","url":null,"abstract":"This paper reports on the analysis of different approaches in order to search for glyphs within handwritten mediaeval documents. As layout analysis methods are difficult to apply to the documents at hand, template matching methods are employed. A number of different shape descriptions are used to filter out false positives, since the application of correlation coefficients alone results in too many matches. The overall goal consists in the interactive support of an editor who is transcribing a given handwriting. For this purpose, the automatic spotting of glyphs enables the editor to compare glyphs within different contexts.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"60 1","pages":"213-216"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82632172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured and fragmented content in collaborative XML publishing chains","authors":"Stéphane Crozat","doi":"10.1145/2361354.2361388","DOIUrl":"https://doi.org/10.1145/2361354.2361388","url":null,"abstract":"In this paper, we present the main results of the C2M project through one of its operational deliverable: the Scenari4 collaborative editing and publishing system for XML content. The purpose of the C2M project was to design a system able to manage structured and fragmented contents - as XML publishing chains do - while providing collaborative possibilities - as Enterprise Content Management systems (ECM) do. The main issue is related to transclusion relationships which are massively used in XML publishing chains, in order to support repurposing without copying. This approach is not compatible with the classical way ECMs manage content, especially in terms of propagation of modifications, rights or transactions management. We propose two complementary solutions to manage two different levels of collaboration. The workspace is designed as a highly dynamic place able to deal with live fragments, linked together in a network, that can be easily updated at any time by any user. The library is a more static and more classical way to manage content, dedicated to folder-documents, which are XML frozen versions of sub-networks extracted from workspaces. While workspaces are dedicated to content elaboration and maintenance, libraries are places to store, to read, or to exchange stable documents. Scenari4 is released under FLOSS license and has been being used in several experimental and commercial contexts since the beginning of 2012.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"33 1","pages":"145-148"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83641509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faceted documents: describing document characteristics using semantic lenses","authors":"S. Peroni, D. Shotton, F. Vitali","doi":"10.1145/2361354.2361396","DOIUrl":"https://doi.org/10.1145/2361354.2361396","url":null,"abstract":"The semantic enhancement of a traditional scientific paper is not a straightforward operation, since it involves many different aspects or facets. In this paper we propose eight different semantic lenses through which these facets may be viewed, and describe and exemplify the ontologies by which these lenses may be implemented.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"1 1","pages":"191-194"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91525914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Mahlow, C. Grün, Alexander Holupirek, M. Scholl
{"title":"A framework for retrieval and annotation in digital humanities using XQuery full text and update in BaseX","authors":"C. Mahlow, C. Grün, Alexander Holupirek, M. Scholl","doi":"10.1145/2361354.2361398","DOIUrl":"https://doi.org/10.1145/2361354.2361398","url":null,"abstract":"A key difference between traditional humanities research and the emerging field of digital humanities is that the latter aims to complement qualitative methods with quantitative data. In linguistics, this means the use of large corpora of text, which are usually annotated automatically using natural language processing tools. However, these tools do not exist for historical texts, so scholars have to work with unannotated data. We have developed a system for systematic, iterative exploration and annotation of historical text corpora, which relies on an XML database (BaseX) and in particular on the Full Text and Update facilities of XQuery.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"33 1","pages":"195-204"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91278779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Max C. Göbel, Tamir Hassan, Ermelinda Oro, G. Orsi
{"title":"A methodology for evaluating algorithms for table understanding in PDF documents","authors":"Max C. Göbel, Tamir Hassan, Ermelinda Oro, G. Orsi","doi":"10.1145/2361354.2361365","DOIUrl":"https://doi.org/10.1145/2361354.2361365","url":null,"abstract":"This paper presents a methodology for the evaluation of table understanding algorithms for PDF documents. The evaluation takes into account three major tasks: table detection, table structure recognition and functional analysis. We provide a general and flexible output model for each task along with corresponding evaluation metrics and methods. We also present a methodology for collecting and ground-truthing PDF documents based on consensus-reaching principles and provide a publicly available ground-truthed dataset.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"261 1","pages":"45-48"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76740919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure-conforming XML document transformation based on graph homomorphism","authors":"Tyng-Ruey Chuang, Hui-Yin Wu","doi":"10.1145/2361354.2361376","DOIUrl":"https://doi.org/10.1145/2361354.2361376","url":null,"abstract":"We propose a principled method to specify XML document transformation so that the outcome of a transformation can be ensured to conform to certain structural constraints as required by the target XML document type. We view XML document types as graphs, and model transformations as relations between the two graphs. Starting from this abstraction, we use and extend graph homomorphism as a formalism for the specifications of transformations between XML document types. A specification can then be checked to ensure whether results from the transformation will always be structure-conforming.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"19 1","pages":"99-102"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75567107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bill Janssen, E. Saund, E. Bier, Patricia Wall, M. Sprague
{"title":"Receipts2Go: the big world of small documents","authors":"Bill Janssen, E. Saund, E. Bier, Patricia Wall, M. Sprague","doi":"10.1145/2361354.2361381","DOIUrl":"https://doi.org/10.1145/2361354.2361381","url":null,"abstract":"The Receipts2Go system is about the world of one-page documents: cash register receipts, book covers, cereal boxes, price tags, train tickets, fire extinguisher tags. In that world, we're exploring techniques for extracting accurate information from documents for which we have no layout descriptions -- indeed no initial idea of what the document's genre is -- using photos taken with cell phone cameras by users who aren't skilled document capture technicians. This paper outlines the system and reports on some initial results, including the algorithms we've found useful for cleaning up those document images, and the techniques used to extract and organize relevant information from thousands of similar-but-different page layouts.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"146 1","pages":"121-124"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72714221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pierrick Tranouez, Stéphane Nicolas, Vladislavs Dovgalecs, A. Burnett, L. Heutte, Yiqing Liang, R. Guest, M. Fairhurst
{"title":"DocExplore: overcoming cultural and physical barriers to access ancient documents","authors":"Pierrick Tranouez, Stéphane Nicolas, Vladislavs Dovgalecs, A. Burnett, L. Heutte, Yiqing Liang, R. Guest, M. Fairhurst","doi":"10.1145/2361354.2361399","DOIUrl":"https://doi.org/10.1145/2361354.2361399","url":null,"abstract":"In this paper, we describe DocExplore, an integrated software suite centered on the handling of digitized documents with an emphasis on ancient manuscripts. This software suite allows the augmentation and exploration of ancient documents of cultural interest. Specialists can add textual and multimedia data and metadata to digitized documents through a graphical interface that does not require technical knowledge. They are helped in this endeavor by sophisticated document analysis tools that allows for instance to spot words or patterns in images of documents. The suite is intended to ease considerably the process of bringing locked away historical materials to the attention of the general public by covering all the steps from managing a digital collection to creating interactive presentations suited for cultural exhibitions. Its genesis and sustained development reside in a collaboration of archivists, historians and computer scientists, the latter being not only in charge of the development of the software, but also of creating and incorporating novel pattern recognition for document analysis techniques.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"4 1","pages":"205-208"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88207589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carlos de Salles Soares Neto, H. F. Pinto, L. Soares
{"title":"TAL processor for hypermedia applications","authors":"Carlos de Salles Soares Neto, H. F. Pinto, L. Soares","doi":"10.1145/2361354.2361369","DOIUrl":"https://doi.org/10.1145/2361354.2361369","url":null,"abstract":"TAL (Template Authoring Language) is a specification language for hypermedia document templates. Templates describe application families with structural and semantic similarities. In TAL, templates not only define design patterns that applications must follow, but also constraints on the use of these patterns. A template must be processed together with a padding document giving rise to a new document in some specification language, called target language. TAL supports the description of templates independently of the languages used to specify target and padding documents. Usually a specific processor is required for each target language and for each padding document used. This paper concerns TAL processors. However, we should note that the proposal can be easily extended to any other solution used to define templates. Any pattern language and any language used to define constraints could be used instead of TAL. The TAL processor architecture is general and it is discussed when presenting the processor framework. As an instantiation example, an implementation of a TAL Processor targeting NCL (the declarative language of Ginga DTV middleware) is examined, and also another one targeting HTML-based middleware. The use of wizards for defining padding documents is also discussed in the examples of the proposed architecture instantiation.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"114 1","pages":"69-78"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80085378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XML query-update independence analysis revisited","authors":"Muhammad Junedi, P. Genevès, Nabil Layaïda","doi":"10.1145/2361354.2361375","DOIUrl":"https://doi.org/10.1145/2361354.2361375","url":null,"abstract":"XML transformations can be resource-costly in particular when applied to very large XML documents and document sets. Those transformations usually involve lots of XPath queries and may not need to be entirely re-executed following an update of the input document. In this context, a given query is said to be independent of a given update if, for any XML document, the results of the query are not affected by the update. We revisit Benedikt and Cheney's framework for query-update independence analysis and show that performance can be drastically enhanced, contradicting their initial claims. The essence of our approach and results resides in the use of an appropriate logic, to which queries and updates are both succinctly translated. Compared to previous approaches, ours is more expressive from a theoretical point of view, equally accurate, and more efficient in practice. We illustrate this through practical experiments and comparative figures.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"123 1","pages":"95-98"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75810826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}