{"title":"Glyph spotting for mediaeval handwritings by template matching","authors":"Jan-Hendrik Worch, Mathias Lawo, B. Gottfried","doi":"10.1145/2361354.2361401","DOIUrl":"https://doi.org/10.1145/2361354.2361401","url":null,"abstract":"This paper reports on the analysis of different approaches in order to search for glyphs within handwritten mediaeval documents. As layout analysis methods are difficult to apply to the documents at hand, template matching methods are employed. A number of different shape descriptions are used to filter out false positives, since the application of correlation coefficients alone results in too many matches. The overall goal consists in the interactive support of an editor who is transcribing a given handwriting. For this purpose, the automatic spotting of glyphs enables the editor to compare glyphs within different contexts.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"60 1","pages":"213-216"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82632172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured and fragmented content in collaborative XML publishing chains","authors":"Stéphane Crozat","doi":"10.1145/2361354.2361388","DOIUrl":"https://doi.org/10.1145/2361354.2361388","url":null,"abstract":"In this paper, we present the main results of the C2M project through one of its operational deliverable: the Scenari4 collaborative editing and publishing system for XML content. The purpose of the C2M project was to design a system able to manage structured and fragmented contents - as XML publishing chains do - while providing collaborative possibilities - as Enterprise Content Management systems (ECM) do. The main issue is related to transclusion relationships which are massively used in XML publishing chains, in order to support repurposing without copying. This approach is not compatible with the classical way ECMs manage content, especially in terms of propagation of modifications, rights or transactions management. We propose two complementary solutions to manage two different levels of collaboration. The workspace is designed as a highly dynamic place able to deal with live fragments, linked together in a network, that can be easily updated at any time by any user. The library is a more static and more classical way to manage content, dedicated to folder-documents, which are XML frozen versions of sub-networks extracted from workspaces. While workspaces are dedicated to content elaboration and maintenance, libraries are places to store, to read, or to exchange stable documents. Scenari4 is released under FLOSS license and has been being used in several experimental and commercial contexts since the beginning of 2012.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"33 1","pages":"145-148"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83641509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges in generating bookmarks from TOC entries in e-books","authors":"Yogalakshmi Jayabal, Chandrashekar Ramanathan, M. Sheth","doi":"10.1145/2361354.2361363","DOIUrl":"https://doi.org/10.1145/2361354.2361363","url":null,"abstract":"The task of extracting document structures from a digital e-book is difficult and is an active area of research. On the other hand, many e-books already have a table of contents (TOC) at the beginning of the document. This may lead us to believe that adding bookmarks into digital document (e-book) based on the existing TOC would be trivial. In this paper, we highlight the challenges involved in this task of automatically adding bookmarks to an existing e-book based on the TOC that exists within the document. If we are able to reliably identify the specific locations of each TOC entry within the document, the algorithms can be easily extended to identify document structures within e-books that have TOC. We describe a tool we have built called Booky that tries to add automatic PDF bookmarks to existing PDF based e-books as they have TOC as part of the document content. The tool addresses most of the challenges that have been identified while still leaving a few tricky scenarios still open.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"62 1","pages":"37-40"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86104494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faceted documents: describing document characteristics using semantic lenses","authors":"S. Peroni, D. Shotton, F. Vitali","doi":"10.1145/2361354.2361396","DOIUrl":"https://doi.org/10.1145/2361354.2361396","url":null,"abstract":"The semantic enhancement of a traditional scientific paper is not a straightforward operation, since it involves many different aspects or facets. In this paper we propose eight different semantic lenses through which these facets may be viewed, and describe and exemplify the ontologies by which these lenses may be implemented.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"1 1","pages":"191-194"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91525914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ALMcss: a javascript implementation of the CSS template layout module","authors":"César F. Acebal, B. Bos, M. Rodríguez, J. M. C. Lovelle","doi":"10.1145/2361354.2361360","DOIUrl":"https://doi.org/10.1145/2361354.2361360","url":null,"abstract":"Traditionally, web standards in general and Cascading Style Sheets (CSS) in particular take a long time from when they are defined by the W3C until they are implemented by browser vendors. This has been a limitation not only for authors, who had to wait even years before they were able to use certain CSS properties in their web pages, but also for the creators of the specification itself, who were not able to test their proposals in practice. In this paper we present ALMcss, a JavaScript prototype that implements the CSS Template Layout Module, a proposal for an addition to CSS to make it a more capable layout language. It has been developed inside the W3C CSS Working Group by two of the authors of this paper. We present the rationale of the module and an introduction to its syntax, before discussing the design of our prototype. ALMcss has served us as a proof of concept that the Template Layout Module is not only feasible, but it can be in fact implemented in current web browsers using just JavaScript and the Document Object Model (DOM). In addition, ALMcss allows web designers to start to use today the new layout capabilities of CSS that the module provides, even before it becomes an official W3C specification.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"10 1","pages":"23-32"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88182088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure-conforming XML document transformation based on graph homomorphism","authors":"Tyng-Ruey Chuang, Hui-Yin Wu","doi":"10.1145/2361354.2361376","DOIUrl":"https://doi.org/10.1145/2361354.2361376","url":null,"abstract":"We propose a principled method to specify XML document transformation so that the outcome of a transformation can be ensured to conform to certain structural constraints as required by the target XML document type. We view XML document types as graphs, and model transformations as relations between the two graphs. Starting from this abstraction, we use and extend graph homomorphism as a formalism for the specifications of transformations between XML document types. A specification can then be checked to ensure whether results from the transformation will always be structure-conforming.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"19 1","pages":"99-102"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75567107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for retrieval and annotation in digital humanities using XQuery full text and update in BaseX","authors":"C. Mahlow, C. Grün, Alexander Holupirek, M. Scholl","doi":"10.1145/2361354.2361398","DOIUrl":"https://doi.org/10.1145/2361354.2361398","url":null,"abstract":"A key difference between traditional humanities research and the emerging field of digital humanities is that the latter aims to complement qualitative methods with quantitative data. In linguistics, this means the use of large corpora of text, which are usually annotated automatically using natural language processing tools. However, these tools do not exist for historical texts, so scholars have to work with unannotated data. We have developed a system for systematic, iterative exploration and annotation of historical text corpora, which relies on an XML database (BaseX) and in particular on the Full Text and Update facilities of XQuery.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"33 1","pages":"195-204"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91278779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DocExplore: overcoming cultural and physical barriers to access ancient documents","authors":"Pierrick Tranouez, Stéphane Nicolas, Vladislavs Dovgalecs, A. Burnett, L. Heutte, Yiqing Liang, R. Guest, M. Fairhurst","doi":"10.1145/2361354.2361399","DOIUrl":"https://doi.org/10.1145/2361354.2361399","url":null,"abstract":"In this paper, we describe DocExplore, an integrated software suite centered on the handling of digitized documents with an emphasis on ancient manuscripts. This software suite allows the augmentation and exploration of ancient documents of cultural interest. Specialists can add textual and multimedia data and metadata to digitized documents through a graphical interface that does not require technical knowledge. They are helped in this endeavor by sophisticated document analysis tools that allows for instance to spot words or patterns in images of documents. The suite is intended to ease considerably the process of bringing locked away historical materials to the attention of the general public by covering all the steps from managing a digital collection to creating interactive presentations suited for cultural exhibitions. Its genesis and sustained development reside in a collaboration of archivists, historians and computer scientists, the latter being not only in charge of the development of the software, but also of creating and incorporating novel pattern recognition for document analysis techniques.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"4 1","pages":"205-208"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88207589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TAL processor for hypermedia applications","authors":"Carlos de Salles Soares Neto, H. F. Pinto, L. Soares","doi":"10.1145/2361354.2361369","DOIUrl":"https://doi.org/10.1145/2361354.2361369","url":null,"abstract":"TAL (Template Authoring Language) is a specification language for hypermedia document templates. Templates describe application families with structural and semantic similarities. In TAL, templates not only define design patterns that applications must follow, but also constraints on the use of these patterns. A template must be processed together with a padding document giving rise to a new document in some specification language, called target language. TAL supports the description of templates independently of the languages used to specify target and padding documents. Usually a specific processor is required for each target language and for each padding document used. This paper concerns TAL processors. However, we should note that the proposal can be easily extended to any other solution used to define templates. Any pattern language and any language used to define constraints could be used instead of TAL. The TAL processor architecture is general and it is discussed when presenting the processor framework. As an instantiation example, an implementation of a TAL Processor targeting NCL (the declarative language of Ginga DTV middleware) is examined, and also another one targeting HTML-based middleware. The use of wizards for defining padding documents is also discussed in the examples of the proposed architecture instantiation.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"114 1","pages":"69-78"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80085378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XML query-update independence analysis revisited","authors":"Muhammad Junedi, P. Genevès, Nabil Layaïda","doi":"10.1145/2361354.2361375","DOIUrl":"https://doi.org/10.1145/2361354.2361375","url":null,"abstract":"XML transformations can be resource-costly in particular when applied to very large XML documents and document sets. Those transformations usually involve lots of XPath queries and may not need to be entirely re-executed following an update of the input document. In this context, a given query is said to be independent of a given update if, for any XML document, the results of the query are not affected by the update. We revisit Benedikt and Cheney's framework for query-update independence analysis and show that performance can be drastically enhanced, contradicting their initial claims. The essence of our approach and results resides in the use of an appropriate logic, to which queries and updates are both succinctly translated. Compared to previous approaches, ours is more expressive from a theoretical point of view, equally accurate, and more efficient in practice. We illustrate this through practical experiments and comparative figures.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"123 1","pages":"95-98"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75810826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}