{"title":"Session details: Workshop and Tutorials","authors":"Sonja Maier","doi":"10.1145/3256809","DOIUrl":"https://doi.org/10.1145/3256809","url":null,"abstract":"","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125287939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Documents as Data, Data as Documents: What we learned about Semi-Structured Information for our Open World of Cloud & Devices","authors":"J. Paoli","doi":"10.1145/2682571.2797070","DOIUrl":"https://doi.org/10.1145/2682571.2797070","url":null,"abstract":"Many of us always believed in a unique vision unifying documents and data through semantically-rich semi-structured information. This vision is even more critical today in our open interconnected world of Clouds and Devices. The last 20 years represent a real-life worldwide experiment in this area that fueled a massive set of market applications. In this talk, we review the history and trends of a lot of what is enabling today's core interchanges on the internet: from initial research adding document user interfaces to data, to the specification of structured documents, to the generalization of document markup techniques, to the wide acceptance of document databases. We will also review our share of historical acronyms such as 'Star', 'Grif', 'OpenDoc', 'WorldWideWeb/Nexus', 'Amaya', 'InfoPath', 'HTML', 'SGML', 'XML', 'JSON', 'YAML', 'Markdown', 'Schema', 'Semantics', 'MongoDB', 'Hadoop', 'DocumentDB' and many others. We will then turn, cautiously and humbly, to the future and try to guess: what would the world need? And what do we need to think about to make it happen? We truly believe in the potential of the open Internet. We see pieces of information (that we once called \"Diamonds of the Internet\") being created, shared, re-shaped, re-routed, modified by users or tiny devices, understood through big data and machine learning, and processed by cloud services. We see the potential of fundamentally designing open platforms connected worldwide. By bridging technologies, we create higher-level abstractions and thus more complex organisms (software) that can help everyone.\nBut at the core remains the need for semi-structured open information fundamentally unifying documents and data.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129042264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AERO: An Extensible Framework for Adaptive Web Layout Synthesis","authors":"R. Vernica, N. Venkata","doi":"10.1145/2682571.2797084","DOIUrl":"https://doi.org/10.1145/2682571.2797084","url":null,"abstract":"We present AERO, an extensible framework for adaptive web layout synthesis. The goal is to provide an underlying software architecture to allow general adaptive layout behaviors. The framework consists of 1) a suite of templates specified in HTML/CSS, 2) a hierarchical, highly customizable scoring function specification, and 3) an evaluation engine that leverages native browser rendering to rapidly render content and apply the scoring functions. Unlike current responsive layout frameworks for the web (e.g., Twitter Bootstrap), which have pre-configured grid layouts that adapt in a manually pre-encoded, content-independent manner, AERO allows the layout to adapt automatically based on multiple content-dependent criteria such as aesthetic quality, cropability of individual images, layout A/B testing results, and ad placement.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121212068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quantitative and Qualitative Assessment of Automatic Text Summarization Systems","authors":"Jamilson Batista, Rodolfo Ferreira, Hilário Tomaz, Rafael Ferreira, R. Lins, S. Simske, G. Silva, M. Riss","doi":"10.1145/2682571.2797081","DOIUrl":"https://doi.org/10.1145/2682571.2797081","url":null,"abstract":"Text summarization is the process of automatically creating a shorter version of one or more text documents. This paper presents a qualitative and quantitative assessment of the 22 state-of-the-art extractive summarization systems using the CNN corpus, a dataset of 3,000 news articles.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"78 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116361164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Madoko: Scholarly Documents for the Web","authors":"Daan Leijen","doi":"10.1145/2682571.2797097","DOIUrl":"https://doi.org/10.1145/2682571.2797097","url":null,"abstract":"Madoko [8] is a novel authoring system for writing complex documents. It is especially well suited for complex academic or industrial documents, like scientific articles, reference manuals, or math-heavy presentations. It started out as a project to take a fresh look at how we write academic articles. In particular, we would like to satisfy the following requirements when writing complex documents:","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131497999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Does a Split-View Aid Navigation Within Academic Documents?","authors":"Juliane Franze, K. Marriott, Michael Wybrow","doi":"10.1145/2682571.2797093","DOIUrl":"https://doi.org/10.1145/2682571.2797093","url":null,"abstract":"Paper is still the dominant medium in academic reading. One reason is the ease of navigation within a paper document. We therefore investigate how to provide more paper-like navigation within an academic document when read digitally. We present the results of a user study in which we compare the standard single-view hyperlink navigation with a split-view navigation. The split-view offers the reader a primary reading view of the document as well as a contextual view next to it. When a hyperlink is activated in the reading view, the contextual view shows the referenced element. While we found no difference in user performance, the split-view was preferred by almost all users to the standard single-view navigation model.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116414729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Quality Capture of Documents on a Cluttered Tabletop with a 4K Video Camera","authors":"Chelhwon Kim, Patrick Chiu, Henry Tang","doi":"10.1145/2682571.2797074","DOIUrl":"https://doi.org/10.1145/2682571.2797074","url":null,"abstract":"We present a novel system for detecting and capturing paper documents on a tabletop using a 4K video camera mounted overhead on pan-tilt servos. Our automated system first finds paper documents on a cluttered tabletop based on a text probability map, and then takes a sequence of high-resolution frames of the located document to reconstruct a high-quality, fronto-parallel document page image. The quality of the resulting images enables OCR processing on the whole page. We performed a preliminary evaluation on a small set of 10 document pages, and our proposed system achieved 98% accuracy with the open source Tesseract OCR engine.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"1646 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123372837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-oriented Text Extraction from Information Graphics","authors":"Falk Böschen, A. Scherp","doi":"10.1145/2682571.2797092","DOIUrl":"https://doi.org/10.1145/2682571.2797092","url":null,"abstract":"Existing research on analyzing information graphics assumes that perfect text detection and extraction are available. However, text extraction from information graphics is far from solved. To fill this gap, we propose a novel processing pipeline for multi-oriented text extraction from infographics. The pipeline applies a combination of data mining and computer vision techniques to identify text elements, cluster them into text lines, and compute their orientation, and it uses a state-of-the-art open source OCR engine to perform the text recognition. We evaluate our method on 121 infographics extracted from an open access corpus of scientific publications. The results show that our approach is effective and significantly outperforms a state-of-the-art baseline.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127108057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}