{"title":"TEXUS: A Task-based Approach for Table Extraction and Understanding","authors":"Roya Rastan, Hye-young Paik, J. Shepherd","doi":"10.1145/2682571.2797069","DOIUrl":"https://doi.org/10.1145/2682571.2797069","url":null,"abstract":"In this paper, we propose a precise, comprehensive model of table processing which aims to remedy some of the problems in the discussion of table processing in the literature. The model targets application-independent, end-to-end table processing, and thus encompasses a large subset of the work in the area. The model can be used to aid the design of table processing systems (We provide an example of such a system), can be considered as a reference framework for evaluating the performance of table processing systems, and can assist in clarifying terminological differences in the table processing literature.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127092623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interlinking English and Chinese RDF Data Using BabelNet","authors":"Tatiana Lesnikova, Jérôme David, J. Euzenat","doi":"10.1145/2682571.2797089","DOIUrl":"https://doi.org/10.1145/2682571.2797089","url":null,"abstract":"Linked data technologies make it possible to publish and link structured data on the Web. Although RDF is not about text, many RDF data providers publish their data in their own language. Cross-lingual interlinking aims at discovering links between identical resources across knowledge bases in different languages. In this paper, we present a method for interlinking RDF resources described in English and Chinese using the BabelNet multilingual lexicon. Resources are represented as vectors of identifiers and then similarity between these resources is computed. The method achieves an F-measure of 88%. The results are also compared to a translation-based method.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132725452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Creating eBooks with Accessible Graphics Content","authors":"Cagatay Goncu, K. Marriott","doi":"10.1145/2682571.2797076","DOIUrl":"https://doi.org/10.1145/2682571.2797076","url":null,"abstract":"We present a new model for presenting graphics in eBooks to blind readers. It is based on the GraViewer app which allows an accessible graphic embedded in an iBook to be explored on an iPad using speech and non-speech audio feedback. We also introduce a web-based tool, GraAuthor, for creating such accessible graphics and describe the workflow for including these in an iBook. Unlike previous approaches our model provides an integrated digital presentation of both text and graphics and allows the general public to create accessible graphics.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115758238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting XSLT Rules Affected by Schema Evolution","authors":"Yang Wu, Nobutaka Suzuki","doi":"10.1145/2682571.2797086","DOIUrl":"https://doi.org/10.1145/2682571.2797086","url":null,"abstract":"In general, schemas of XML documents are continuously updated according to changes in the real world. If a schema is updated, then XSLT stylesheets are also affected by the schema update. To maintain the consistencies of XSLT stylesheets with updated schemas, we have to detect the XSLT rules affected by schema updates. However, detecting such XSLT rules manually is a difficult and time-consuming task, since recent DTDs and XSLT stylesheets are becoming more complex and users do not always fully understand the dependencies between XSLT stylesheets and DTDs. In this paper, we consider three subclasses based on unranked tree transducer, and consider an algorithm for detecting XSLT rules affected by a DTD update for the classes.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115799387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Information Summarized","authors":"D. Brailsford","doi":"10.1145/3256803","DOIUrl":"https://doi.org/10.1145/3256803","url":null,"abstract":"","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116958419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Logical Structures","authors":"E. Munson","doi":"10.1145/3256807","DOIUrl":"https://doi.org/10.1145/3256807","url":null,"abstract":"","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121919904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Is This Thing Called Linked Data?","authors":"Manuel Atencia, Jérôme David, P. Genoud","doi":"10.1145/2682571.2801035","DOIUrl":"https://doi.org/10.1145/2682571.2801035","url":null,"abstract":"The Linked Data initiative has made it possible for the web to evolve from being a global information space in which only documents are linked to one in which both documents and data are linked: a web of documents and data. This tutorial aims to give an overview of the principles, models and technologies underlying Linked Data.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130917523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Computation of Co-occurrence Based Word Relatedness","authors":"Jie Mei, Xinxin Kou, Zhimin Yao, A. Rau-Chaplin, Aminul Islam, A. Mohammad, E. Milios","doi":"10.1145/2682571.2797088","DOIUrl":"https://doi.org/10.1145/2682571.2797088","url":null,"abstract":"Measuring document relatedness using unsupervised co-occurrence based word relatedness methods is a processing-time and memory consuming task. This paper introduces the application of compact data structures for efficient computation of word relatedness based on corpus statistics. The data structure is used to efficiently lookup: (1) the corpus statistics for the Common Word Relatedness Approach, (2) the pairwise word relatedness for the Algorithm Specific Word Relatedness Approach. These two approaches significantly accelerate the processing time of word relatedness methods and reduce the space cost of storing co-occurrence statistics in memory, making text mining tasks like classification and clustering based on word relatedness practical.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128091425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2015 ACM Symposium on Document Engineering","authors":"C. Vanoirbeek, P. Genevès","doi":"10.1145/2682571","DOIUrl":"https://doi.org/10.1145/2682571","url":null,"abstract":"It is our great pleasure to welcome you to the 2015 ACM Symposium on Document Engineering -- DocEng'15. This year's symposium both continues and innovates in its tradition of being the premier forum for presentation of research results and experience reports on leading edge issues of document engineering. The mission of the symposium is to share significant results, to evaluate novel approaches and models, and to identify promising directions for future research and development. DocEng gives researchers and practitioners a unique opportunity to share their perspectives with others interested in the various aspects of document engineering. Document engineering is a rapidly developing field that encompasses both traditional topics and also new ideas and challenges related to new technologies and to changes in the ways in which information is created, managed, and disseminated.\n\nThis year we issued a new call for papers centered on new hot topics around the notion of document that has evolved to encompass a broader vision of the field. We therefore took pains to include new program committee members to supplement the overall expertise around these topics. Our call for papers attracted submissions from 25 countries (Algeria, Australia, Austria, Belgium, Brazil, Canada, China, Denmark, Ecuador, Ethiopia, France, Germany, India, Italy, Japan, Netherlands, Portugal, Qatar, Russian Federation, Singapore, Spain, Switzerland, Tunisia, United Kingdom of Great Britain and Northern Ireland, United States of America). All papers were carefully reviewed by a minimum of three program committee members. The program committee accepted 11 of 31 reviewed full paper submissions (35%) and 18 of 51 reviewed short paper submissions (35%) for oral presentations, for a combined acceptance rate of 35%. A further 10 short paper submissions were accepted for poster presentations. This year's program includes two poster sessions during which attendees will be given the opportunity to interact with authors of short papers accepted for poster presentation. The most covered topics this year are analysis, layout, authoring, querying, transformation, validation, management and semantics of documents, as well as related algorithms.\n\nWe are happy to feature two keynote talks:\nDocuments as Data, Data as Documents: what we learned about Semi-Structured Information for our Open World of Cloud & Devices, Jean Paoli (who is currently President at Microsoft Open Technologies, Inc.)\nThe Venice Time Machine, Frederic Kaplan (who is currently professor at EPFL)","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131611477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BBookX: An Automatic Book Creation Framework","authors":"Chen Liang, Shuting Wang, Zhaohui Wu, Kyle Williams, B. Pursel, Benjamin Bräutigam, Sherwyn Saul, Hannah Williams, Kyle Bowen, C. Lee Giles","doi":"10.1145/2682571.2797094","DOIUrl":"https://doi.org/10.1145/2682571.2797094","url":null,"abstract":"As more educational resources become available online, it is possible to acquire more up-to-date knowledge and information. We propose BBookX, a novel computer facilitated system that automatically and collaboratively builds free open online books using publicly available educational resources such as Wikipedia. BBookX has two separate components: one creates an open version of existing books by linking different book chapters to Wikipedia articles, while another with an interactive user interface supports interactive real-time book creation where users are allowed to modify a generated book from explicit feedback.","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"289 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117292670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}