Proceedings of the 2015 ACM Symposium on Document Engineering: Latest Articles

The Browser as a Document Composition Engine
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797063
Tamir Hassan, N. Venkata
Abstract: Printing has long been a neglected aspect of the Web, and the print function of browsers, when used on documents designed for on-screen consumption, often leads to a poor result. Whereas print CSS goes some way towards optimizing the paper experience, it still does not enable full control over the page layout, which is necessary to obtain a publication-quality print result. Furthermore, its use requires web authors to invest additional resources for a feature that might only be used infrequently. This paper introduces a framework designed to alleviate these issues and improve the print experience on the Web. We describe the technologies that enable us to automatically compose and optimize the layout of a document, and generate a high quality PDF fully within the browser. This functionality can be offered to web publishers in the form of a print button, enabling content to be simultaneously delivered in screen and print formats, and ensuring a publication-quality result that adheres to the publisher's design guidelines.
Citations: 4
Concept Hierarchy Extraction from Textbooks
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797062
Shuting Wang, Chen Liang, Zhaohui Wu, Kyle Williams, B. Pursel, Benjamin Bräutigam, Sherwyn Saul, Hannah Williams, Kyle Bowen, C. Lee Giles
Abstract: Concept hierarchies have been useful tools for presenting and organizing knowledge. With the rapid growth in the number of online knowledge resources, automatic concept hierarchy extraction is increasingly attractive. Here, we focus on concept extraction from textbooks based on the knowledge in Wikipedia. Given a book, we extract important concepts in each book chapter using Wikipedia as a resource and from this construct a concept hierarchy for that book. We define local and global features that capture both the local relatedness and global coherence embedded in that textbook. In order to evaluate the proposed features and extracted concept hierarchies, we manually construct concept hierarchies for three widely used textbooks by labeling important concepts for each book chapter. Experiments show that our proposed local and global features achieve better performance than using only keyphrases to construct the concept hierarchies. Moreover, we observe that incorporating global features can improve the concept ranking precision and reaffirm the global coherence in the book.
Citations: 51
Developing Web Applications with Document Engineering Technologies and Enjoying It!
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2801034
S. Sire
Abstract: This tutorial proposes a practical software development method for building web applications using the XQuery and XSLT languages for manipulating semi-structured data. This method captures solutions and practices that we have applied over the last 4 years in many projects. It can be used on any XML database, as it requires only a thin layer to analyze and route incoming HTTP requests to a simple pipeline rendering the page. We will demonstrate it with a real world example developed with eXist-DB and the Oppidum lightweight XQuery framework.
Citations: 0
Automatic Document Classification using Summarization Strategies
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797077
Rafael Ferreira, R. Lins, L. Cabral, F. Freitas, S. Simske, M. Riss
Abstract: An efficient way to automatically classify documents may be provided by automatic text summarization, the task of creating a shorter text from one or several documents. This paper presents an assessment of the 15 most widely used methods for automatic text summarization from the text classification perspective. A naive Bayes classifier was used, showing that some of the methods tested are better suited for such a task.
Citations: 2
The Venice Time Machine
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797071
F. Kaplan
Abstract: The Venice Time Machine is an international scientific programme launched by the EPFL and the University Ca'Foscari of Venice with the generous support of the Fondation Lombard Odier. It aims at building a multidimensional model of Venice and its evolution covering a period of more than 1000 years. The project aspires to build a large open access database that could be used for research and education. Thanks to a partnership with the Archivio di Stato in Venice, kilometers of archives are currently being digitized, transcribed and indexed, setting the base of the largest database ever created on Venetian documents. The State Archives of Venice contain a massive amount of hand-written documentation in languages evolving from medieval times to the 20th century. An estimated 80 km of shelves are filled with over a thousand years of administrative documents, from birth registrations, death certificates and tax statements, all the way to maps and urban planning designs. These documents are often very delicate and are occasionally in a fragile state of conservation. To complement these primary sources, the content of thousands of monographs has been indexed and made searchable. The documents digitised in the Venice Time Machine programme are intricately interwoven, telling a much richer story when they are cross-referenced. By combining this mass of information, it is possible to reconstruct large segments of the city's past: complete biographies, political dynamics, or even the appearance of buildings and entire neighborhoods. The information extracted from the primary and secondary sources is organized in a semantic graph of linked data and unfolded in space and time in a historical geographical information system. The resulting platform can serve both research and education. About a hundred researchers and students already collaborate on this programme. A doctoral school is organised every year in Venice, and several bachelor and master courses currently use the data produced in the context of the Venice Time Machine. Through all these initiatives, the Venice Time Machine explores how "big data of the past" can change research and education in historical sciences, hopefully paving the way towards a general methodology that could be applied to many other cities and archives.
Citations: 13
Session details: Documents Made Accessible
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/3256805
M. Hardy
Citations: 0
Segmentation of Overlapping Digits through the Emulation of a Hypothetical Ball and Physical Forces
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797080
Alberto N. G. Lopes Filho, C. Mello
Abstract: This paper presents an algorithm for segmenting pairs of overlapping handwritten digits. Digits can be found overlapped in text depending on writing style and organization; digits in close proximity or with elongated strokes may also overlap with their neighbors. Applications such as automated character recognition are directly affected by overlapping characters and their segmentation. The proposed approach is based on the emulation of inertia and a deformable hypothetical ball. The strokes act as a pathway for the ball to run and create the segmentation. The results of the algorithm are passed to a digit recognizer, and it is shown that the method performs well and has a lower computational cost than other segmentation approaches.
Citations: 2
Filling the Gaps: Improving Wikipedia Stubs
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797073
Siddhartha Banerjee, P. Mitra
Abstract: The availability of only a limited number of contributors on Wikipedia cannot ensure consistent growth and improvement of the online encyclopedia. With information being scattered on the web, our goal is to automate the process of generating content for Wikipedia. In this work, we propose a technique for improving stubs on Wikipedia that do not contain comprehensive information. A classifier learns features from the existing comprehensive articles on Wikipedia and recommends content that can be added to the stubs to improve their completeness. We conduct experiments using several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep learning based architecture (deep belief network), and a TF-IDF based classifier. Our experiments reveal that the LDA based model outperforms the other models (~6% F-score). Our generation approach shows that this technique is capable of generating comprehensive articles. ROUGE-2 scores of the articles generated by our system outperform the articles generated using the baseline. Content generated by our system has been appended to several stubs and successfully retained in Wikipedia.
Citations: 8
Combining Advanced Information Retrieval and Text-Mining for Digital Humanities
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797067
Antoine Widlöcher, Nicolas Béchet, Jean-Marc Lecarpentier, Yann Mathet, Julia Roger
Abstract: Digital Humanities make more and more structured and richly annotated corpora available. Most of these data rely on well known and established standards, such as TEI, which especially enable scientists to edit and publish their work. However, one of the remaining problems is to give adequate access to this rich data, in order to produce higher-order knowledge. In this paper, we present an integrated environment combining an advanced search engine and text-mining techniques for hermeneutics in Digital Humanities. Relying on semantic web technologies, the search engine uses full text as well as complex embedding structures and offers a single interface to access rich and heterogeneous data and meta-data. Text-mining possibilities enable scholars to exhibit regularities in corpora. Results obtained on the Cartesian corpus illustrate these principles and tools.
Citations: 4
Enhancing Exploration with a Faceted Browser through Summarization
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797083
Grzegorz Drzadzewski, Frank Wm. Tompa
Abstract: An enhanced faceted browsing system has been developed to support users' exploration of large multi-tagged document collections. It provides summary measures of document result sets at each step of navigation through a set of representative terms and a diverse set of documents. These summaries are derived from pre-materialized views that allow for quick calculation of centroids for various result sets. The utility and efficiency of the system are demonstrated on the New York Times Annotated Corpus.
Citations: 2