Proceedings of the 2015 ACM Symposium on Document Engineering: Latest Papers

MSoS: A Multi-Screen-Oriented Web Page Segmentation Approach
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797090
Mira Sarkis, C. Concolato, Jean-Claude Dufourd
Abstract: In this paper we describe a multiscreen-oriented approach for segmenting web pages. The segmentation is an automatic, hybrid visual and structural method. It aims at creating coherent blocks which have different functions determined by the multiscreen environment. It is also characterized by a dynamic adaptation to the page content. Experiments are conducted on a set of existing applications that contain multimedia elements, in particular YouTube and video player pages. Results are compared with one segmentation method from the literature and with a manually created ground truth. With 81% precision, MSoS is a promising method that is capable of producing good segmentation results.
Citations: 5
Proceedings of the 2015 ACM Symposium on Document Engineering
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571
C. Vanoirbeek, P. Genevès
Abstract: It is our great pleasure to welcome you to the 2015 ACM Symposium on Document Engineering, DocEng'15. This year's symposium both continues and innovates in its tradition of being the premier forum for presentation of research results and experience reports on leading-edge issues of document engineering. The mission of the symposium is to share significant results, to evaluate novel approaches and models, and to identify promising directions for future research and development. DocEng gives researchers and practitioners a unique opportunity to share their perspectives with others interested in the various aspects of document engineering. Document engineering is a rapidly developing field that encompasses both traditional topics and new ideas and challenges related to new technologies and to changes in the ways in which information is created, managed, and disseminated.

This year we issued a new call for papers centered on new hot topics around the notion of document, which has evolved to encompass a broader vision of the field. We therefore took pains to include new program committee members to supplement the overall expertise around these topics. Our call for papers attracted submissions from 25 countries (Algeria, Australia, Austria, Belgium, Brazil, Canada, China, Denmark, Ecuador, Ethiopia, France, Germany, India, Italy, Japan, Netherlands, Portugal, Qatar, Russian Federation, Singapore, Spain, Switzerland, Tunisia, United Kingdom of Great Britain and Northern Ireland, United States of America). All papers were carefully reviewed by a minimum of three program committee members. The program committee accepted 11 of 31 reviewed full paper submissions (35%) and 18 of 51 reviewed short paper submissions (35%) for oral presentations, for a combined acceptance rate of 35%. A further 10 short paper submissions were accepted for poster presentations. This year's program includes two poster sessions during which attendees will be given the opportunity to interact with authors of short papers accepted for poster presentation. The most covered topics this year are analysis, layout, authoring, querying, transformation, validation, management and semantics of documents, as well as related algorithms.

We are happy to feature two keynote talks:
Documents as Data, Data as Documents: what we learned about Semi-Structured Information for our Open World of Cloud & Devices, Jean Paoli (who is currently President at Microsoft Open Technologies, Inc.)
The Venice Time Machine, Frederic Kaplan (who is currently professor at EPFL)
Citations: 1
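The acceptance figures quoted in the welcome message can be checked with a few lines of arithmetic. The sketch below uses only the counts stated there; the helper name `percent` and the choice to round to the nearest whole percent are assumptions made for illustration.

```python
# Counts quoted in the welcome message above.
full_accepted, full_reviewed = 11, 31
short_accepted, short_reviewed = 18, 51


def percent(accepted, reviewed):
    """Acceptance rate as a percentage, rounded to the nearest integer (assumed rounding)."""
    return round(100 * accepted / reviewed)


full_rate = percent(full_accepted, full_reviewed)
short_rate = percent(short_accepted, short_reviewed)
combined_rate = percent(full_accepted + short_accepted,
                        full_reviewed + short_reviewed)

print(full_rate, short_rate, combined_rate)  # prints: 35 35 35
```

All three rates round to 35%, which agrees with the message. The 10 short papers accepted as posters are not counted in these rates.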
Change Classification in Graphics-Intensive Digital Documents
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797079
Jeremy Svendsen, A. Albu
Abstract: This paper proposes an approach for the automatic detection and classification of changes occurring in images of documents with identical content, but generated with different software versions, or under different operating platforms. Our work is performed on a database of digitally-born business documents created using financial reporting tools. The proposed method involves a multi-stage process, where the end goal is to present to a human user the reports which have changed and the changes which were detected. Our main contribution is related to the matching and comparison of graphical document elements. This paper focuses on detection of local, translation-based changes. Future work will explore other local changes involving size, color, and rotation.
Citations: 0
VEDD: A Visual Editor for Creation and Semi-Automatic Update of Derived Documents
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797075
K. Marriott, Mingzheng Shi, Michael Wybrow
Abstract: Document content is increasingly customised to a particular audience. Such customised documents are typically built by combining content from selected logical content modules and then editing the result to create the custom document. A major difficulty is how to efficiently update these derived documents when the source documents are changed. Here we describe a web-based visual editing tool for both creating and semi-automatically updating derived documents from modules in a source library.
Citations: 0
Automatic Text Document Summarization Based on Machine Learning
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797099
G. Silva, Rafael Ferreira, R. Lins, L. Cabral, Hilário Oliveira, S. Simske, M. Riss
Abstract: The need for automatic generation of summaries gained importance with the unprecedented volume of information available on the Internet. Automatic systems based on extractive summarization techniques select the most significant sentences of one or more texts to generate a summary. This article makes use of Machine Learning techniques to assess the quality of the twenty most referenced strategies used in extractive summarization, integrating them in a tool. Quantitative and qualitative aspects were considered in this assessment, demonstrating the validity of the proposed scheme. The experiments were performed on the CNN-corpus, possibly the largest and most suitable test corpus today for benchmarking extractive summarization strategies.
Citations: 15
Searching Live Meeting Documents "Show me the Action"
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797082
Laurent Denoue, S. Carter, Matthew L. Cooper
Abstract: Live meeting documents require different techniques for effectively retrieving important pieces of information. During live meetings, people share web sites, edit presentation slides, and share code editors. A simple approach is to index the shared video frames, or key-frames, with Optical Character Recognition (OCR) and let users retrieve them. Here we show that a more useful approach is to look at what actions users take inside the live document streams. Based on observations of real meetings, we focus on two important signals: text editing and mouse cursor motion. We describe the detection of text and cursor motion, their implementation in our WebRTC (Web Real-Time Communication)-based system, and how users are better able to search live documents during a meeting based on these extracted actions.
Citations: 2
Document Engineering Issues in Document Analysis
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2801033
Charles K. Nicholas, Robert Brandon
Abstract: We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, polymorphic malware, malicious PDFs, and exploit kits. We will conclude with our view of important research questions in the field.
Citations: 0
The Delaunay Document Layout Descriptor
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797059
Sébastien Eskenazi, Petra Gomez-Krämer, J. Ogier
Abstract: Security applications related to document authentication require an exact match between an authentic copy and the original of a document. This implies that the document analysis algorithms that are used to compare two documents (original and copy) should provide the same output. This kind of algorithm includes the computation of layout descriptors from the segmentation result, as the layout of a document is a part of its semantic content. To this end, this paper presents a new layout descriptor that significantly improves the state of the art. The basis of this descriptor is the use of a Delaunay triangulation of the centroids of the document regions. This triangulation is seen as a graph and the adjacency matrix of the graph forms the descriptor. While most layout descriptors have a stability of 0% with regard to an exact match, our descriptor has a stability of 74%, which can be brought up to 100% with the use of an appropriate matching algorithm. It also achieves 100% accuracy and retrieval in a document retrieval scheme on a database of 960 document images. Furthermore, this descriptor is extremely efficient as it performs a search in constant time with respect to the size of the document database and it reduces the size of the index of the database by a factor of 400.
Citations: 8
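The core idea in this abstract (triangulate the region centroids, then read the triangulation as a graph) can be illustrated with a small sketch. This is not the authors' implementation: it is a brute-force construction that assumes integer centroid coordinates in general position (no four centroids on one circle), and the function names and the sample centroids are hypothetical. How regions are segmented, ordered, and matched is outside what the abstract states.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of counter-clockwise triangle abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = rows
    det = (ax * (by * cz - bz * cy)
           - ay * (bx * cz - bz * cx)
           + az * (bx * cy - by * cx))
    return det > 0


def delaunay_adjacency(points):
    """Adjacency matrix of the Delaunay triangulation, built by brute force.

    A triple of points forms a Delaunay triangle when no other point lies
    strictly inside its circumcircle. Fine for a handful of regions; a real
    system would use a dedicated triangulation library.
    """
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        o = orient(a, b, c)
        if o == 0:
            continue  # collinear points do not form a triangle
        if o < 0:
            b, c = c, b  # make the triangle counter-clockwise
        others = (points[m] for m in range(n) if m not in (i, j, k))
        if any(in_circumcircle(a, b, c, d) for d in others):
            continue
        for u, v in ((i, j), (j, k), (i, k)):
            adj[u][v] = adj[v][u] = 1
    return adj


# Hypothetical centroids: four regions near the page corners, one in the middle.
centroids = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
descriptor = delaunay_adjacency(centroids)
for row in descriptor:
    print(row)
```

In this toy layout the central region is connected to all four outer regions, and the outer regions are connected only along the page border. The resulting matrix depends on the order in which centroids are listed, so any real comparison of two documents needs a consistent ordering or a matching step, which the abstract mentions but does not detail.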
Automatic Extraction of Figures from Scholarly Documents
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797085
Sagnik Ray Choudhury, P. Mitra, C. Lee Giles
Abstract: Scholarly papers (journal and conference papers, technical reports, etc.) usually contain multiple "figures" such as plots, flow charts and other images which are generated manually to symbolically represent and illustrate visually important concepts, findings and results. These figures can be analyzed for automated data extraction or semantic analysis. Surprisingly, large-scale automated extraction of such figures from PDF documents has received little attention. Here we discuss the challenges of how to build a heuristic-independent trainable model for such an extraction task and how to extract figures at scale. Motivated by recent developments in table extraction, we define three new evaluation metrics: figure-precision, figure-recall, and figure-F1-score. Our dataset consists of a sample of 200 PDFs, randomly collected from five million scholarly PDFs and manually tagged for 180 figure locations. Initial results from our work demonstrate an accuracy greater than 80%.
Citations: 28
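The abstract names figure-precision, figure-recall, and figure-F1-score without defining them. The sketch below shows only the standard precision, recall, and F1 formulas applied to counts of matched and unmatched figures; the rule for deciding when an extracted figure matches a tagged one is the paper's and is not reproduced here. The function name and the example counts are hypothetical.

```python
def figure_prf(true_positives, false_positives, false_negatives):
    """Standard precision, recall and F1 from match counts.

    true_positives:  extracted figures that match a tagged figure
    false_positives: extracted figures with no tagged counterpart
    false_negatives: tagged figures that were not extracted
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not results from the paper.
precision, recall, f1 = figure_prf(true_positives=8, false_positives=2, false_negatives=2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.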
Multimedia Document Structure for Distributed Theatre
Proceedings of the 2015 ACM Symposium on Document Engineering Pub Date : 2015-09-08 DOI: 10.1145/2682571.2797087
Jack Jansen, Michael Frantzis, Pablo César
Abstract: This paper explores the suitability of structured (and declarative) multimedia document formats for supporting a novel type of performing art: distributed theatre. In distributed theatre, the actors are split between two (or more) locations, but together deliver a single performance mediated by the cameras, the internet, and projection technologies. Based on our efforts to make an actual distributed theatre production happen (The Tempest by Miracle Theatre), this paper reflects on our experience. Our findings are divided into two main areas: workflow and document structure. We conclude that novel types of video-mediated applications, like distributed theatre, require new manners of authoring documents. Moreover, specific extensions to existing document formats are needed in order to accommodate the new requirements imposed by such applications.
Citations: 2