{"title":"Document Engineering Issues in Malware Analysis","authors":"Charles K. Nicholas","doi":"10.1145/3103010.3103027","DOIUrl":"https://doi.org/10.1145/3103010.3103027","url":null,"abstract":"We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, malicious PDFs, polymorphic malware, ransomware, and exploit kits. We will conclude with our view of important research questions in the field. This is an updated version of last year's tutorial, with more information about web-based malware and malware targeting the Android market.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131788018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"qqmbr and indentml: Extensible Mathematical Publishing for Web and Paper","authors":"I. Schurov","doi":"10.1145/3103010.3121031","DOIUrl":"https://doi.org/10.1145/3103010.3121031","url":null,"abstract":"We present qqmbr, a novel publishing system aimed at the preparation of high-quality mathematical publications. One source can be converted to a single interactive webpage, a multi-page website or a PDF (via LaTeX). The markup language behind qqmbr, called indentml, is designed to be both human-readable and machine-readable (easily parsable). The basic qqmbr markup can be extended with custom tags that enrich its semantics, and plugins and applications that query qqmbr documents, extract information from them and process it in an arbitrary way can be built without much effort.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"70 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113991155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NuSys: Towards a Document IDE for Knowledge Work","authors":"P. Eichmann, Trent Green, R. Zeleznik, A. V. Dam","doi":"10.1145/3103010.3121045","DOIUrl":"https://doi.org/10.1145/3103010.3121045","url":null,"abstract":"Knowledge workers consume and annotate digital documents such as PDF files, videos, images and text notes - in some cases collaboratively - to form mental models and gain insight. An abundance of software solutions and utilities were designed to assist users in stages of this process but not in the process as a whole, which makes knowledge work with documents unnecessarily inefficient. In this paper, we introduce ideas on how to streamline common knowledge worker tasks, such as collaboratively searching, gathering and freely arranging fragments of various media documents to gain understanding and then transforming emergent insights into interactive structured visualizations. Furthermore, we present NuSys, an integrated development environment (IDE) specialized for document-centric workflows, that implements the core of these ideas.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127412739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maintaining Integrity and Non-Repudiation in Secure Offline Documents","authors":"Ahmed S. Shatnawi, E. Munson, C. Thao","doi":"10.1145/3103010.3121038","DOIUrl":"https://doi.org/10.1145/3103010.3121038","url":null,"abstract":"Securing sensitive digital documents (such as health records, legal reports, government documents, and financial assets) is a critical and challenging task. Unreliable Internet connections, viruses, and compromised file storage systems pose a significant risk to such documents and can compromise their integrity, especially when the documents are shared offline across domains. In this paper, we present a new framework for maintaining integrity in offline documents and provide a non-repudiation security feature without relying on a central repository of certificates. This framework has been implemented as a plug-in for the Microsoft Word application. It is portable because the plug-in is attached to the document itself and it is scalable because there are no fixed limits on the number of users who can collaborate in producing the document. Our framework provides integrity and non-repudiation guarantees for each change in the document's version history.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115330579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Authenticity in a Digital Era: Still a Document Process: The Case of Laboratory Notebooks","authors":"L. Tosi, Aurélien Bénel","doi":"10.1145/3103010.3121034","DOIUrl":"https://doi.org/10.1145/3103010.3121034","url":null,"abstract":"Asymmetric cryptography brings the ability for anyone on earth to check the signature of a digital object (Diffie & Hellman, 1976). From that perspective, trusted timestamping of a digital object provides very strong evidence of its author or inventor and integrity (Haber, 1991). 26 years later, one might have expected that trusted timestamping would have long ago replaced traditional paper laboratory notebooks, which has not happened yet. In this paper, we argue that the reason is that authenticity is a document process: while trusted timestamping remains a necessary part of the process, a digital object must be involved in a sociotechnical process in order to become a document. We first point out the gap, intractable with paper, between the strict administrative workflow required to create strong evidence, and the fluidity of collaborative authoring needed for creativity. This gap is relevant to laboratory notebooks, as they are commonly used by inventors to attest that they discovered elements at a specific time, in a specific context. Then we explain the design and implementation of our software system, according to document theory (Buckland, 1997), in order to reinvent the whole process to minimize the administrative burden, while preserving its well-known and valuable properties.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"718 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116128040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tool for Mixing XML Annotations","authors":"B. Gaiffe","doi":"10.1145/3103010.3121028","DOIUrl":"https://doi.org/10.1145/3103010.3121028","url":null,"abstract":"XML documents, in particular critical editions, are usually very heavily annotated. The annotations typically represent abbreviations, variant readings, edition operations, etc. In such documents, only part of the character content of the file is the actual edition of the text. Very often, one wants to run automatic tools on this \"simple\" text and thereafter re-embed the result into the original file. The tool we present here is dedicated to this embedding of annotations. To achieve this, the tool frames the problem as an ambiguous input and parses that input with the grammar of the XML language. It then proposes those solutions that are syntactically correct. If there are none, the input is modified and reparsed until at least one solution is found.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129268222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting In-line Mathematical Expressions in Scientific Documents","authors":"Kenichi Iwatsuki, T. Sagara, T. Hara, Akiko Aizawa","doi":"10.1145/3103010.3121041","DOIUrl":"https://doi.org/10.1145/3103010.3121041","url":null,"abstract":"One of the issues in extracting natural language sentences from PDF documents is the identification of non-textual elements in a sentence. In this paper, we report our preliminary results on the identification of in-line mathematical expressions. We first construct a manually annotated corpus and then apply a conditional random field (CRF) to math-zone identification, using both layout features, such as font types, and linguistic features, such as context n-grams, obtained from PDF documents. Although our method is naive and uses a small amount of annotated training data, it achieved an 88.95% F-measure compared with 22.81% for existing math OCR software.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116187487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2017 ACM Symposium on Document Engineering","authors":"K. Camilleri, Alexandra Bonnici","doi":"10.1145/3103010","DOIUrl":"https://doi.org/10.1145/3103010","url":null,"abstract":"It is with honour and pleasure that we welcome you to Valletta for the 17th ACM Symposium on Document Engineering. DocEng 2017 is being organised by the University of Malta's Department of Systems and Control Engineering on 4-7 September 2017. The symposium brings together experts in all areas of document engineering from both academia and industry, with the intention of presenting and discussing the most recent advances in the field of Document Engineering. Building on the experiences of previous years, the DocEng symposium program consists of one day of tutorials followed by three days of paper presentations. The program offers three half-day tutorials on Historic Document Processing, Document Engineering Issues in Malware Analysis and User Evaluation in the Document Engineering Field. DocEng 2017 also keeps alive the tradition of the Birds of a Feather discussion group, which will be led by Charles Nicholas. Of course, the highlight of the symposium will be the keynote talks: \"Sketched Visual Narratives for Image and Video Search\" by John Collomosse from the University of Surrey, and \"The Notarial Archives, Valletta: Starting from Zero\" by Theresa Zammit Lupi from the Valletta Notarial Archives. DocEng received a total of 71 papers; 36 of these papers were submitted in April as full papers, with a further 35 papers being submitted in June as short papers and application notes. All papers were reviewed by at least three Program Committee members and, based on these recommendations, the symposium accepted 13 (36%) papers as full papers, 13 (37%) as short papers with an oral presentation and a further 10 as poster presentations. This year, DocEng participated in the Review Quality Collector, an initiative for improving the quality of scientific peer review whereby reviewers were invited to grade their co-reviewers on aspects related to helpfulness to authors, timeliness and helpfulness for decision. Reviewers were given a receipt for their work for the symposium and the five top-ranked reviewers will be recognised during the symposium. The symposium continues in its support for student researchers who will be the future generation of researchers in document engineering. To this end, DocEng 2017 offers students the opportunity to select a student mentor during the conference. The mentors, senior and experienced researchers, will be able to discuss the students' research, providing advice, feedback and constructive criticism. With the support of ACM SIGWEB, students are given travel grants to help them participate in the symposium.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126738453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Floating Strategies","authors":"F. Mittelbach","doi":"10.1145/3103010.3103015","DOIUrl":"https://doi.org/10.1145/3103010.3103015","url":null,"abstract":"This paper presents an extension to the general framework for globally optimized pagination described in Mittelbach (2016). The extended algorithm supports automatic placement of floats as part of the optimization. It uses a flexible constraint model that allows for the implementation of typical typographic rules that can be weighted against each other to support different application scenarios. By \"flexible\" we mean that the rules of typographic presentation of the content of a document element are not fixed, but neither are they completely arbitrary; also, some of these rules are absolute whereas others are in the form of preferences. It is easy to see that without restrictions the float placement possibilities grow exponentially if the number of floats has a linear relation to the document size. It is therefore important to restrict the objective function used for optimization in a way that the algorithm does not have to evaluate all theoretically possible placements while still being guaranteed to find an optimal solution. Different objective functions are being evaluated against typical typographic requirements in order to arrive at a system that is both rich in its expressiveness of modeling a large class of pagination applications and at the same time is capable of solving the optimization problem in acceptable time for realistic input data. Frank Mittelbach. 2016. A General Framework for Globally Optimized Pagination. In Proceedings of the 2016 ACM Symposium on Document Engineering (DocEng '16). ACM, New York, NY, USA, pages 11-20.","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"819 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129475831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Generation, Manipulation and Presentation","authors":"Tamir Hassan","doi":"10.1145/3248706","DOIUrl":"https://doi.org/10.1145/3248706","url":null,"abstract":"","PeriodicalId":200469,"journal":{"name":"Proceedings of the 2017 ACM Symposium on Document Engineering","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130180878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}