{"title":"Displaying chemical structural formulae in ePub format","authors":"S. Marinai, Stefano Quiriconi","doi":"10.1145/2361354.2361382","DOIUrl":"https://doi.org/10.1145/2361354.2361382","url":null,"abstract":"We describe one tool designed to enhance the visualization of chemical structural formulae in E-book readers. When dealing with small formulae, to avoid the pixelation effect with zoomed images, the formula is converted to a vector representation and then enlarged. Conversely, large formulae are split in sub-images by cutting the image in suitable locations attempting to reduce the parts of the formula that are broken. In both cases the formulae are embedded in one ePub document that allows users to browse the chemical structure on most reading devices.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"34 1","pages":"125-128"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75245171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal guillotine layout","authors":"G. Gange, K. Marriott, Peter James Stuckey","doi":"10.1145/2361354.2361359","DOIUrl":"https://doi.org/10.1145/2361354.2361359","url":null,"abstract":"Guillotine-based page layout is a method for document layout commonly used by newspapers and magazines, where each region of the page either contains a single article, or is recursively split either vertically or horizontally. Surprisingly there appears to be little research into algorithms for automatic guillotine-based document layout. In this paper we give efficient algorithms to find optimal solutions to guillotine layout problems of two forms. Fixed-cut layout is where the structure of the guillotining is given and we only have to determine the best configuration for each individual article to give the optimal total configuration. Free layout is where we also have to search for the optimal structure. We give bottom-up and top-down dynamic programming algorithms to solve these problems, and propose a novel interaction model for documents on electronic media. Experiments show that our algorithms are effective for realistic layout problems.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"230 1","pages":"13-22"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76927280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning how to trade off aesthetic criteria in layout","authors":"P. Moulder, K. Marriott","doi":"10.1145/2361354.2361361","DOIUrl":"https://doi.org/10.1145/2361354.2361361","url":null,"abstract":"Typesetting software is often faced with conflicting aesthetic goals. For example, choosing where to break lines in text might involve aiming to minimize hyphenation, variation in word spacing, and consecutive lines starting with the same word. Typically, automatic layout is modelled as an optimization problem in which the goal is to minimize a complex objective function that combines various penalty functions each of which corresponds to a particular bad feature. Determining how to combine these penalty functions is difficult and very time consuming, becoming harder each time we add another penalty. Here we present a machine-learning approach to do this, and test it in the context of line-breaking. Our approach repeatedly queries the expert typographer as to which one of a pair of layouts is better, and accordingly refines the estimate of how best to weight the penalties in a linear combination. It chooses layout pair queries by a heuristic to maximize the amount that can be learnt from them so as to reduce the number of combinations that must be considered by the typographer.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"29 1","pages":"33-36"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81221478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of BILBO reference parsing in digital humanities via a comparison of different tools","authors":"Young-Min Kim, P. Bellot, J. Tavernier, Elodie Faath, Marin Dacos","doi":"10.1145/2361354.2361400","DOIUrl":"https://doi.org/10.1145/2361354.2361400","url":null,"abstract":"Automatic bibliographic reference annotation involves the tokenization and identification of reference fields. Recent methods use machine learning techniques such as Conditional Random Fields to tackle this problem. On the other hand, the state of the art methods always learn and evaluate their systems with a well structured data having simple format such as bibliography at the end of scientific articles. And that is a reason why the parsing of new reference different from a regular format does not work well. In our previous work, we have established a standard for the tokenization and feature selection with a less formulaic data such as notes. In this paper, we evaluate our system BILBO with other popular online reference parsing tools on a new data from totally different source. BILBO is constructed with our own corpora extracted and annotated from real world data, digital humanities articles of Revues.org site (90% in French) of OpenEdition. The robustness of BILBO system allows a language independent tagging result. We expect that this first attempt of evaluation will motivate the development of other efficient techniques for the scattered and less formulaic bibliographic references.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"23 1","pages":"209-212"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74535031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective radical segmentation of offline handwritten Chinese characters towards constructing personal handwritten fonts","authors":"Zhanghui Chen, Baoyao Zhou","doi":"10.1145/2361354.2361379","DOIUrl":"https://doi.org/10.1145/2361354.2361379","url":null,"abstract":"Effective radical segmentation of handwritten Chinese characters can greatly facilitate the subsequent character processing tasks, such as Chinese handwriting recognition/identification and the generation of Chinese handwritten fonts. In this paper, a popular snake model is enhanced by considering the guided image force and optimized by Genetic Algorithm, such that it achieves a significant improvement in terms of both accuracy and efficiency when applied to segment the radicals in handwritten Chinese characters. The proposed radical segmentation approach consists of three stages: constructing guide information, Genetic Algorithm optimization and post-embellishment. Testing results show that the proposed approach can effectively decompose radicals with overlaps and connections from handwritten Chinese characters with various layout structures. The segmentation accuracy reaches 94.91% for complicated samples with overlapped and connected radicals and the segmentation speed is 0.05 second per character. For demonstrating the advantages of the approach, radicals extracted from the user input samples are reused to construct personal Chinese handwritten font library. Experiments show that the constructed characters well maintain the handwriting style of the user and have good enough performance. In this way, the user only needs to write a small number of samples for obtaining his/her own handwritten font library. This method greatly reduces the cost of existing solutions and makes it much easier for people to use computers to write letters/e-mails, diaries/blogs, even magazines/books in their own handwriting.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"19 1","pages":"107-116"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75624042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Glozz platform: a corpus annotation and mining tool","authors":"Antoine Widlöcher, Yann Mathet","doi":"10.1145/2361354.2361394","DOIUrl":"https://doi.org/10.1145/2361354.2361394","url":null,"abstract":"Corpus linguistics and Natural Language Processing make it necessary to produce and share reference annotations to which linguistic and computational models can be compared. Creating such resources requires a formal framework supporting description of heterogeneous linguistic objects and structures, appropriate representation formats, and adequate manual annotation tools, making it possible to locate, identify and describe linguistic phenomena in textual documents. The Glozz platform addresses all these needs, and provides a highly versatile corpus annotation tool with advanced visualization, querying and evaluation possibilities.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"21 1","pages":"171-180"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75982826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"500 year documentation","authors":"F. Marchese, Maninder Pal Kaur Shergill","doi":"10.1145/2361354.2361391","DOIUrl":"https://doi.org/10.1145/2361354.2361391","url":null,"abstract":"Museum visitors today can regularly view 500 year old art by Renaissance masters. Will visitors to museums 500 years in the future be able to see the work of digital artists from the early 21st century? This paper considers the real problem of conserving interactive digital artwork for museum installation in the far distant future by exploring the requirements for creating documentation that will support an artwork's adaptation to future technology. In effect, this documentation must survive as long as the artwork itself -- effectively, in perpetuity. A proposal is made for the use of software engineering methodologies as solutions for designing this documentation.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"42 1","pages":"157-160"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80151994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scientific table type classification in digital library","authors":"Seongchan Kim, Keejun Han, Soon Young Kim, Ying Liu","doi":"10.1145/2361354.2361384","DOIUrl":"https://doi.org/10.1145/2361354.2361384","url":null,"abstract":"Tables are ubiquitous in digital libraries and on the Web, utilized to satisfy various types of data delivery and document formatting goals. For example, tables are widely used to present experimental results or statistical data in a condensed fashion in scientific documents. Identifying and organizing tables of different types is an absolutely necessary task for better table understanding, and data sharing and reusing. This paper has a three-fold contribution: 1) We propose Introduction, Methods, Results, and Discussion (IMRAD)-based table functional classification for scientific documents; 2) A fine-grained table taxonomy is introduced based on an extensive observation and investigation of tables in digital libraries; and 3) We investigate table characteristics and classify tables automatically based on the defined taxonomy. The preliminary experimental results show that our table taxonomy with salient features can significantly improve scientific table classification performance.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"28 1","pages":"133-136"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91166559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A section title authoring tool for clinical guidelines","authors":"M. Truran, G. Georg, M. Cavazza, Dong Zhou","doi":"10.1145/2361354.2361364","DOIUrl":"https://doi.org/10.1145/2361354.2361364","url":null,"abstract":"Professional users of medical information often report difficulties when attempting to locate specific information in lengthy documents. Sometimes these difficulties can be attributed to poorly specified section titles which fail to advertise relevant content. In this paper we describe preliminary work on a software plug-in for a document engineering environment that will assist authors when they formulate section-level headings. We describe two different algorithms which can be used to generate section titles. We compare the performance of these algorithms and correlate our experimental results with an evaluation of title quality performed by domain experts.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"114 1","pages":"41-44"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77684479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content and document based approach for digital productivity applications","authors":"Thierry Delprat","doi":"10.1145/2361354.2361372","DOIUrl":"https://doi.org/10.1145/2361354.2361372","url":null,"abstract":"In today's world most of the data produced and consumed by employees is content. In this talk we will present our approach to create and deploy content and document based applications to improve business processes and user experience.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"24 1","pages":"83-84"},"PeriodicalIF":0.0,"publicationDate":"2012-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80390204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}