Proceedings of the 22nd ACM Symposium on Document Engineering: Latest Papers

How did Dennis Ritchie produce his PhD thesis? A typographical mystery
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-09-20 DOI: 10.1145/3558100.3563839
D. Brailsford, B. Kernighan, W. Ritchie
Abstract: Dennis Ritchie, the creator of the C programming language and, with Ken Thompson, the co-creator of the Unix operating system, completed his Harvard PhD thesis on recursive function theory in early 1968. But for unknown reasons, he never officially received his degree, and the thesis itself disappeared for nearly 50 years. This strange set of circumstances raises at least three broad questions:
• What was the technical contribution of the thesis?
• Why wasn't the degree granted?
• How was the thesis prepared?
This paper investigates the third question: how was a long and typographically complicated mathematical thesis produced at a very early stage in the history of computerized document preparation?
Citations: 0
Modifying PDF sewing patterns for use with projectors
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-09-20 DOI: 10.1145/3558100.3563853
Charlotte Curtis
Abstract: Print-at-home PDF sewing patterns have gained popularity over the last decade and now represent a significant proportion of the home sewing pattern market. Recently, an all-digital workflow has emerged through the use of ceiling-mounted projectors, allowing patterns to be projected directly onto fabric. However, PDF patterns produced for printing are not suitable for projecting. This paper presents PDFStitcher, an open-source cross-platform graphical tool that enables end users to modify PDF sewing patterns for use with a projector. The key functionality of PDFStitcher is described, followed by a brief discussion of the future of sewing pattern file formats and information processing.
Citations: 0
Theory entity extraction for social and behavioral sciences papers using distant supervision
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-09-20 DOI: 10.1145/3558100.3563855
Xin Wei, Lamia Salsabil, Jian Wu
Abstract: Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata such as title, author, and keywords, theory extraction in scientific literature is rarely explored, especially for social and behavioral science (SBS) domains. One challenge of applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions from Wikipedia to build a ground truth corpus consisting of more than 4500 automatically annotated sentences containing theory/model mentions. We use this corpus to train models for theory extraction in SBS papers. We compared four deep learning architectures and found that RoBERTa-BiLSTM-CRF performed best, with a precision as high as 89.72%. The model shows promise for convenient extension to domains other than SBS.
The code and data are publicly available at https://github.com/lamps-lab/theory.
Citations: 0
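The distant-supervision step described in the abstract above (using known entity names to label sentences automatically) can be illustrated with a minimal sketch. This is not the authors' code: the function name `bio_tags`, the `B-THEORY`/`I-THEORY` tag names, and the exact case-insensitive phrase matching are all assumptions chosen for illustration, and the list of names stands in for mentions that the paper harvests from Wikipedia.

```python
def bio_tags(tokens, entity_names):
    """Label tokens B-THEORY / I-THEORY / O by exact, case-insensitive phrase match.

    entity_names is a hypothetical list of known theory/model names.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    # Try longer names first so "social learning theory" wins over "learning theory".
    phrases = sorted((name.lower().split() for name in entity_names), key=len, reverse=True)
    for phrase in phrases:
        n = len(phrase)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span_is_free = all(t == "O" for t in tags[start:start + n])
            if lowered[start:start + n] == phrase and span_is_free:
                tags[start] = "B-THEORY"
                for j in range(start + 1, start + n):
                    tags[j] = "I-THEORY"
    return tags


if __name__ == "__main__":
    sentence = "We apply social learning theory to classrooms".split()
    print(list(zip(sentence, bio_tags(sentence, ["social learning theory"]))))
```

Sentences labeled this way are noisy (a name can match text that is not actually a theory mention), which is the usual trade-off of distant supervision against manual annotation.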
Graphical document representation for French newsletters analysis
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-09-20 DOI: 10.1145/3558100.3563856
Alexis Blandin, Farida Saïd, Jeanne Villaneau, P. Marteau
Abstract: Document analysis is essential in many industrial applications. However, engineering natural language resources to represent entire documents is still challenging. Besides, available resources in French are scarce and do not cover all possible tasks, especially in specific business applications. In this context, we present a French newsletter dataset and its use to predict the good or bad impact of newsletters on readers. We propose a new representation of newsletters in the form of graphs that consider the newsletters' layout. We evaluate the relevance of the proposed representation to predict a newsletter's performance in terms of open and click rates using graph analysis methods.
Citations: 1
Optical character recognition guided image super resolution
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-09-20 DOI: 10.1145/3558100.3563841
Philipp Hildebrandt, Maximilian Schulze, S. Cohen, Vanja Doskoc, Raid Saabni, Tobias Friedrich
Abstract: Recognizing disturbed text in real-life images is a difficult problem, as information that is missing due to low resolution or out-of-focus text has to be recreated. Combining text super-resolution and optical character recognition deep learning models can be a valuable tool to enlarge and enhance text images for better readability, as well as recognize text automatically afterwards. We achieve improved peak signal-to-noise ratio and text recognition accuracy scores over a state-of-the-art text super-resolution model TBSRN on the real-world low-resolution dataset TextZoom while having a smaller theoretical model size due to the usage of quantization techniques. In addition, we show how different training strategies influence the performance of the resulting model.
Citations: 0
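The abstract above reports peak signal-to-noise ratio (PSNR) as one of its metrics. As a reminder of what that number measures, here is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE). The function name `psnr`, the flat pixel sequences, and the 8-bit maximum of 255 are illustrative assumptions, not the authors' evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel positions.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([10, 20, 30, 40], [11, 19, 30, 42]))
```

A higher PSNR means the restored image is numerically closer to the reference, which is why the paper pairs it with text recognition accuracy: PSNR alone does not say whether the text became more readable.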
Scholarly big data quality assessment: a case study of document linking and conflation with S2ORC
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-09-20 DOI: 10.1145/3558100.3563850
Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles
Abstract: Recently, the Allen Institute for Artificial Intelligence released the Semantic Scholar Open Research Corpus (S2ORC), one of the largest open-access scholarly big datasets with more than 130 million scholarly paper records. S2ORC contains a significant portion of automatically generated metadata. The metadata quality could impact downstream tasks such as citation analysis, citation prediction, and link analysis. In this project, we assess the document linking quality and estimate the document conflation rate for the S2ORC dataset. Using semi-automatically curated ground truth corpora, we estimated that the overall document linking quality is high, with 92.6% of documents correctly linking to six major databases, but the linking quality varies depending on subject domains. The document conflation rate is around 2.6%, meaning that about 97.4% of documents are unique. We further quantitatively compared three near-duplicate detection methods using the ground truth created from S2ORC. The experiments indicated that locality-sensitive hashing was the best method in terms of effectiveness and scalability, achieving high performance (F1=0.960) and a much reduced runtime.
Our code and data are available at https://github.com/lamps-lab/docconflation.
Citations: 1
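The abstract above names locality-sensitive hashing (LSH) as the best-performing near-duplicate detector but does not say which variant was used. The sketch below assumes one common choice, MinHash signatures over word shingles with banding. Every name and parameter here (`shingles`, `minhash_signature`, `lsh_buckets`, 64 hashes, 16 bands) is hypothetical and only illustrates the general idea.

```python
import hashlib


def shingles(text, k=3):
    """Set of k-word shingles from a lowercased title or abstract."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity, the quantity that MinHash approximates."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash_signature(items, num_hashes=64):
    """Keep the minimum of each seeded hash; matching positions estimate Jaccard."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()[:8], "big")
            for item in items
        ))
    return signature


def lsh_buckets(signature, bands=16):
    """Split the signature into bands; records sharing any band key are candidates."""
    rows = len(signature) // bands
    return {(band, tuple(signature[band * rows:(band + 1) * rows])) for band in range(bands)}


if __name__ == "__main__":
    a = shingles("Scholarly big data quality assessment with S2ORC")
    b = shingles("scholarly big data quality assessment with s2orc")
    print(jaccard(a, b), bool(lsh_buckets(minhash_signature(a)) & lsh_buckets(minhash_signature(b))))
```

The scalability advantage comes from the banding step: only records that land in a shared bucket are compared in full, so the number of pairwise comparisons stays far below the quadratic all-pairs count.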
Long-term lifecycle-related management of digital building documents: towards a holistic and standard-based concept for a technical and organizational solution in building authorities
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-09-20 DOI: 10.1145/3558100.3563842
Uwe M. Borghoff, Eberhard Pfeiffer, Peter Rödig
Abstract: The long-term lifecycle-related management of digital building information is essential to improve the overall quality of public built assets. However, this management task still poses great challenges for building authorities, as they are usually responsible for large, heterogeneous and long-lived built assets with countless data sets and documents that are increasingly changing from analogue to digital representations. These digital collections are characterized by complex dependencies, by numerous different, sometimes highly specialized and proprietary formats and also by their inappropriate organization. The major challenge is to ensure completeness, consistency and usability over the entire lifecycle of buildings or their associated digital data and documents. In this paper, we present an approach for a holistic and standard-based concept for a technical and organizational solution in building authorities. Holistic means integrating concepts for the long-term usability of digital building information, taking into account the framework conditions described in building authorities, including the introduction of BIM (building information modeling). To this end, we outline how the concepts of the consolidated and widely accepted ISO-standardized reference model OAIS (open archive information system) can be applied to a building-specific information architecture. First, we sketch the history of electronic data processing in the building sector and introduce the essential concepts of OAIS.
Then, we illustrate typical major actors and their (future) IT systems, including systems intended for OAIS-compliant long-term usability. Next, we outline major (future) software components and their interactions and assignment to lifecycle phases. Finally, we delineate how the generic information model of OAIS can be used. In summary, ensuring the long-term usability of digital information in the building sector will remain a grand challenge, but our proposed approach to the systematic application and further refinement of the OAIS reference model can help to better organize future discussions as well as research, development and implementation activities. We conclude with some suggestions for further research based on the concepts of the OAIS reference model, such as refining information models or developing information repositories needed for long-term interpretation of digital objects.
Citations: 0
SeNMFk-SPLIT: large corpora topic modeling by semantic non-negative matrix factorization with automatic model selection
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-08-21 DOI: 10.1145/3558100.3563844
M. Eren, N. Solovyev, Manish Bhattarai, Kim Ø. Rasmussen, Charles Nicholas, B. Alexandrov
Abstract: As the amount of text data continues to grow, topic modeling is serving an important role in understanding the content hidden by the overwhelming quantity of documents. One popular topic modeling approach is non-negative matrix factorization (NMF), an unsupervised machine learning (ML) method. Recently, Semantic NMF with automatic model selection (SeNMFk) has been proposed as a modification to NMF. In addition to heuristically estimating the number of topics, SeNMFk also incorporates the semantic structure of the text. This is performed by jointly factorizing the term frequency-inverse document frequency (TF-IDF) matrix with the co-occurrence/word-context matrix, the values of which represent the number of times two words co-occur in a predetermined window of the text. In this paper, we introduce a novel distributed method, SeNMFk-SPLIT, for semantic topic extraction suitable for large corpora. Contrary to SeNMFk, our method enables the joint factorization of large documents by decomposing the word-context and term-document matrices separately.
We demonstrate the capability of SeNMFk-SPLIT by applying it to the entire artificial intelligence (AI) and ML scientific literature uploaded on arXiv.
Citations: 4
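The abstract above defines the word-context matrix by its entries: the number of times two words co-occur within a predetermined window of text. A minimal sketch of that counting step follows. The function name `cooccurrence_counts`, the default window of 2, and the symmetric (unordered-pair) counting are assumptions for illustration; the paper's own preprocessing and the factorization itself are not reproduced here.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered word pairs that appear within `window` positions of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only forward so each pair of positions is counted exactly once.
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                counts[tuple(sorted((word, other)))] += 1
    return counts


if __name__ == "__main__":
    tokens = "topic modeling with topic models".split()
    for pair, n in sorted(cooccurrence_counts(tokens).items()):
        print(pair, n)
```

Arranging these counts in a vocabulary-by-vocabulary grid gives the word-context matrix; it is this matrix, together with the TF-IDF term-document matrix, that the abstract says SeNMFk-SPLIT decomposes separately.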
Downstream transformer generation of question-answer pairs with preprocessing and postprocessing pipelines
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2022-05-15 DOI: 10.1145/3558100.3563846
Cheng Zhang, Hao Zhang, Jie Wang
Abstract: We present a method to perform a downstream task of transformers on generating question-answer pairs (QAPs) from a given article. We first finetune pretrained transformers on QAP datasets. We then use a preprocessing pipeline to select appropriate answers from the article, and feed each answer and the relevant context to the finetuned transformer to generate a candidate QAP. Finally, we use a postprocessing pipeline to filter inadequate QAPs. In particular, using pretrained T5 models as transformers and the SQuAD dataset as the finetuning dataset, we obtain a finetuned T5 model that outperforms previous models on standard performance measures over the SQuAD dataset. We then show that our method based on this finetuned model generates a satisfactory number of high-quality QAPs on the Gaokao-EN dataset, as assessed by human judges.
Citations: 6
Proceedings of the 2012 ACM Symposium on Document Engineering
Proceedings of the 22nd ACM Symposium on Document Engineering Pub Date : 2012-09-04 DOI: 10.1145/2361354
C. Concolato, P. Schmitz
Abstract: It is our great pleasure to welcome you to the 2012 ACM Symposium on Document Engineering (DocEng 2012), which is being held September 4-7, 2012, in Paris, France. This year's symposium continues its tradition of being the premier forum for presentation of research results and experience reports on leading edge issues of document presentation and adaptation, analysis, modeling, transformation, systems, theory, and applications. The mission of the symposium is to share significant results, to evaluate novel approaches and models, and to identify promising directions for future research and development. DocEng gives researchers and practitioners a unique opportunity to share their perspectives with others interested in the various aspects of document engineering.
The call for papers attracted 89 submissions from Asia, Australia, Canada, Europe, the Russian Federation, and the United States. The program committee accepted 14 of 42 full paper submissions (33%), plus another 20 short papers, and 5 demos and posters, for a combined acceptance rate of 44%. The papers cover a variety of topics, including Layout and Presentation Control, Document Analysis, OCR and Visual Analysis, Multimedia and Hypermedia, XML and Related Tools, Architecture and Document Management, Search and Sense-making, and Digital Humanities. In addition, the program includes workshops on authoring issues, and on education models and curricula for Document Engineering. DocEng 2012 features keynote speeches by Bruno Bachimont of the Institut National de l'Audiovisuel and the Université de Technologie de Compiègne, and by Thierry Delprat of Nuxeo.
We hope that these proceedings will serve as a valuable reference for document engineering researchers and developers.
Citations: 3