Proceedings of the 21st ACM Symposium on Document Engineering: Latest Articles

Small-step pipelines reduce the complexity of XSLT/XPath programs
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3469096.3474922
Marcel Schaeben, Gioele Barabucci
Abstract: As code is adapted to deal with unclean external data, it tends to grow in size and complexity. The use of small-step pipelines (a programming style in which data is manipulated through small, independent steps; a variation on the point-free and concatenative programming paradigms) has been suggested as a way to reduce this complexity, resulting in simpler programs. Our preliminary quantitative results show that writing data-curation and data-analysis XSLT/XPath programs as small-step pipelines leads to a significant reduction of the peak McCabe cyclomatic complexity. This reduction of complexity is associated with a parallel increase in readability of the resulting code.
Citations: 0
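The small-step pipeline style described in the abstract above can be illustrated with a minimal sketch. It is written in Python rather than XSLT/XPath, and it is not taken from the paper: the `pipeline` helper, the step functions, and the sample data are all hypothetical names chosen for illustration.

```python
from functools import reduce
from typing import Callable

# A step takes a list of strings and returns a new list of strings.
Step = Callable[[list[str]], list[str]]

def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single function."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# Hypothetical cleaning steps; each one does a single small thing.
def strip_whitespace(values: list[str]) -> list[str]:
    return [v.strip() for v in values]

def drop_empty(values: list[str]) -> list[str]:
    return [v for v in values if v]

def lowercase(values: list[str]) -> list[str]:
    return [v.lower() for v in values]

def deduplicate(values: list[str]) -> list[str]:
    return list(dict.fromkeys(values))  # keeps first-seen order

# The program is just the ordered list of steps.
clean_names = pipeline(strip_whitespace, drop_empty, lowercase, deduplicate)

result = clean_names(["  Alice", "bob ", "", "ALICE", "Bob"])
print(result)
```

Because each step is independent, a new cleaning rule is added by inserting one more function into the list instead of adding another branch to an existing function, which is the kind of change the abstract associates with lower peak cyclomatic complexity.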
Table-structure recognition method using neural networks for implicit ruled line estimation and cell estimation
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3469096.3469870
Manabu Ohta, R. Yamada, T. Kanazawa, A. Takasu
Abstract: Tables are often used to summarize accurate values in academic papers, while graphs are used to show them visually. Automatic graph generation from a table is therefore a topic of research interest. Given that the way tables are written varies depending on the author, in earlier work we proposed a cell-detection-based table-structure recognition method. Our method achieved fair performance in experiments using the ICDAR 2013 table competition dataset, but could not outperform the top-ranked participant in the competition. This paper proposes an improved method using two neural networks: one estimates implicit ruled lines that are necessary to separate cells but are undrawn, and the other estimates cells by merging detected tokens in a table. We demonstrated the effectiveness of the proposed method by experiments using the same ICDAR 2013 dataset. It achieved an F-measure of 0.955, thereby outperforming the other methods including the top-ranked participant.
Citations: 2
Session details: Collections, systems and management
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3482787
Angelo Di Iorio
Citations: 0
20 years of physical document and product protection using digital methods
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3469096.3476452
J. Picard
Abstract: "The human hand can copy all that the human hand creates", said Baron von Eichtal in 1843 when he was in charge of banknote issuing for the bank of Bavaria. Despite all the advancements in security printing and anti-counterfeiting, these words ring more true than ever. Indeed, in the last thirty years, counterfeiters have greatly benefited from the globalisation of supply chains as well as from the digitization of production methods. Yet the development of digital technologies has also opened unprecedented possibilities for addressing the 3,000-year-old problem of counterfeiting. This presentation will review some of the counterfeit detection techniques developed in the last 20 years, including printed digital watermarks, copy detection patterns and secure QR Codes. These technologies not only open up the possibility for potentially anyone to verify authenticity with their smartphones, but they also enable the digitization of the trillions of physical documents, packaging and products that we produce every year. Real-world examples from industrial applications as well as current research topics will be discussed.
Citations: 1
Trustworthiness of spam email addresses using machine learning
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3469096.3475060
Francisco Jáñez-Martino, R. Alaíz-Rodríguez, V. González-Castro, Eduardo FIDALGO
Abstract: Cybercriminals have increasingly used spam email to send scams, phishing, malware and other frauds to organisations and people. They design sophisticated and contextualised emails to make them look trustworthy to users, with the sender address being an essential part. Although cybersecurity agencies and companies develop products and organise courses to help people detect email patterns, spam attacks are not yet fully avoided. This work presents a proof-of-concept methodology to give the user more meaningful information about trustworthiness to detect these harmful emails. For the first time in the literature, we present an email address dataset manually labelled into two classes, low and high quality. Moreover, we extracted 18 handcrafted features based on social engineering techniques and natural language properties. We evaluated four popular machine learning classifiers and obtained the best performance with Naive Bayes, i.e., an accuracy of 88.17% and an F1-score of 0.808. Additionally, we applied the InterpretML framework to identify the most relevant properties, with the aim of eventually implementing an automatic system able to report the trustworthiness of email addresses.
Citations: 0
Proceedings of the 21st ACM Symposium on Document Engineering
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3469096
Citations: 0
SlideGen
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3469096.3474939
Athar Sefid, Prasenjit K. Mitra, Lee Giles
Abstract: Presentation slides generated from research papers provide a summary of the papers, primarily to guide talks. Manually generating presentation slides is labor intensive. We propose a method to automatically generate slides for scientific articles based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites, which is the largest dataset used for scholarly article summarization. We generate slides 1) extractively by selecting salient sentences from the paper and 2) abstractively by fine-tuning pre-trained language models to learn the language of slides. The results show the superiority of the extractive models in terms of ROUGE scores. However, abstractive summaries are less verbose and follow the language of the slides by generating phrases rather than full sentences.
Citations: 5
Session details: Generation, manipulation and presentation
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3482782
S. Bagley
Citations: 0
Session details: Keynote II
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3482783
S. Simske
Citations: 0
ALiBERT
Proceedings of the 21st ACM Symposium on Document Engineering Pub Date : 2021-08-16 DOI: 10.1145/3469096.3474928
Rajkumar Ramamurthy, Maren Pielka, Robin Stenzel, Christian Bauckhage, R. Sifa, T. Khameneh, Ulrich Warning, Bernd Kliem, Rüdiger Loitz
Abstract: We consider Automated List Inspection (ALI), a content-based text recommendation system that assists auditors in matching relevant text passages from notes in financial statements to specific law regulations. ALI follows a ranking paradigm in which a fixed number of requirements per textual passage are shown to the user. Despite achieving impressive ranking performance, the user experience can still be improved by showing a dynamic number of recommendations. Besides, existing models rely on a feature-based language model that needs to be pre-trained on a large corpus of domain-specific datasets. Moreover, they cannot be trained in an end-to-end fashion by jointly optimizing with language model parameters. In this work, we alleviate these concerns by considering a multi-label classification approach that predicts dynamic requirement sequences. We base our model on pre-trained BERT that allows us to fine-tune the whole model in an end-to-end fashion, thereby avoiding the need for training a language representation model. We conclude by presenting a detailed evaluation of the proposed model on two German financial datasets.
Citations: 11