{"title":"Component-based hypervideo model: high-level operational specification of hypervideos","authors":"Madjid Sadallah, Olivier Aubert, Yannick Prié","doi":"10.1145/2034691.2034701","DOIUrl":"https://doi.org/10.1145/2034691.2034701","url":null,"abstract":"Hypervideo offers enhanced video-centric experiences. Usually defined from a hypermedia perspective, the lack of a dedicated specification hampers hypervideo domain and concepts from being broadly investigated. This article proposes a specialized hypervideo model that addresses hypervideo specificities.\u0000 Following the principles of component-based modeling and annotation-driven content abstracting, the Component-based Hypervideo Model (CHM) that we propose is a high level representation of hypervideos that intends to provide a general and dedicated hypervideo data model.\u0000 Considered as a video-centric interactive document, the CHM hypervideo presentation and interaction features are expressed through a high level operational specification. Our annotation-driven approach promotes a clear separation of data from video content and document visualizations. The model serves as a basis for a Web-oriented implementation that provides a declarative syntax and accompanying tools for hypervideo document design in a Web standards-compliant manner.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"35 1","pages":"53-56"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72546775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a topic hierarchy using the bag-of-related-words representation","authors":"R. G. Rossi, S. O. Rezende","doi":"10.1145/2034691.2034733","DOIUrl":"https://doi.org/10.1145/2034691.2034733","url":null,"abstract":"A simple and intuitive way to organize a huge document collection is by a topic hierarchy. Generally two steps are carried out to build a topic hierarchy automatically: 1) hierarchical document clustering and 2) cluster labeling. For both steps, a good textual document representation is essential. The bag-of-words is the common way to represent text collections. In this representation, each document is represented by a vector where each word in the document collection represents a dimension (feature). This approach has well known problems as the high dimensionality and sparsity of data. Besides, most of the concepts are composed by more than one word, as \"document engineering\" or \"text mining\". In this paper an approach called bag-of-related-words is proposed to generate features compounded by a set of related words with a dimensionality smaller than the bag-of-words. The features are extracted from each textual document of a collection using association rules. Different ways to map the document into transactions in order to allow the extraction of association rules and interest measures to prune the number of features are analyzed. To evaluate how much the proposed approach can aid the topic hierarchy building, we carried out an objective evaluation for the clustering structure, and a subjective evaluation for topic hierarchies. All the results were compared with the bag-of-words. The obtained results demonstrated that the proposed representation is better than the bag-of-words for the topic hierarchy building.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"28 1","pages":"195-204"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85627473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating CRDTs for real-time document editing","authors":"Mehdi Ahmed-Nacer, C. Ignat, G. Oster, Hyun-Gul Roh, Pascal Urso","doi":"10.1145/2034691.2034717","DOIUrl":"https://doi.org/10.1145/2034691.2034717","url":null,"abstract":"Nowadays, real-time editing systems are catching on. Tools such as Etherpad or Google Docs enable multiple authors at dispersed locations to collaboratively write shared documents. In such systems, a replication mechanism is required to ensure consistency when merging concurrent changes performed on the same document. Current editing systems make use of operational transformation (OT), a traditional replication mechanism for concurrent document editing.\u0000 Recently, Commutative Replicated Data Types (CRDTs) were introduced as a new class of replication mechanisms whose concurrent operations are designed to be natively commutative. CRDTs, such as WOOT, Logoot, Treedoc, and RGAs, are expected to be substitutes of replication mechanisms in collaborative editing systems.\u0000 This paper demonstrates the suitability of CRDTs for real-time collaborative editing. To reflect the tendency of decentralised collaboration, which can resist censorship, tolerate failures, and let users have control over documents, we collected editing logs from real-time peer-to-peer collaborations. We present our experiment results obtained by replaying those editing logs on various CRDTs and an OT algorithm implemented in the same environment.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"23 1","pages":"103-112"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90918305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Version control workshop","authors":"Neil Fraser","doi":"10.1145/2034691.2034745","DOIUrl":"https://doi.org/10.1145/2034691.2034745","url":null,"abstract":"This three hour workshop takes participants on a tour of popular Version Control systems, particularly Subversion and Git. By the end of the workshop each participant will be proficient in using both of these systems. The focus is on solving real-world problems, such as resolving conflicting changes or rolling back a change. This workshop is not about the theory or academic underpinnings of such systems. Participants are required to bring a Macintosh, Linux or Windows laptop.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"30 1","pages":"267-268"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84375882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contributions to the study of SMS spam filtering: new collection and results","authors":"Tiago A. Almeida, J. M. G. Hidalgo, A. Yamakami","doi":"10.1145/2034691.2034742","DOIUrl":"https://doi.org/10.1145/2034691.2034742","url":null,"abstract":"The growth of mobile phone users has lead to a dramatic increasing of SMS spam messages. In practice, fighting mobile phone spam is difficult by several factors, including the lower rate of SMS that has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. On the other hand, in academic settings, a major handicap is the scarcity of public SMS spam datasets, that are sorely needed for validation and comparison of different classifiers. Moreover, as SMS messages are fairly short, content-based spam filters may have their performance degraded. In this paper, we offer a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we compare the performance achieved by several established machine learning methods. The results indicate that Support Vector Machine outperforms other evaluated classifiers and, hence, it can be used as a good baseline for further comparison.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"34 1","pages":"259-262"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82765843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An exploratory analysis of mind maps","authors":"J. Beel, Stefan Langer","doi":"10.1145/2034691.2034709","DOIUrl":"https://doi.org/10.1145/2034691.2034709","url":null,"abstract":"The results presented in this paper come from an exploratory study of 19,379 mind maps created by 11,179 users from the mind mapping applications 'Docear' and 'MindMeister'. The objective was to find out how mind maps are structured and which information they contain. The results include: A typical mind map is rather small, with 31 nodes on average (median), whereas each node usually contains between one to three words. In 66.12% of cases there are few notes, if any, and the number of hyperlinks tends to be rather low, too, but depends upon the mind mapping application. Most mind maps are edited only on one (60.76%) or two days (18.41%). It is to expect that a typical user creates around 2.7 mind maps (mean) a year. However, there are exceptions which create a long tail. One user created 243 mind maps, the largest mind map contained 52,182 nodes, one node contained 7,497 words and one mind map was edited on 142 days.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"1 1","pages":"81-84"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90207235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A versatile model for web page representation, information extraction and content re-packaging","authors":"Bernhard Krüpl, R. R. Fayzrakhmanov, Wolfgang Holzinger, Mathias Panzenböck, Robert Baumgartner","doi":"10.1145/2034691.2034721","DOIUrl":"https://doi.org/10.1145/2034691.2034721","url":null,"abstract":"On today's Web, designers take huge efforts to create visually rich websites that boast a magnitude of interactive elements. Contrarily, most web information extraction (WIE) algorithms are still based on attributed tree methods which struggle to deal with this complexity. In this paper, we introduce a versatile model to represent web documents. The model is based on gestalt theory principles---trying to capture the most important aspects in a formally exact way. It (i) represents and unifies access to visual layout, content and functional aspects; (ii) is implemented with semantic web techniques that can be leveraged for i.e. automatic reasoning. Considering the visual appearance of a web page, we view it as a collection of gestalt figures---based on gestalt primitives---each representing a specific design pattern, be it navigation menus or news articles. Based on this model, we introduce our WIE methodology, a re-engineering process involving design patterns, statistical distributions and text content properties. The complete framework consists of the UOM model, which formalizes the mentioned components, and the MANM layer that hints on structure and serialization, providing document re-packaging foundations. Finally, we discuss how we have applied and evaluated our model in the area of web accessibility.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"71 1","pages":"129-138"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85147539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence","authors":"Bela Gipp, Norman Meuschke","doi":"10.1145/2034691.2034741","DOIUrl":"https://doi.org/10.1145/2034691.2034741","url":null,"abstract":"Plagiarism Detection Systems have been developed to locate instances of plagiarism e.g. within scientific papers. Studies have shown that the existing approaches deliver reasonable results in identifying copy&paste plagiarism, but fail to detect more sophisticated forms such as paraphrased plagiarism, translation plagiarism or idea plagiarism. The authors of this paper demonstrated in recent studies that the detection rate can be significantly improved by not only relying on text analysis, but by additionally analyzing the citations of a document. Citations are valuable language independent markers that are similar to a fingerprint. In fact, our examinations of real world cases have shown that the order of citations in a document often remains similar even if the text has been strongly paraphrased or translated in order to disguise plagiarism.\u0000 This paper introduces three algorithms and discusses their suitability for the purpose of citation-based plagiarism detection. Due to the numerous ways in which plagiarism can occur, these algorithms need to be versatile. They must be capable of detecting transpositions, scaling and combinations in a local and global form. The algorithms are coined Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence. The evaluation showed that if these algorithms are combined, common forms of plagiarism can be detected reliably.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"242 1","pages":"249-258"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76551406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction of a dynamic assistance to the creative process of adding dimensions to multistructured documents","authors":"P. Portier, S. Calabretto","doi":"10.1145/2034691.2034728","DOIUrl":"https://doi.org/10.1145/2034691.2034728","url":null,"abstract":"We consider documents as the results of dynamic processes of documentary fragments' associations. We have experienced that once a substantial number of associations exist, users need some synoptic views. One possible way of providing such views relies in the organization of associations into relevant subsets that we call \"dimensions\". Thus, dimensions offer orders along which a documentary archive can be traversed. Many works have proposed efficient ways of presenting combinations of dimensions through graphical user interfaces. Moreover, there are studies on the structural properties of dimensional hypertexts. However, the problem of the origins and evolution of dimensions has not yet received a similar attention. Thus, we propose a mechanism based on a simple structural constraint for helping users in the construction of dimensions: if a cycle appears within a dimension while a user is creating a new dimension by the aggregation of existing ones, he will be encouraged (and assisted in his task) to restructure the dimensions in order to cut the cycle. This is a first step towards a rational control of the emergence and evolution of dimensions.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"48 8 1","pages":"167-170"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75226434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generic calculus of XML editing deltas","authors":"Jean-Yves Vion-Dury","doi":"10.1145/2034691.2034718","DOIUrl":"https://doi.org/10.1145/2034691.2034718","url":null,"abstract":"In previous work we outlined a mathematical model of the so-called XML editing deltas and proposed a first study of their formal properties. We expected at least three outputs from this theoretical work: a common basis to compare performances of the various algorithms through a structural normalization of deltas, a universal and flexible patch application model and a clearer separation of patch and merge engine performance from delta generation performance. This paper presents the full calculus and reports significant progresses with respect to formalizing a normalization procedure. Such method is key to defining an equivalence relation between editing scripts and eventually designing optimizers compiler back-ends, new patch specification languages and execution models.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"8 1","pages":"113-120"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87306624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}