Proceedings of the ACM Symposium on Document Engineering: latest publications

Component-based hypervideo model: high-level operational specification of hypervideos
Madjid Sadallah, Olivier Aubert, Yannick Prié
DOI: 10.1145/2034691.2034701 · Pages 53–56 · Published 2011-09-19
Abstract: Hypervideo offers enhanced video-centric experiences. Usually defined from a hypermedia perspective, hypervideo lacks a dedicated specification, which hampers broad investigation of its domain and concepts. This article proposes a specialized hypervideo model that addresses hypervideo specificities. Following the principles of component-based modeling and annotation-driven content abstracting, the proposed Component-based Hypervideo Model (CHM) is a high-level representation of hypervideos intended to provide a general and dedicated hypervideo data model. Considering a hypervideo as a video-centric interactive document, CHM expresses presentation and interaction features through a high-level operational specification. The annotation-driven approach promotes a clear separation of data from video content and document visualizations. The model serves as the basis for a Web-oriented implementation that provides a declarative syntax and accompanying tools for designing hypervideo documents in a Web-standards-compliant manner.
Citations: 8

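The annotation-driven separation that CHM advocates can be illustrated with a minimal sketch (the structure and field names below are assumptions for illustration, not taken from the paper): data lives in time-anchored annotations, kept apart from the video stream and from the components that visualize it.

```python
# Hypothetical sketch of the annotation-driven idea: components such as a
# table of contents or link overlays query a shared annotation store
# instead of embedding data in the player.
annotations = [
    {"begin": 0.0, "end": 12.5, "type": "chapter", "content": "Intro"},
    {"begin": 12.5, "end": 47.0, "type": "chapter", "content": "Method"},
    {"begin": 5.0, "end": 9.0, "type": "link", "content": "https://example.org"},
]

def active_annotations(t, kind=None):
    """Return the annotations active at playback time t, optionally
    filtered by annotation type."""
    return [a for a in annotations
            if a["begin"] <= t < a["end"] and (kind is None or a["type"] == kind)]

# At t = 6.0 both the "Intro" chapter and the link annotation are active.
assert [a["content"] for a in active_annotations(6.0, "chapter")] == ["Intro"]
```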
Building a topic hierarchy using the bag-of-related-words representation
R. G. Rossi, S. O. Rezende
DOI: 10.1145/2034691.2034733 · Pages 195–204 · Published 2011-09-19
Abstract: A simple and intuitive way to organize a huge document collection is by a topic hierarchy. Generally, two steps are carried out to build a topic hierarchy automatically: 1) hierarchical document clustering and 2) cluster labeling. For both steps, a good textual document representation is essential. The bag-of-words is the common way to represent text collections: each document is represented by a vector in which each word of the collection is a dimension (feature). This approach has well-known problems, such as high dimensionality and data sparsity. Besides, most concepts are composed of more than one word, such as "document engineering" or "text mining". In this paper, an approach called bag-of-related-words is proposed to generate features composed of sets of related words, with a dimensionality smaller than that of the bag-of-words. The features are extracted from each document of a collection using association rules. Different ways of mapping documents into transactions, so as to allow the extraction of association rules, and interest measures to prune the number of features are analyzed. To evaluate how much the proposed approach aids topic hierarchy building, we carried out an objective evaluation of the clustering structure and a subjective evaluation of the topic hierarchies. All results were compared with the bag-of-words. The results demonstrate that the proposed representation is better than the bag-of-words for topic hierarchy building.
Citations: 18

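The core idea, mapping documents to transactions and keeping only word sets with sufficient support, can be sketched as follows. This is a toy illustration restricted to frequent word pairs; the paper's actual pipeline uses full association rules and interest measures.

```python
from itertools import combinations

def mine_related_word_features(docs, min_support=0.5):
    """Illustrative sketch (not the authors' implementation): treat each
    document as a transaction of distinct words and keep word pairs whose
    support meets the threshold as 'bag-of-related-words' features."""
    transactions = [set(doc.lower().split()) for doc in docs]
    n = len(transactions)
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    # Support = fraction of transactions containing both words of the pair.
    return {pair for pair, c in counts.items() if c / n >= min_support}

docs = [
    "text mining builds a topic hierarchy",
    "document engineering and text mining",
    "association rules for text mining",
]
# Only "text"/"mining" co-occur in every document, so it is the sole
# surviving feature at 90% support.
features = mine_related_word_features(docs, min_support=0.9)
```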
Evaluating CRDTs for real-time document editing
Mehdi Ahmed-Nacer, C. Ignat, G. Oster, Hyun-Gul Roh, Pascal Urso
DOI: 10.1145/2034691.2034717 · Pages 103–112 · Published 2011-09-19
Abstract: Nowadays, real-time editing systems are catching on. Tools such as Etherpad or Google Docs enable multiple authors at dispersed locations to collaboratively write shared documents. In such systems, a replication mechanism is required to ensure consistency when merging concurrent changes performed on the same document. Current editing systems use operational transformation (OT), a traditional replication mechanism for concurrent document editing. Recently, Commutative Replicated Data Types (CRDTs) were introduced as a new class of replication mechanisms whose concurrent operations are designed to be natively commutative. CRDTs such as WOOT, Logoot, Treedoc, and RGAs are expected to be substitutes for replication mechanisms in collaborative editing systems. This paper demonstrates the suitability of CRDTs for real-time collaborative editing. To reflect the tendency towards decentralised collaboration, which can resist censorship, tolerate failures, and let users keep control over documents, we collected editing logs from real-time peer-to-peer collaborations. We present experimental results obtained by replaying those logs on various CRDTs and an OT algorithm implemented in the same environment.
Citations: 78

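Why CRDT operations merge without transformation can be shown with a minimal, Logoot-flavoured sketch. This is illustrative only and far simpler than the evaluated WOOT/Logoot/Treedoc/RGA implementations: each character gets a globally unique, totally ordered identifier, so inserts commute and replicas converge regardless of delivery order.

```python
class TinySequenceCRDT:
    """Minimal sequence CRDT sketch (illustrative, not one of the paper's
    evaluated implementations)."""
    def __init__(self):
        self.chars = {}          # identifier -> character

    def apply_insert(self, ident, ch):
        self.chars[ident] = ch   # idempotent and order-independent

    def apply_delete(self, ident):
        self.chars.pop(ident, None)

    def text(self):
        # The document is the characters sorted by identifier.
        return "".join(ch for _, ch in sorted(self.chars.items()))

# Two replicas receive the same operations in opposite orders...
ops = [((1, "siteA"), "H"), ((2, "siteB"), "i"), ((3, "siteA"), "!")]
r1, r2 = TinySequenceCRDT(), TinySequenceCRDT()
for ident, ch in ops:
    r1.apply_insert(ident, ch)
for ident, ch in reversed(ops):
    r2.apply_insert(ident, ch)
assert r1.text() == r2.text() == "Hi!"   # ...and still converge
```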
Version control workshop
Neil Fraser
DOI: 10.1145/2034691.2034745 · Pages 267–268 · Published 2011-09-19
Abstract: This three-hour workshop takes participants on a tour of popular version control systems, particularly Subversion and Git. By the end of the workshop each participant will be proficient in using both of these systems. The focus is on solving real-world problems, such as resolving conflicting changes or rolling back a change. This workshop is not about the theory or academic underpinnings of such systems. Participants are required to bring a Macintosh, Linux or Windows laptop.
Citations: 0

Contributions to the study of SMS spam filtering: new collection and results
Tiago A. Almeida, J. M. G. Hidalgo, A. Yamakami
DOI: 10.1145/2034691.2034742 · Pages 259–262 · Published 2011-09-19
Abstract: The growth in mobile phone users has led to a dramatic increase in SMS spam messages. In practice, fighting mobile phone spam is made difficult by several factors, including the lower rate of SMS spam, which has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. On the other hand, in academic settings, a major handicap is the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. Moreover, as SMS messages are fairly short, content-based spam filters may have their performance degraded. In this paper, we offer a new, real, public, and non-encoded SMS spam collection that is, as far as we know, the largest one available. Moreover, we compare the performance achieved by several established machine learning methods. The results indicate that the Support Vector Machine outperforms the other evaluated classifiers and can hence be used as a good baseline for further comparison.
Citations: 406

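A content-based filtering baseline in the spirit of the paper's comparison can be sketched as follows. To stay dependency-free, this uses a simple perceptron over bag-of-words features rather than the paper's SVM, and the training messages are invented.

```python
def tokenize(msg):
    return msg.lower().split()

def train_perceptron(data, epochs=20):
    """Dependency-free stand-in for an SVM baseline (illustrative only):
    a perceptron over bag-of-words features. data is a list of
    (message, label) pairs with label +1 for spam, -1 for ham."""
    w = {}
    for _ in range(epochs):
        for msg, label in data:
            score = sum(w.get(tok, 0.0) for tok in tokenize(msg))
            if score * label <= 0:           # misclassified -> update weights
                for tok in tokenize(msg):
                    w[tok] = w.get(tok, 0.0) + label
    return w

def predict(w, msg):
    return 1 if sum(w.get(tok, 0.0) for tok in tokenize(msg)) > 0 else -1

# Invented toy training set; a real evaluation would use the SMS corpus.
train = [
    ("win a free prize now", 1),
    ("free cash prize claim now", 1),
    ("are we still meeting tomorrow", -1),
    ("see you at lunch tomorrow", -1),
]
w = train_perceptron(train)
assert predict(w, "claim your free prize") == 1
assert predict(w, "lunch tomorrow then") == -1
```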
An exploratory analysis of mind maps
J. Beel, Stefan Langer
DOI: 10.1145/2034691.2034709 · Pages 81–84 · Published 2011-09-19
Abstract: The results presented in this paper come from an exploratory study of 19,379 mind maps created by 11,179 users of the mind-mapping applications 'Docear' and 'MindMeister'. The objective was to find out how mind maps are structured and what information they contain. The results include: a typical mind map is rather small, with 31 nodes on average (median), and each node usually contains one to three words. In 66.12% of cases there are few notes, if any, and the number of hyperlinks also tends to be rather low, though this depends on the mind-mapping application. Most mind maps are edited on only one (60.76%) or two days (18.41%). A typical user can be expected to create around 2.7 mind maps (mean) per year. However, there are exceptions that create a long tail: one user created 243 mind maps, the largest mind map contained 52,182 nodes, one node contained 7,497 words, and one mind map was edited on 142 days.
Citations: 39

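The gap between the reported median (31 nodes) and a mean driven by outliers like the 52,182-node map can be illustrated with assumed data (the node counts below are invented, not the study's):

```python
from statistics import mean, median

# Hypothetical node counts for nine mind maps: long-tailed, like the
# distribution the study describes.
node_counts = [12, 18, 25, 31, 31, 40, 55, 90, 52182]
typical = median(node_counts)   # robust to the single huge outlier
average = mean(node_counts)     # dominated by it
assert typical == 31 and average > 1000
```

This is why the paper characterizes the "typical" mind map by the median rather than the mean.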
A versatile model for web page representation, information extraction and content re-packaging
Bernhard Krüpl, R. R. Fayzrakhmanov, Wolfgang Holzinger, Mathias Panzenböck, Robert Baumgartner
DOI: 10.1145/2034691.2034721 · Pages 129–138 · Published 2011-09-19
Abstract: On today's Web, designers invest huge effort to create visually rich websites boasting a multitude of interactive elements. Contrarily, most web information extraction (WIE) algorithms are still based on attributed-tree methods, which struggle to deal with this complexity. In this paper, we introduce a versatile model to represent web documents. The model is based on gestalt-theory principles, trying to capture the most important aspects in a formally exact way. It (i) represents and unifies access to visual layout, content, and functional aspects; and (ii) is implemented with semantic web techniques that can be leveraged for, e.g., automatic reasoning. Considering the visual appearance of a web page, we view it as a collection of gestalt figures, based on gestalt primitives, each representing a specific design pattern, be it a navigation menu or a news article. Based on this model, we introduce our WIE methodology, a re-engineering process involving design patterns, statistical distributions, and text content properties. The complete framework consists of the UOM model, which formalizes the mentioned components, and the MANM layer, which addresses structure and serialization, providing foundations for document re-packaging. Finally, we discuss how we have applied and evaluated our model in the area of web accessibility.
Citations: 20

Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence
Bela Gipp, Norman Meuschke
DOI: 10.1145/2034691.2034741 · Pages 249–258 · Published 2011-09-19
Abstract: Plagiarism detection systems have been developed to locate instances of plagiarism, e.g. within scientific papers. Studies have shown that existing approaches deliver reasonable results in identifying copy&paste plagiarism, but fail to detect more sophisticated forms such as paraphrased plagiarism, translation plagiarism, or idea plagiarism. The authors of this paper demonstrated in recent studies that the detection rate can be significantly improved by not only relying on text analysis, but additionally analyzing the citations of a document. Citations are valuable language-independent markers, similar to a fingerprint. In fact, our examinations of real-world cases have shown that the order of citations in a document often remains similar even if the text has been strongly paraphrased or translated in order to disguise plagiarism. This paper introduces three algorithms and discusses their suitability for citation-based plagiarism detection. Due to the numerous ways in which plagiarism can occur, these algorithms need to be versatile: they must be capable of detecting transpositions, scaling, and combinations thereof, in both local and global form. The algorithms are coined Greedy Citation Tiling, Citation Chunking, and Longest Common Citation Sequence. The evaluation showed that when these algorithms are combined, common forms of plagiarism can be detected reliably.
Citations: 76

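Of the three algorithms, Longest Common Citation Sequence is the most direct to sketch: it is standard longest-common-subsequence dynamic programming applied to the sequences of cited sources. The citation identifiers below are invented for illustration.

```python
def longest_common_citation_sequence(a, b):
    """Sketch of the Longest Common Citation Sequence idea: classic LCS
    dynamic programming over two citation sequences, where each element
    identifies a cited source."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    # Backtrack through the table to recover one longest common sequence.
    seq, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            seq.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return seq[::-1]

# Invented citation sequences: a shared ordering survives even though the
# suspected document drops and adds some citations.
original  = ["Smith04", "Lee09", "Kuhn96", "Ada82", "Lee09"]
suspected = ["Lee09", "Kuhn96", "Noll77", "Lee09"]
assert longest_common_citation_sequence(original, suspected) == ["Lee09", "Kuhn96", "Lee09"]
```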
Introduction of a dynamic assistance to the creative process of adding dimensions to multistructured documents
P. Portier, S. Calabretto
DOI: 10.1145/2034691.2034728 · Pages 167–170 · Published 2011-09-19
Abstract: We consider documents as the results of dynamic processes of associating documentary fragments. We have found that once a substantial number of associations exists, users need synoptic views. One possible way of providing such views lies in organizing associations into relevant subsets that we call "dimensions". Dimensions thus offer orders along which a documentary archive can be traversed. Many works have proposed efficient ways of presenting combinations of dimensions through graphical user interfaces, and there are studies on the structural properties of dimensional hypertexts. However, the problem of the origins and evolution of dimensions has not yet received similar attention. We therefore propose a mechanism based on a simple structural constraint for helping users construct dimensions: if a cycle appears within a dimension while a user is creating a new dimension by aggregating existing ones, the user is encouraged (and assisted) to restructure the dimensions in order to cut the cycle. This is a first step towards rational control of the emergence and evolution of dimensions.
Citations: 0

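The structural constraint can be sketched as plain cycle detection in a directed graph of fragment associations. This is a minimal illustration; the paper's mechanism operates on its own dimension model, and the fragment names are invented.

```python
def find_cycle(edges):
    """Depth-first search with colouring: return one cycle in a directed
    graph of (source, target) associations, or None if the 'dimension'
    is acyclic and can serve as a traversal order."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    state, stack = {}, []

    def dfs(node):
        state[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            if state.get(nxt, WHITE) == GREY:        # back edge -> cycle
                return stack[stack.index(nxt):] + [nxt]
            if state.get(nxt, WHITE) == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        state[node] = BLACK
        return None

    for node in list(graph):
        if state.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

# Aggregating associations introduces a cycle between fragments B and C,
# so the user would be prompted to restructure.
assert find_cycle([("A", "B"), ("B", "C"), ("C", "B")]) == ["B", "C", "B"]
assert find_cycle([("A", "B"), ("B", "C")]) is None
```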
A generic calculus of XML editing deltas
Jean-Yves Vion-Dury
DOI: 10.1145/2034691.2034718 · Pages 113–120 · Published 2011-09-19
Abstract: In previous work we outlined a mathematical model of so-called XML editing deltas and proposed a first study of their formal properties. We expected at least three outputs from this theoretical work: a common basis for comparing the performance of various algorithms through a structural normalization of deltas, a universal and flexible patch application model, and a clearer separation of patch and merge engine performance from delta generation performance. This paper presents the full calculus and reports significant progress with respect to formalizing a normalization procedure. Such a method is key to defining an equivalence relation between editing scripts and, eventually, to designing optimizing compiler back-ends, new patch specification languages, and execution models.
Citations: 5

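What a patch application model does can be illustrated with a toy delta engine. This sketch is an assumption throughout, not Vion-Dury's calculus: the operation set and path addressing are invented for illustration.

```python
import xml.etree.ElementTree as ET

def apply_delta(root, delta):
    """Toy patch engine: apply a list of (op, path, payload) editing
    operations to an XML tree. Paths are simple relative element paths;
    ops are 'insert', 'delete' and 'update' (all invented here)."""
    for op, path, payload in delta:
        parent_path, _, tag = path.rpartition("/")
        parent = root if parent_path == "" else root.find(parent_path)
        if op == "insert":
            child = ET.SubElement(parent, tag)
            child.text = payload
        elif op == "delete":
            parent.remove(parent.find(tag))
        elif op == "update":
            root.find(path).text = payload
    return root

doc = ET.fromstring("<doc><title>Draft</title><sec>old</sec></doc>")
delta = [
    ("update", "title", "Final"),
    ("delete", "sec", None),
    ("insert", "abstract", "New text"),
]
apply_delta(doc, delta)
assert doc.find("title").text == "Final"
assert doc.find("sec") is None
assert doc.find("abstract").text == "New text"
```

Normalizing such edit scripts (e.g. collapsing an insert followed by a delete of the same node) is exactly the kind of equivalence the calculus aims to make formal.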