{"title":"Component-based hypervideo model: high-level operational specification of hypervideos","authors":"Madjid Sadallah, Olivier Aubert, Yannick Prié","doi":"10.1145/2034691.2034701","DOIUrl":"https://doi.org/10.1145/2034691.2034701","url":null,"abstract":"Hypervideo offers enhanced video-centric experiences. Usually defined from a hypermedia perspective, the lack of a dedicated specification hampers hypervideo domain and concepts from being broadly investigated. This article proposes a specialized hypervideo model that addresses hypervideo specificities.\u0000 Following the principles of component-based modeling and annotation-driven content abstracting, the Component-based Hypervideo Model (CHM) that we propose is a high level representation of hypervideos that intends to provide a general and dedicated hypervideo data model.\u0000 Considered as a video-centric interactive document, the CHM hypervideo presentation and interaction features are expressed through a high level operational specification. Our annotation-driven approach promotes a clear separation of data from video content and document visualizations. The model serves as a basis for a Web-oriented implementation that provides a declarative syntax and accompanying tools for hypervideo document design in a Web standards-compliant manner.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"35 1","pages":"53-56"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72546775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a topic hierarchy using the bag-of-related-words representation","authors":"R. G. Rossi, S. O. Rezende","doi":"10.1145/2034691.2034733","DOIUrl":"https://doi.org/10.1145/2034691.2034733","url":null,"abstract":"A simple and intuitive way to organize a huge document collection is by a topic hierarchy. Generally two steps are carried out to build a topic hierarchy automatically: 1) hierarchical document clustering and 2) cluster labeling. For both steps, a good textual document representation is essential. The bag-of-words is the common way to represent text collections. In this representation, each document is represented by a vector where each word in the document collection represents a dimension (feature). This approach has well known problems as the high dimensionality and sparsity of data. Besides, most of the concepts are composed by more than one word, as \"document engineering\" or \"text mining\". In this paper an approach called bag-of-related-words is proposed to generate features compounded by a set of related words with a dimensionality smaller than the bag-of-words. The features are extracted from each textual document of a collection using association rules. Different ways to map the document into transactions in order to allow the extraction of association rules and interest measures to prune the number of features are analyzed. To evaluate how much the proposed approach can aid the topic hierarchy building, we carried out an objective evaluation for the clustering structure, and a subjective evaluation for topic hierarchies. All the results were compared with the bag-of-words. The obtained results demonstrated that the proposed representation is better than the bag-of-words for the topic hierarchy building.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"28 1","pages":"195-204"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85627473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating CRDTs for real-time document editing","authors":"Mehdi Ahmed-Nacer, C. Ignat, G. Oster, Hyun-Gul Roh, Pascal Urso","doi":"10.1145/2034691.2034717","DOIUrl":"https://doi.org/10.1145/2034691.2034717","url":null,"abstract":"Nowadays, real-time editing systems are catching on. Tools such as Etherpad or Google Docs enable multiple authors at dispersed locations to collaboratively write shared documents. In such systems, a replication mechanism is required to ensure consistency when merging concurrent changes performed on the same document. Current editing systems make use of operational transformation (OT), a traditional replication mechanism for concurrent document editing.\u0000 Recently, Commutative Replicated Data Types (CRDTs) were introduced as a new class of replication mechanisms whose concurrent operations are designed to be natively commutative. CRDTs, such as WOOT, Logoot, Treedoc, and RGAs, are expected to be substitutes of replication mechanisms in collaborative editing systems.\u0000 This paper demonstrates the suitability of CRDTs for real-time collaborative editing. To reflect the tendency of decentralised collaboration, which can resist censorship, tolerate failures, and let users have control over documents, we collected editing logs from real-time peer-to-peer collaborations. We present our experiment results obtained by replaying those editing logs on various CRDTs and an OT algorithm implemented in the same environment.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"23 1","pages":"103-112"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90918305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Version control workshop","authors":"Neil Fraser","doi":"10.1145/2034691.2034745","DOIUrl":"https://doi.org/10.1145/2034691.2034745","url":null,"abstract":"This three hour workshop takes participants on a tour of popular Version Control systems, particularly Subversion and Git. By the end of the workshop each participant will be proficient in using both of these systems. The focus is on solving real-world problems, such as resolving conflicting changes or rolling back a change. This workshop is not about the theory or academic underpinnings of such systems. Participants are required to bring a Macintosh, Linux or Windows laptop.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"30 1","pages":"267-268"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84375882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contributions to the study of SMS spam filtering: new collection and results","authors":"Tiago A. Almeida, J. M. G. Hidalgo, A. Yamakami","doi":"10.1145/2034691.2034742","DOIUrl":"https://doi.org/10.1145/2034691.2034742","url":null,"abstract":"The growth of mobile phone users has lead to a dramatic increasing of SMS spam messages. In practice, fighting mobile phone spam is difficult by several factors, including the lower rate of SMS that has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. On the other hand, in academic settings, a major handicap is the scarcity of public SMS spam datasets, that are sorely needed for validation and comparison of different classifiers. Moreover, as SMS messages are fairly short, content-based spam filters may have their performance degraded. In this paper, we offer a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we compare the performance achieved by several established machine learning methods. The results indicate that Support Vector Machine outperforms other evaluated classifiers and, hence, it can be used as a good baseline for further comparison.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"34 1","pages":"259-262"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82765843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An exploratory analysis of mind maps","authors":"J. Beel, Stefan Langer","doi":"10.1145/2034691.2034709","DOIUrl":"https://doi.org/10.1145/2034691.2034709","url":null,"abstract":"The results presented in this paper come from an exploratory study of 19,379 mind maps created by 11,179 users from the mind mapping applications 'Docear' and 'MindMeister'. The objective was to find out how mind maps are structured and which information they contain. The results include: A typical mind map is rather small, with 31 nodes on average (median), whereas each node usually contains between one to three words. In 66.12% of cases there are few notes, if any, and the number of hyperlinks tends to be rather low, too, but depends upon the mind mapping application. Most mind maps are edited only on one (60.76%) or two days (18.41%). It is to expect that a typical user creates around 2.7 mind maps (mean) a year. However, there are exceptions which create a long tail. One user created 243 mind maps, the largest mind map contained 52,182 nodes, one node contained 7,497 words and one mind map was edited on 142 days.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"1 1","pages":"81-84"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90207235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A versatile model for web page representation, information extraction and content re-packaging","authors":"Bernhard Krüpl, R. R. Fayzrakhmanov, Wolfgang Holzinger, Mathias Panzenböck, Robert Baumgartner","doi":"10.1145/2034691.2034721","DOIUrl":"https://doi.org/10.1145/2034691.2034721","url":null,"abstract":"On today's Web, designers take huge efforts to create visually rich websites that boast a magnitude of interactive elements. Contrarily, most web information extraction (WIE) algorithms are still based on attributed tree methods which struggle to deal with this complexity. In this paper, we introduce a versatile model to represent web documents. The model is based on gestalt theory principles---trying to capture the most important aspects in a formally exact way. It (i) represents and unifies access to visual layout, content and functional aspects; (ii) is implemented with semantic web techniques that can be leveraged for i.e. automatic reasoning. Considering the visual appearance of a web page, we view it as a collection of gestalt figures---based on gestalt primitives---each representing a specific design pattern, be it navigation menus or news articles. Based on this model, we introduce our WIE methodology, a re-engineering process involving design patterns, statistical distributions and text content properties. The complete framework consists of the UOM model, which formalizes the mentioned components, and the MANM layer that hints on structure and serialization, providing document re-packaging foundations. Finally, we discuss how we have applied and evaluated our model in the area of web accessibility.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"71 1","pages":"129-138"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85147539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence","authors":"Bela Gipp, Norman Meuschke","doi":"10.1145/2034691.2034741","DOIUrl":"https://doi.org/10.1145/2034691.2034741","url":null,"abstract":"Plagiarism Detection Systems have been developed to locate instances of plagiarism e.g. within scientific papers. Studies have shown that the existing approaches deliver reasonable results in identifying copy&paste plagiarism, but fail to detect more sophisticated forms such as paraphrased plagiarism, translation plagiarism or idea plagiarism. The authors of this paper demonstrated in recent studies that the detection rate can be significantly improved by not only relying on text analysis, but by additionally analyzing the citations of a document. Citations are valuable language independent markers that are similar to a fingerprint. In fact, our examinations of real world cases have shown that the order of citations in a document often remains similar even if the text has been strongly paraphrased or translated in order to disguise plagiarism.\u0000 This paper introduces three algorithms and discusses their suitability for the purpose of citation-based plagiarism detection. Due to the numerous ways in which plagiarism can occur, these algorithms need to be versatile. They must be capable of detecting transpositions, scaling and combinations in a local and global form. The algorithms are coined Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence. The evaluation showed that if these algorithms are combined, common forms of plagiarism can be detected reliably.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"242 1","pages":"249-258"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76551406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction of a dynamic assistance to the creative process of adding dimensions to multistructured documents","authors":"P. Portier, S. Calabretto","doi":"10.1145/2034691.2034728","DOIUrl":"https://doi.org/10.1145/2034691.2034728","url":null,"abstract":"We consider documents as the results of dynamic processes of documentary fragments' associations. We have experienced that once a substantial number of associations exist, users need some synoptic views. One possible way of providing such views relies in the organization of associations into relevant subsets that we call \"dimensions\". Thus, dimensions offer orders along which a documentary archive can be traversed. Many works have proposed efficient ways of presenting combinations of dimensions through graphical user interfaces. Moreover, there are studies on the structural properties of dimensional hypertexts. However, the problem of the origins and evolution of dimensions has not yet received a similar attention. Thus, we propose a mechanism based on a simple structural constraint for helping users in the construction of dimensions: if a cycle appears within a dimension while a user is creating a new dimension by the aggregation of existing ones, he will be encouraged (and assisted in his task) to restructure the dimensions in order to cut the cycle. This is a first step towards a rational control of the emergence and evolution of dimensions.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. 
ACM Symposium on Document Engineering","volume":"48 8 1","pages":"167-170"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75226434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generic calculus of XML editing deltas","authors":"Jean-Yves Vion-Dury","doi":"10.1145/2034691.2034718","DOIUrl":"https://doi.org/10.1145/2034691.2034718","url":null,"abstract":"In previous work we outlined a mathematical model of the so-called XML editing deltas and proposed a first study of their formal properties. We expected at least three outputs from this theoretical work: a common basis to compare performances of the various algorithms through a structural normalization of deltas, a universal and flexible patch application model and a clearer separation of patch and merge engine performance from delta generation performance. This paper presents the full calculus and reports significant progresses with respect to formalizing a normalization procedure. Such method is key to defining an equivalence relation between editing scripts and eventually designing optimizers compiler back-ends, new patch specification languages and execution models.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"8 1","pages":"113-120"},"PeriodicalIF":0.0,"publicationDate":"2011-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87306624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}