{"title":"Quality assurance in document conversion: a hit?","authors":"Christoph Becker","doi":"10.1145/2064058.2064061","DOIUrl":"https://doi.org/10.1145/2064058.2064061","url":null,"abstract":"This paper discusses challenges and opportunities of using human computation and crowdsourcing for the task of quality assurance in document conversion processes and proposes a hybrid computer-human system approach. Digital content is never presented to a user directly, but always needs an intermediate presentation that is generated through an algorithm (such as a document viewer) that interprets data. When converting data such as documents, the question of authenticity of the derived representation of these documents requires a comparison of the intellectually perceivable outcome of different interpretations. Such Quality Assurance is a key obstacle to scalability in document conversion processes. Currently, there is a severe lack of scalable techniques. We argue that this comparison is a Human Intelligence Task (HIT). To investigate the feasibility, potential pitfalls and key challenges in leveraging the wisdom of the crowd for this task, we have conducted several pilot experiments. We describe and discuss these experiments, and identify a number of key challenges that need to be addressed. In particular, we discuss the questions of motivation; task semantics; presentation and interaction design; and quality control. Finally, we outline a proposal to address these challenges in a hybrid computer-human system.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134588521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evidence finding using a collection of books","authors":"Marc-Allen Cartright, H. Feild, James Allan","doi":"10.1145/2064058.2064063","DOIUrl":"https://doi.org/10.1145/2064058.2064063","url":null,"abstract":"This paper introduces the task of Evidence Finding, a novel information retrieval task that uses books - a traditionally more trust-worthy source of information - to help provide evidence to support a statement. What makes this evidence-finding task different from other tasks, such as the related INEX Prove It task, is that both the statement for which evidence is sought and its context are given to the search system. A practical application of this system is to provide supporting or refuting evidence from books for a statement made within a Wikipedia article, using the entire article as contextual support for query generation. We provide details of this task as well as an analysis of a number of retrieval methods that address this task.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121567306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic annotation of bibliographical references in digital humanities books, articles and blogs","authors":"Young-Min Kim, P. Bellot, Elodie Faath, Marin Dacos","doi":"10.1145/2064058.2064068","DOIUrl":"https://doi.org/10.1145/2064058.2064068","url":null,"abstract":"In this paper, we deal with the problem of extracting and processing useful information from bibliographic references in Digital Humanities (DH) data. A machine learning technique for sequential data analysis, Conditional Random Field (CRF), is applied to a corpus extracted from the OpenEdition site, a web platform for journals and book collections in the humanities and social sciences. We present our ongoing project with this purpose, which includes the construction of a proper corpus and an efficient CRF model on it as a preliminary. This project is supported by Google Grant for Digital Humanities. A number of experiments are conducted to find one of the best settings for a CRF model on the corpus, and we verify them through both automatic and manual evaluation.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129187466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The impact of author ranking in a library catalogue","authors":"J. Kamps","doi":"10.1145/2064058.2064067","DOIUrl":"https://doi.org/10.1145/2064058.2064067","url":null,"abstract":"The field of information retrieval has witnessed over 50 years of research on retrieval methods for metadata descriptions and controlled indexing languages, the prototypical example being the library catalogue. It seems only natural to resort to additional data for improving book retrieval, such as the text of the book in whole or in part (table of contents, abstract) or contributed social data acquired through crowdsourcing social cataloguing sites like LibraryThing. Without denying the potential value of such additional data, we want to challenge the underlying assumption that applying novel retrieval methods to traditional book descriptions cannot improve book retrieval. Specifically, this paper investigates the effectiveness of author rankings in a library catalogue. We show that a standard retrieval model results in a book ranking that meets and exceeds the effectiveness of catalogue systems. We show that using expert finding methods we also can obtain effective author rankings that complement the traditional book rankings. Moreover, ranking books on author scores leads to substantial and significant improvements over the original book rankings. If we base our book ranking on the combination of the author scores and the book scores we see no further improvements. Hence our results clearly demonstrate the importance of author ranking for retrieving library catalogue records: authors capture an important aspect of relevance and one that is not obvious to those unfamiliar with specific area of interest.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"384 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126731833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tools for whom: readers, fans, or authors?","authors":"Tim Regan","doi":"10.1145/2064058.2064060","DOIUrl":"https://doi.org/10.1145/2064058.2064060","url":null,"abstract":"Digitization, digital books, the web, social media, etc. all have the potential to change the way we read and write and indeed styles of reading and writing have changed. But we can also look at the ways in which such tools and possibilities have changed the way we write, produce, and read paper books. This position paper looks at the author's experiences developing a book text visualization tool for analyzing the works in Philip Pullman's children's literature trilogy, His Dark Materials, and how the prototype has prompted thinking about the way tools for different reader audiences should differ.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134161655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Changes in reading research proposition: some psychological aspects of reading 2.0","authors":"Adam Sofronijević","doi":"10.1145/2064058.2064073","DOIUrl":"https://doi.org/10.1145/2064058.2064073","url":null,"abstract":"The reading process as a paradigm for the intimate experience and individual cogitation of the content is changing, mostly because of technological and social innovations known as the Web 2.0. Emerging new quality in reading is described via Reading 2.0 concept. This concept is in turn contrasted with the concept of a Solitary reader that focuses on some aspects of the reading process as perceived in the past. In this context research proposal of some psychological aspects of Reading 2.0 is presented. Building on available research in text structure importance for cognitive processes during comprehension of scientific texts, the role of interactivity and collaboration in changing readers' misconceptions are proposed as important research areas. Various aspects of collaboration and interactivity are compared in regards to two different text structures in order to develop plausible scenarios for testing changes in readers' erroneous prior knowledge. Further research in this area is proposed and benefits that might arise from better understanding of discussed phenomena are described. Some conclusions on research directions and possibilities are presented. Possibilities for a librarian's role in the interpretive communities embracing interactivity and collaboration in reading are discussed in order to enhance research proposal and provide an input for the future of this profession.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130592845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to carry over historic books into social networks","authors":"Heimo Müller, H. Maurer","doi":"10.1145/2064058.2064065","DOIUrl":"https://doi.org/10.1145/2064058.2064065","url":null,"abstract":"This paper describes how to make use of e-books that look like printed books in a knowledge network. After an overview of digitalization efforts and current digital library initiatives we introduce quality measures for the digitalization process. After digitalization an Interactive Internet Book (IIB) has to offer a kind of digital binding, annotation efforts and sophisticated ways for user interaction. We claim that the quality and the enhancements of an Interactive Internet Book go far beyond what is traditionally assumed: it is not enough to scan books. The scans have to be of high quality, allow good OCR to permit full text searches; books need not only be \"packaged\" but also need meta-data and functionalities that one can expect from a computer supported medium that go far beyond what is possible with traditional printed books. Those factors are critical for the use of e-books in social media environments, yet this is often still overlooked. Finally, we describe a working prototype and demonstrate the advantages obtained with a use case.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132926358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New trends for reading scientific documents","authors":"Hélène de Ribaupierre, G. Falquet","doi":"10.1145/2064058.2064064","DOIUrl":"https://doi.org/10.1145/2064058.2064064","url":null,"abstract":"When scientists or engineers are looking for information in document collections, or on the web, they generally have a precise objective in mind. Instead of looking for documents \"about a topic T\", they rather try to answer specific needs such as finding the definition of a concept, finding results for a particular problem, checking whether an idea has already been tested, or comparing the scientific conclusions of two articles. This paper presents an indexing model which includes the decomposition of documents into fragments that will correspond to discourse elements (definition, hypothesis, method, result, etc.). The division of documents into fragments should allow scientists to retrieve more relevant information and to make queries more precise. Each type of discourse element will be modeled by defining specific characteristics. The representation model will allow operations that merge or combine retrieved documents or fragments. We will propose an interface that allows strategic reading such as parallel reading of fragments.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133914234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an engaging e-reading experience","authors":"Luca Colombo, M. Landoni","doi":"10.1145/2064058.2064074","DOIUrl":"https://doi.org/10.1145/2064058.2064074","url":null,"abstract":"Electronic books are gaining in popularity, but their potential is not fully exploited yet. This is especially true for children eBooks. HEBE (Highly Engaging eBook Experiences) project, aims to actively involve children in the design of a new concept of electronic book that enables the reader to \"get lost\" in it, namely to enable a new, highly engaging, reading experience. From literature review emerged that no (or very few) research has been held about immersive reading in a digital environment, while there are many studies about user engagement with technology. In this paper we propose a theoretical framework, derived from Csikszentmihalyi's flow theory, to be used for assessing the level of engagement of digital reading experiences. Early outcomes we obtained through preliminary qualitative inquiry and a research plan describing future directions of the work are then presented.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129790013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HCI design principles for ereaders","authors":"Jennifer Pearson, G. Buchanan, H. Thimbleby","doi":"10.1145/1871854.1871860","DOIUrl":"https://doi.org/10.1145/1871854.1871860","url":null,"abstract":"As interactive digital documents are becoming more and more commonplace, we find ourselves searching for new ways to make good use of them. The fast delivery and large storage capacity that digital devices offer, make reading from bulky physical books seem obsolete, even nonsensical. EReaders, the latest craze in digital reading, follows from the introduction of eInk and promises paper-like reading capabilities with the added digital benefits. But is the excitement justified? Can you `curl up' with an eReader in the same way as you can a physical book, or is the design of eReading devices hindering this process? As of yet, no one has taken a scientific view of current eReader technology from the systematic standpoint of basic HCI principles. This paper discusses guidelines for good eReader design and illustrates them with examples of shortcomings of some of the more popular eReader devices on the market today.","PeriodicalId":258166,"journal":{"name":"Workshop on Research Advances in Large Digital Book Repositories","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125194914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}