{"title":"Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data","authors":"Basma Makhlouf Shabou, Julien Tièche, J. Knafou, A. Gaudinat","doi":"10.1108/rmj-09-2019-0049","DOIUrl":"https://doi.org/10.1108/rmj-09-2019-0049","url":null,"abstract":"Purpose: This paper aims to describe interdisciplinary and innovative research conducted in Switzerland at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchâtel (Office des archives de l'État de Neuchâtel, OAEN). The problem addressed is a classical one: how to extract and discriminate relevant data in a huge amount of diversified and complex data record formats and contents. The goal of this study is to provide a framework and a proof of concept for software that helps in taking defensible decisions on the retention and disposal of records and data proposed to the OAEN. For this purpose, the authors designed two axes: the archival axis, to propose archival metrics for the appraisal of structured and unstructured data, and the data mining axis, to propose algorithmic methods as complementary and/or additional metrics for the appraisal process. Design/methodology/approach: Based on two axes, this exploratory study designs and tests the feasibility of archival metrics that are paired with data mining metrics, to advance, as much as possible, the digital appraisal process in a systematic or even automatic way. Under Axis 1, the authors have initiated three steps: first, the design of a conceptual framework for records data appraisal with a detailed three-dimensional approach (trustworthiness, exploitability, representativeness). In addition, the authors defined the main principles and postulates to guide the operationalization of the conceptual dimensions. 
Second, the operationalization proposed metrics expressed in terms of variables supported by a quantitative method for their measurement and scoring. Third, the authors shared this conceptual framework, with its dimensions and operationalized variables (metrics), with experienced professionals to validate them. The experts' feedback gave the authors an indication of the relevance and feasibility of these metrics. These two aspects may demonstrate the acceptability of such a method in real-life archival practice. In parallel, Axis 2 proposes functionalities to cover not only macro analysis of data but also the algorithmic methods that enable the computation of digital archival and data mining metrics. Based on that, three use cases were proposed to imagine plausible and illustrative scenarios for the application of such a solution. Findings: The main results demonstrate the feasibility of measuring the value of data and records with a reproducible method. More specifically, for Axis 1, the authors applied the metrics in a flexible and modular way. The authors also defined the main principles needed to enable a computational scoring method. The results obtained through the expert consultation on the relevance of 42 metrics indicate an acceptance rate above 80%. In addition, the results show that 60% of all metrics can be automated. 
Regarding Axis 2, 33 functionalities were developed and proposed under six main types: macro","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-09-2019-0049","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45859667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From tree to network: reordering an archival catalogue","authors":"M. Bell","doi":"10.1108/rmj-09-2019-0051","DOIUrl":"https://doi.org/10.1108/rmj-09-2019-0051","url":null,"abstract":"This paper presents the results of a number of experiments performed at the National Archives, all related to the theme of linking collections of records. This paper aims to present a methodology for translating a hierarchy into a network structure using a number of methods for deriving statistical distributions from records metadata or content and then aggregating them. Simple similarity metrics are then used to compare and link collections of records with similar characteristics. The approach taken is to consider a record at any level of the catalogue hierarchy as a summary of its children. A distribution for each child record is created (e.g. word counts and date distribution) and averaged/summed with the other children. This process is repeated up the hierarchy to find a representative distribution of the whole series. By doing this, the authors can compare record series and create a similarity network. The summarising method was found to be applicable not only to a hierarchical catalogue but also to web archive data, which is by nature stored in a hierarchical folder structure. The case studies raised many questions worthy of further exploration, such as how to present distributions and uncertainty to users and how to compare methods that produce similarity scores on different scales. Although the techniques used to create distributions, such as topic modelling and word frequency counts, are not new and have been used to compare documents, to the best of the authors' knowledge applying the averaging approach to the archival catalogue is new. 
This provides an interesting method for zooming in and out of a collection, creating networks at different levels of granularity according to user needs.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":"30 1","pages":"379-394"},"PeriodicalIF":1.4,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-09-2019-0051","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45716054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ethics by design: a code of ethics for the digital age","authors":"Bernice Ibiricu, Marja Leena van der Made","doi":"10.1108/rmj-08-2019-0044","DOIUrl":"https://doi.org/10.1108/rmj-08-2019-0044","url":null,"abstract":"Purpose: This paper aims to provide a framework for a code of ethics related to digital and leading-edge technologies. Design/methodology/approach: The proposed ethical framework is anchored in data protection legislation, and results from a combination of case studies, observed user behaviour and decision-making processes. Findings: A concise and user-friendly ethical framework ensures the embedded code of conduct is respected and observed by all employees concerned. Originality/value: An ethical framework aligned with EU data protection legislation is required.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2020-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-08-2019-0044","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45789342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Archiving experience: an exploration of the challenges of preserving virtual reality","authors":"Zack Lischer-Katz","doi":"10.1108/rmj-09-2019-0054","DOIUrl":"https://doi.org/10.1108/rmj-09-2019-0054","url":null,"abstract":"This paper aims to explore the opportunities and challenges that immersive virtual reality (VR) technologies pose for archival theory and practice. This conceptual paper reviews research on VR adoption in information institutions and the preservation challenges of VR to identify ways in which VR has the potential to disrupt existing archival theory and practice. Existing archival approaches are found to be disrupted by the multi-layered structural characteristics of VR, the part–whole relationships between the technological elements of VR environments and the three-dimensional content they contain and the immersive, experiential nature of VR experiences. This paper argues that drawing on perspectives from phenomenology and digital materiality is helpful for addressing the preservation challenges of VR. The findings extend conceptualizations of preservation by identifying gaps in existing preservation approaches to VR and stressing the importance of “experience” as a central element of archival practice and by emphasizing the embodied dimensions of interpreting archival records and the multiple scales of materiality that archival researchers and practitioners should consider to preserve VR. These findings provide guidance for digital curators and preservationists by outlining the current thinking on VR preservation and the impact of VR on digital preservation strategies. This paper gives new insight into VR as an emerging area of concern to digital curation and preservation and expands archival thinking with new conceptualizations that disrupt existing paradigms.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":"30 
1","pages":"253-274"},"PeriodicalIF":1.4,"publicationDate":"2020-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-09-2019-0054","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43174158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Natural language processing and machine learning as practical toolsets for archival processing","authors":"T. Hutchinson","doi":"10.1108/rmj-09-2019-0055","DOIUrl":"https://doi.org/10.1108/rmj-09-2019-0055","url":null,"abstract":"Purpose: This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and propose functional requirements and workflow considerations for transitioning from experimental to operational use of these tools. Design/methodology/approach: The paper has four main sections. 1) A short overview of the NLP and machine learning concepts referenced in the paper. 2) A review of the literature reporting on NLP and machine learning applied to archival processes. 3) An overview and commentary on key existing and developing tools that use NLP or machine learning techniques for archives. 4) This review and analysis will inform a discussion of functional requirements and workflow considerations for NLP and machine learning tools for archival processing. Findings: Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out to develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable. Originality/value: Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. 
This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2020-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-09-2019-0055","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46213476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Record linking in the EHRI portal","authors":"Sigal Arie Erez, Tobias Blanke, Michael Bryant, K. Rodríguez, R. Speck, Veerle Vanden Daelen","doi":"10.1108/rmj-08-2019-0045","DOIUrl":"https://doi.org/10.1108/rmj-08-2019-0045","url":null,"abstract":"Purpose: This paper aims to describe the European Holocaust Research Infrastructure (EHRI) project's ongoing efforts to virtually integrate trans-national archival sources via the reconstruction of collection provenance as it relates to copy collections (material copied from one archive to another) and the co-referencing of subject and authority terms across material held by distinct institutions. Design/methodology/approach: This paper is a case study of approximately 6,000 words in length. The authors describe the scope of the problem of archival fragmentation from both cultural and technical perspectives, with particular focus on Holocaust-related material, and describe, with graph-based visualisations, two ways in which EHRI seeks to better integrate information about fragmented material. Findings: As a case study, the principal contributions of this paper include reports on our experience with extracting provenance-based connections between archival descriptions from encoded finding aids and the challenges of co-referencing access points in the absence of domain-specific controlled vocabularies. Originality/value: Record linking in general is an important technique in computational approaches to humanities research and one that has rightly received significant attention from scholars. In the context of historical archives, however, the material itself is in most cases not digitised, meaning that computational attempts at linking must rely on finding aids, which constitute much less rich data sources. 
The EHRI project’s work in this area is therefore quite pioneering and has implications for archival integration on a larger scale, where the disruptive potential of Linked Open Data is most obvious.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2020-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-08-2019-0045","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45157215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Publishing and using record-keeping structural information in a blockchain","authors":"Thomas Sødring, Petter Reinholdtsen, Svein Ølnes","doi":"10.1108/rmj-09-2019-0056","DOIUrl":"https://doi.org/10.1108/rmj-09-2019-0056","url":null,"abstract":"This paper aims to examine the role blockchain can play for record-keeping by exploring what information from a record-keeping system it is possible to publish to a blockchain. A credible approach is presented, followed by a discussion on both benefits and limitations. The approach is a combination of theorised possibilities verified with practical software implementation. The basis for the work is relevant record-keeping and blockchain literature. The results show that it is possible to separate the formal record-keeping structure from content, and this opens up new possibilities when integrating record-keeping and blockchain technologies. However, the approach does come with some limitations. The approach is beneficial where there is a record-keeping standard that has a clearly defined metadata model, and that also makes use of globally unique identifiers. 
Privacy legislation, for example, GDPR, may limit the scope of an implementation of the approach. The originality lies in presenting an approach whereby a record-keeping standard is analysed, separating structural and content information to publish structural information to a blockchain.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":"30 1","pages":"325-343"},"PeriodicalIF":1.4,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-09-2019-0056","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47753491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Caught in the middle?","authors":"V. Lemieux, Chris Rowell, M. Seidel, C. Woo","doi":"10.1108/rmj-09-2019-0048","DOIUrl":"https://doi.org/10.1108/rmj-09-2019-0048","url":null,"abstract":"Purpose: Distributed trust technologies, such as blockchain, propose to permit peer-to-peer transactions without trusted third parties. Yet not all implementations of such technologies fully decentralize. Information professionals make strategic choices about the level of decentralization when implementing such solutions, and many organizations are taking a hybrid (i.e. partially decentralized) approach to the implementation of distributed trust technologies. This paper conjectures that while hybrid approaches may resolve some challenges of decentralizing information governance, they also introduce others. To better understand these challenges, this paper aims first to elaborate a framework that conceptualizes a centralized–decentralized information governance continuum along three distinct dimensions: custody, ownership and right to access data. This paper then applies this framework to two illustrative blockchain case studies – a pilot Brazilian land transfer recording solution and a Canadian health data consent sharing project – to exemplify how the current transition state of blockchain pilots straddles both the old (centralized) and new (decentralized) worlds. Finally, this paper outlines the novel challenges that hybrid approaches introduce for information governance and what information professionals should do to navigate this thorny transition period. 
Counterintuitively, it may be much better for information professionals to embrace decentralization when implementing distributed trust technologies, as hybrid models could offer the worst of both the centralized and future decentralized worlds when consideration is given to the balance between information governance risks and new strategic business opportunities. Design/methodology/approach: This paper illustrates how blockchain is transforming organizations and societies by highlighting new strategic information governance challenges using our original analytic framework in two detailed blockchain case studies – a pilot solution in Brazil to record land transfers (Flores et al., 2018) and another in Canada to handle health data sharing consent (Hofman et al., 2018). The two case studies represent research output of the first phase of an ongoing multidisciplinary research project focused on gaining an understanding of how blockchain technology generates organizational, societal and data transformations and challenges. The analytic framework was developed inductively from a thematic synthesis of the findings of the case studies conducted under the auspices of this research project. Each case discussed in detail in this paper was chosen from among the project's case studies, as it represents a desire to move away from the old centralized world of information governance to a new decentralized one. 
However, each case study also represents and embodies a transition state between the old and new worlds and highlights many of the associated strategic information governance challenges. Findings:","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2020-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-09-2019-0048","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47307233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Working in contexts for which transparency is important","authors":"J. Bunn","doi":"10.1108/rmj-08-2019-0038","DOIUrl":"https://doi.org/10.1108/rmj-08-2019-0038","url":null,"abstract":"This paper aims to introduce the topic of explainable artificial intelligence (XAI) and reports on the outcomes of an interdisciplinary workshop exploring it. It reflects on XAI through the frame and concerns of the recordkeeping profession. This paper takes a reflective approach. The origins of XAI are outlined as a way of exploring how it can be viewed and how it is currently taking shape. The workshop and its outcomes are briefly described and reflections on the process of investigating and taking part in conversations about XAI are offered. The article reinforces the value of undertaking interdisciplinary and exploratory conversations with others. It offers new perspectives on XAI and suggests ways in which recordkeeping can productively engage with it, as both a disruptive force on its thinking and a set of newly emerging record forms to be created and managed. The value of this paper comes from the way in which the introduction it provides will allow recordkeepers to gain a sense of what XAI is and the different ways in which they are both already engaging and can continue to engage with it.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":"30 1","pages":"143-153"},"PeriodicalIF":1.4,"publicationDate":"2020-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-08-2019-0038","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41570431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A record-keeping approach to managing IoT-data for government agencies","authors":"Thomas Sødring, Petter Reinholdtsen, David Massey","doi":"10.1108/rmj-09-2019-0050","DOIUrl":"https://doi.org/10.1108/rmj-09-2019-0050","url":null,"abstract":"Purpose: Particular attention to the issue of information management will be required to meet the expected growth in IoT-devices and the data they generate. As government agencies start collecting and using such information, they must also deal with the issue of privacy, to comply with laws and regulations. The approach discussed here shows that record-keeping principles may form part of a solution to the issue of managing IoT-data for government agencies. Design/methodology/approach: This study uses the generally accepted record-keeping principles as a basis for a high-level discussion on how IoT-data can be managed. This is followed by a presentation and discussion on how the Norwegian record-keeping standard, Noark, can be extended to highlight practical issues. Findings: Record keeping has principles that are relevant to the management of IoT-data. Further, an implementation of the chosen use-cases is possible based on an existing record-keeping standard. Record keeping is one of many information science approaches that can manage IoT-data. Research limitations/implications: The main limitations are that the discussion cannot cover all types of IoT-devices, nor can all issues be captured with a limited choice of examples. The results should be seen within the context of the types of devices discussed and limited to the chosen use-cases. However, the level of abstraction used means the results may be applicable to similar scenarios. Originality/value: The approach shows that record-keeping principles may be used as an approach to manage IoT-data. 
This discussion is useful when compared with other information science approaches, e.g. big-data or semantic Web approaches. The practicalities of a record-keeping approach are also discussed and relevant strengths and weaknesses are shown.","PeriodicalId":20923,"journal":{"name":"Records Management Journal","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2020-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/rmj-09-2019-0050","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49466278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}