Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data

IF 0.8 Q3 INFORMATION SCIENCE & LIBRARY SCIENCE
Basma Makhlouf Shabou, Julien Tièche, J. Knafou, A. Gaudinat
Records Management Journal. Published 2020-07-03. DOI: 10.1108/rmj-09-2019-0049 (https://doi.org/10.1108/rmj-09-2019-0049)
Citations: 11

Abstract

Purpose
This paper describes an interdisciplinary and innovative research project conducted in Switzerland at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchâtel (Office des archives de l'État de Neuchâtel, OAEN). The problem it addresses is a classical one: how to extract and discriminate relevant data within a huge volume of diverse and complex record formats and contents. The goal of the study is to provide a framework and a proof of concept for software that supports defensible decisions on the retention and disposal of records and data offered to the OAEN. For this purpose, the authors designed two axes: the archival axis, which proposes archival metrics for the appraisal of structured and unstructured data, and the data mining axis, which proposes algorithmic methods as complementary and/or additional metrics for the appraisal process.

Design/methodology/approach
Based on these two axes, this exploratory study designs archival metrics paired with data mining metrics and tests their feasibility, with the aim of making the digital appraisal process as systematic, or even automatic, as possible. Under Axis 1, the authors followed three steps. First, they designed a conceptual framework for the appraisal of records and data with a detailed three-dimensional approach (trustworthiness, exploitability, representativeness), and defined the main principles and postulates that guide the operationalization of these conceptual dimensions. Second, the operationalization produced metrics expressed as variables, supported by a quantitative method for their measurement and scoring. Third, the authors shared the conceptual framework, with its dimensions and operationalized variables (metrics), with experienced professionals for validation. The experts' feedback gave the authors an indication of the relevance and the feasibility of these metrics, two aspects that may demonstrate the acceptability of such a method in real-life archival practice. In parallel, Axis 2 proposes functionalities that cover not only macro analysis of data but also the algorithmic methods needed to compute digital archival and data mining metrics. On that basis, three use cases were proposed as plausible and illustrative scenarios for applying such a solution.

Findings
The main results demonstrate the feasibility of measuring the value of data and records with a reproducible method. More specifically, for Axis 1, the authors applied the metrics in a flexible and modular way. The authors also defined the main principles needed to enable a computational scoring method. The results of the expert consultation on the relevance of 42 metrics indicate an acceptance rate above 80%. In addition, the results show that 60% of all metrics can be automated. Regarding Axis 2, 33 functionalities were developed and proposed under six main types: macro analysis, microanalysis, statistics, retrieval, administration and, finally, decision modeling and machine learning. The relevance of the metrics and functionalities rests on the theoretical validity and computational character of their method. These results are largely satisfactory and promising.

Originality/value
This study offers a valuable aid for improving the validity and performance of archival appraisal processes and decision-making. The transferability and applicability of these archival and data mining metrics could be considered for other types of data. An adaptation of the method and its metrics could be tested on research data, medical data or banking data.
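The abstract mentions metrics grouped under three dimensions and a quantitative scoring method, but does not give the formula. The following is only a minimal sketch of how such a modular score could be computed, not the authors' method. It assumes every metric is normalized to [0, 1], that metrics are combined by a weighted mean within each dimension, and that the dimensions are then averaged with equal weight. The class `MetricScore`, both functions and all metric names in the usage note are hypothetical.

```python
from dataclasses import dataclass

# The three appraisal dimensions named in the abstract.
DIMENSIONS = ("trustworthiness", "exploitability", "representativeness")


@dataclass(frozen=True)
class MetricScore:
    """One measured metric (hypothetical structure, not from the paper)."""
    name: str            # illustrative metric label
    dimension: str       # must be one of DIMENSIONS
    score: float         # assumed to be normalized to [0, 1]
    weight: float = 1.0  # assumed relative importance within its dimension


def dimension_scores(metrics):
    """Return the weighted mean score per dimension.

    Dimensions without any metric are omitted, so a subset of metrics
    can be scored on its own (the "modular" use described in the text).
    """
    totals = {}
    for m in metrics:
        if m.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {m.dimension}")
        if not 0.0 <= m.score <= 1.0:
            raise ValueError(f"score out of range for {m.name}")
        if m.weight <= 0:
            raise ValueError(f"weight must be positive for {m.name}")
        weighted_sum, weight_sum = totals.get(m.dimension, (0.0, 0.0))
        totals[m.dimension] = (weighted_sum + m.score * m.weight,
                               weight_sum + m.weight)
    return {d: s / w for d, (s, w) in totals.items()}


def overall_score(metrics):
    """Average the per-dimension scores with equal weight (an assumption)."""
    per_dimension = dimension_scores(metrics)
    if not per_dimension:
        raise ValueError("no metrics supplied")
    return sum(per_dimension.values()) / len(per_dimension)
```

For example, with illustrative inputs `MetricScore("fixity_check", "trustworthiness", 1.0)`, `MetricScore("provenance_documented", "trustworthiness", 0.5)` and `MetricScore("open_format", "exploitability", 0.5, weight=2.0)`, `dimension_scores` averages the two trustworthiness metrics and reports exploitability separately, and `overall_score` then averages those two dimension values. Keeping the per-dimension step separate from the overall step is a design choice that lets an archivist inspect each dimension before trusting a single number.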
Source journal
Records Management Journal (Information Science & Library Science)
CiteScore: 3.50
Self-citation rate: 7.10%
Articles published: 11
Journal scope:
■ Electronic records management
■ Effect of government policies on records management
■ Strategic developments in both the public and private sectors
■ Systems design and implementation
■ Models for records management
■ Best practice, standards and guidelines
■ Risk management and business continuity
■ Performance measurement
■ Continuing professional development
■ Consortia and co-operation
■ Marketing
■ Preservation
■ Legal and ethical issues