Machine learning methods for isolating indigenous language catalog descriptions

IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yi Liu, Carrie Heitman, Leen-Kiat Soh, Peter Whiteley
AI & Society, vol. 40, no. 6, pp. 4461-4471. Published 2025-02-24. DOI: 10.1007/s00146-025-02223-y
https://link.springer.com/article/10.1007/s00146-025-02223-y
Citations: 0

Abstract

Museum collection databases contain echoes of encounter between colonial collectors (broadly defined) and Indigenous people from around the world. The moment of acquisition, when an item passed out of a community and into the hands of the collector, often included multilingual acts of translation. An artist may have shared the Indigenous name of the object, or the terms associated with its origin and use. Late nineteenth- and twentieth-century museum registrars would in turn transcribe this information from field logs into museum catalogs. Over time, these catalog entries were transformed into digital records within collections management systems (e.g., EMu, PastPerfect). As a result of this 150-year process, today’s museum collection databases are riddled with Indigenous words and descriptions, scattered across various metadata fields. They may include Native place-names, family names, or vocabulary terms that, when translated, extend far beyond the categories ascribed by museum collection managers. These instances of Indigenous description may also serve as a crucial bridge for reconnecting source communities with items of particular interest to their cultural heritage and linguistic preservation efforts. To enhance the accessibility of Indigenous languages contained in the metadata of cultural heritage collections, this paper explores applications of machine learning methodologies to identify Indigenous terms present in museum catalogs. Specifically, we discuss methods that incorporate the Google Cloud Language Identification Service to detect A:shiwi (Pueblo of Zuni) language terms through a case study of metadata records from the two largest natural history museums in the USA. We use an elimination mechanism that excludes specific languages (e.g., English and Spanish) at the word and phrase levels, so that the terms that remain are candidate A:shiwi terms. Our approach outperforms conventional methods in terms of accuracy, recall, precision, and F1 score. This method can be used to confront the “Digital Heap” of cultural heritage records across institutions to improve the discoverability of Indigenous languages in metadata descriptions and reconnect source communities with items of cultural patrimony.
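The elimination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `toy_detect` function is a hypothetical stand-in for a language-identification service (it uses tiny hand-written word lists, not the Google Cloud service), `placeholderA` is a dummy token and not a real A:shiwi word, and the function names `candidate_terms` and `precision_recall_f1` are assumptions made for illustration. Only the word level is shown; the same detector callable could in principle be applied to phrases.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a language-identification service.
# These tiny word lists are for illustration only.
ENGLISH = {"bowl", "with", "painted", "design", "water", "jar"}
SPANISH = {"olla", "con", "agua", "de"}


def toy_detect(token: str) -> str:
    """Return a language code for one token: 'en', 'es', or 'und' (undetermined)."""
    lowered = token.lower()
    if lowered in ENGLISH:
        return "en"
    if lowered in SPANISH:
        return "es"
    return "und"


def candidate_terms(
    text: str,
    detect: Callable[[str], str],
    excluded: Iterable[str] = ("en", "es"),
) -> List[str]:
    """Keep the tokens whose detected language is NOT in the excluded set."""
    excluded_set = set(excluded)
    tokens = [raw.strip(".,;()\"'") for raw in text.split()]
    return [tok for tok in tokens if tok and detect(tok) not in excluded_set]


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 of predicted terms against gold terms."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    record = "Painted water jar, placeholderA design"
    found = candidate_terms(record, toy_detect)
    print(found)
    print(precision_recall_f1(found, ["placeholderA"]))
```

In this sketch, anything the detector cannot assign to an excluded language survives as a candidate, so misspellings, abbreviations, and proper names would also pass through; a real pipeline would presumably need review of the surviving terms.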

Source journal: AI & Society
Category: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
CiteScore: 8.00
Self-citation rate: 20.00%
Articles published: 257
Journal description: AI & Society: Knowledge, Culture and Communication, is an International Journal publishing refereed scholarly articles, position papers, debates, short communications, and reviews of books and other publications. Established in 1987, the Journal focuses on societal issues including the design, use, management, and policy of information, communications and new media technologies, with a particular emphasis on cultural, social, cognitive, economic, ethical, and philosophical implications. AI & Society has a broad scope and is strongly interdisciplinary. We welcome contributions and participation from researchers and practitioners in a variety of fields including information technologies, humanities, social sciences, arts and sciences. This includes broader societal and cultural impacts, for example on governance, security, sustainability, identity, inclusion, working life, corporate and community welfare, and well-being of people. Co-authored articles from diverse disciplines are encouraged. AI & Society seeks to promote an understanding of the potential, transformative impacts and critical consequences of pervasive technology for societies. Technological innovations, including new sciences such as biotech, nanotech and neuroscience, offer a great potential for societies, but also pose existential risk. Rooted in the human-centred tradition of science and technology, the Journal acts as a catalyst, promoter and facilitator of engagement with diversity of voices and over-the-horizon issues of arts, science, technology and society. AI & Society expects that, in keeping with the ethos of the journal, submissions should provide a substantial and explicit argument on the societal dimension of research, particularly the benefits, impacts and implications for society. This may include factors such as trust, biases, privacy, reliability, responsibility, and competence of AI systems. 
Such arguments should be validated by critical comment on current research in this area. Curmudgeon Corner will retain its opinionated ethos. The journal is in three parts: a) full length scholarly articles; b) strategic ideas, critical reviews and reflections; c) Student Forum is for emerging researchers and new voices to communicate their ongoing research to the wider academic community, mentored by the Journal Advisory Board; Book Reviews and News; Curmudgeon Corner for the opinionated. Papers in the Original Section may include original papers, which are underpinned by theoretical, methodological, conceptual or philosophical foundations. The Open Forum Section may include strategic ideas, critical reviews and potential implications for society of current research. Network Research Section papers make substantial contributions to theoretical and methodological foundations within societal domains. These will be multi-authored papers that include a summary of the contribution of each author to the paper. Original, Open Forum and Network papers are peer reviewed. The Student Forum Section may include theoretical, methodological, and application orientations of ongoing research including case studies, as well as, contextual action research experiences. Papers in this section are normally single-authored and are also formally reviewed. Curmudgeon Corner is a short opinionated column on trends in technology, arts, science and society, commenting emphatically on issues of concern to the research community and wider society. Normal word length: Original and Network Articles 10k, Open Forum 8k, Student Forum 6k, Curmudgeon 1k. The exception to the co-author limit of Original and Open Forum (4), Network (10), Student (3) and Curmudgeon (2) articles will be considered for their special contributions. Please do not send your submissions by email but use the "Submit manuscript" button. 
NOTE TO AUTHORS: The Journal expects its authors to include, in their submissions: a) An acknowledgement of the pre-accept/pre-publication versions of their manuscripts on non-commercial and academic sites. b) Images: obtain permissions from the copyright holder/original sources. c) Formal permission from their ethics committees when conducting studies with people.