Developing computer vision and machine learning strategies to unlock government-created records

Impact factor: 4.7; JCR Q2; Computer Science, Artificial Intelligence
Greg Jansen, Richard Marciano
AI & Society, volume 40, issue 6, pages 4513-4529. Published 2025-02-24. DOI: 10.1007/s00146-025-02231-y
https://link.springer.com/article/10.1007/s00146-025-02231-y
Citations: 0

Abstract

This paper outlines the development of a proof-of-concept workflow using machine learning and computer vision techniques to unlock the data within digitized handwritten US Census forms from 1950. The 1950 US Census includes over 6.5 million page images and was made available to the public only on April 1, 2022, following a 72-year access restriction period. Our project uses computational treatments to assist researchers in their efforts to recover and preserve the history of the erased Sacramento Japantown. Sacramento once housed the fourth-largest Japantown in the United States before experiencing WWII Japanese American Incarceration and the 1950s US Government program of urban renewal. The goal is to augment a researcher’s work in selecting a subset of Census pages for further transcription and analysis. We demonstrate a workflow for extracting demographic information using computer vision for image segmentation and machine learning for handwritten character recognition. The workflow consists of a computational filtering process for Census records and a user interface for page review. These computational techniques are suitable for other cities, states, and communities, and demonstrate new strategies to unlock vital demographic information. The approach highlights the potential benefits of computational techniques for analyzing form-based historical records of the twentieth century, with possible impact on social justice.

Source journal
AI & Society (Computer Science, Artificial Intelligence)
CiteScore: 8.00
Self-citation rate: 20.00%
Articles published: 257
About the journal: AI & Society: Knowledge, Culture and Communication, is an International Journal publishing refereed scholarly articles, position papers, debates, short communications, and reviews of books and other publications. Established in 1987, the Journal focuses on societal issues including the design, use, management, and policy of information, communications and new media technologies, with a particular emphasis on cultural, social, cognitive, economic, ethical, and philosophical implications. AI & Society has a broad scope and is strongly interdisciplinary. We welcome contributions and participation from researchers and practitioners in a variety of fields including information technologies, humanities, social sciences, arts and sciences. This includes broader societal and cultural impacts, for example on governance, security, sustainability, identity, inclusion, working life, corporate and community welfare, and well-being of people. Co-authored articles from diverse disciplines are encouraged. AI & Society seeks to promote an understanding of the potential, transformative impacts and critical consequences of pervasive technology for societies. Technological innovations, including new sciences such as biotech, nanotech and neuroscience, offer a great potential for societies, but also pose existential risk. Rooted in the human-centred tradition of science and technology, the Journal acts as a catalyst, promoter and facilitator of engagement with diversity of voices and over-the-horizon issues of arts, science, technology and society. AI & Society expects that, in keeping with the ethos of the journal, submissions should provide a substantial and explicit argument on the societal dimension of research, particularly the benefits, impacts and implications for society. This may include factors such as trust, biases, privacy, reliability, responsibility, and competence of AI systems.
Such arguments should be validated by critical comment on current research in this area. Curmudgeon Corner will retain its opinionated ethos. The journal is in three parts: a) full length scholarly articles; b) strategic ideas, critical reviews and reflections; c) Student Forum is for emerging researchers and new voices to communicate their ongoing research to the wider academic community, mentored by the Journal Advisory Board; Book Reviews and News; Curmudgeon Corner for the opinionated. Papers in the Original Section may include original papers, which are underpinned by theoretical, methodological, conceptual or philosophical foundations. The Open Forum Section may include strategic ideas, critical reviews and potential implications for society of current research. Network Research Section papers make substantial contributions to theoretical and methodological foundations within societal domains. These will be multi-authored papers that include a summary of the contribution of each author to the paper. Original, Open Forum and Network papers are peer reviewed. The Student Forum Section may include theoretical, methodological, and application orientations of ongoing research including case studies, as well as, contextual action research experiences. Papers in this section are normally single-authored and are also formally reviewed. Curmudgeon Corner is a short opinionated column on trends in technology, arts, science and society, commenting emphatically on issues of concern to the research community and wider society. Normal word length: Original and Network Articles 10k, Open Forum 8k, Student Forum 6k, Curmudgeon 1k. The exception to the co-author limit of Original and Open Forum (4), Network (10), Student (3) and Curmudgeon (2) articles will be considered for their special contributions. Please do not send your submissions by email but use the "Submit manuscript" button. 
NOTE TO AUTHORS: The Journal expects its authors to include, in their submissions: a) An acknowledgement of the pre-accept/pre-publication versions of their manuscripts on non-commercial and academic sites. b) Images: obtain permissions from the copyright holder/original sources. c) Formal permission from their ethics committees when conducting studies with people.