Nikolai Vogler, Kartik Goyal, Samuel V. Lemley, D. J. Schuldt, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick
arXiv - CS - Digital Libraries, 2024-05-01, https://doi.org/arxiv-2405.00752
Clustering Running Titles to Understand the Printing of Early Modern Books
We propose a novel computational approach to automatically analyze the
physical process behind the printing of early modern letterpress books via
clustering the running titles found at the top of their pages. Specifically, we
design and compare custom neural and feature-based kernels for computing
pairwise visual similarity of a scanned document's running titles and cluster
the titles in order to track any deviations from the expected pattern of a
book's printing. Unlike body text, which must be reset for every page,
running titles are among the static type elements in a skeleton forme, i.e., the
frame used to print each side of a sheet of paper, and were often reused
during a book's printing. To evaluate the effectiveness of our approach, we
manually annotate the running title clusters on about 1,600 pages across 8 early
modern books of varying sizes and formats. Our method can detect potential
deviations from the expected patterns of such skeleton formes, which helps
bibliographers understand the phenomena associated with a text's transmission,
such as censorship. We also validate our results against a manual bibliographic
analysis of a counterfeit early edition of Thomas Hobbes' Leviathan (1651).
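
The pipeline described above (pairwise similarity, clustering, then checking against an expected pattern) can be illustrated with a minimal sketch. It is not the paper's implementation. The cosine kernel stands in for the custom neural and feature-based kernels, the threshold-based single-linkage clustering is a simple substitute for whatever clustering the authors use, and the feature vectors, the `threshold` value, and the idea that a title should recur every `period` pages are all hypothetical assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_kernel(a, b):
    """Cosine similarity of two feature vectors (assumed stand-in kernel)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_titles(features, threshold=0.95):
    """Single-linkage clustering: link pages whose similarity >= threshold."""
    parent = list(range(len(features)))

    def find(i):
        # Follow parent pointers to the cluster root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if cosine_kernel(features[i], features[j]) >= threshold:
            parent[find(j)] = find(i)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(len(features)):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels


def deviations(labels, period):
    """Pages whose cluster differs from the page one period earlier.

    Assumes (hypothetically) that a reused skeleton forme makes the same
    running title recur every `period` pages.
    """
    return [i for i in range(period, len(labels)) if labels[i] != labels[i - period]]


# Toy feature vectors for 8 pages; page 6 reuses the title seen on page 0
# where the assumed period-4 pattern would predict the title from page 2.
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.99, 0.01, 0.0],
    [0.01, 0.99, 0.0],
    [1.0, 0.02, 0.0],
    [0.98, 1.0, 0.0],
]
labels = cluster_titles(features)
print(labels)                 # cluster label per page
print(deviations(labels, 4))  # pages that break the assumed pattern
```

On real scans, the feature vectors would come from the running-title images, and the flagged pages would be candidates for closer bibliographic inspection rather than confirmed anomalies.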