Mapping the Past: Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata
Axel Ahlin, Alfred Myrne, Pierre Nugues
arXiv - CS - Digital Libraries, 2024-06-25
DOI: arxiv-2406.17903
Abstract
In this paper, we describe the extraction of all the location entries from a
prominent Swedish encyclopedia from the early 20th century, the Nordisk
Familjebok ('Nordic Family Book'). We focused on the second edition, called
Uggleupplagan, which comprises 38 volumes and over 182,000 articles.
This makes it one of the most extensive Swedish encyclopedias. Using a
classifier, we first determined the category of the entries. We found that
approximately 22 percent of them were locations. We applied named entity
recognition to these entries and linked them to Wikidata. Wikidata enabled
us to extract their precise geographic locations, resulting in almost 18,000
valid coordinates. We then analyzed the distribution of these locations and the
entry selection process. This analysis showed a higher density within Sweden, Germany, and
the United Kingdom. The paper sheds light on the selection and representation
of geographic information in the Nordisk Familjebok, providing
insights into historical and societal perspectives. It also paves the way for
future investigations into entry selection in different time periods and
comparative analyses among various encyclopedias.
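
The abstract does not say how coordinates were read from Wikidata or what made a coordinate "valid", so the following is only a minimal sketch under stated assumptions. It assumes a linked entry is available as a Wikidata entity in the standard JSON layout, that the location is stored under the coordinate location property (P625), and that "valid" means an Earth coordinate within the usual latitude and longitude ranges. The function name `extract_coordinates` and the sample entity are hypothetical and are not taken from the paper.

```python
from typing import Optional, Tuple

# Wikidata item for Earth; P625 values carry a "globe" URI.
EARTH_GLOBE = "http://www.wikidata.org/entity/Q2"


def extract_coordinates(entity: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a Wikidata entity dict, or None.

    Assumption: 'valid' means an Earth coordinate with latitude in
    [-90, 90] and longitude in [-180, 180]. The paper may use other criteria.
    """
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" and "somevalue" statements
        value = snak.get("datavalue", {}).get("value", {})
        lat, lon = value.get("latitude"), value.get("longitude")
        if lat is None or lon is None:
            continue
        if value.get("globe", EARTH_GLOBE) != EARTH_GLOBE:
            continue  # ignore coordinates on other celestial bodies
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            return float(lat), float(lon)
    return None


# Hand-written sample with made-up values, for illustration only.
sample = {
    "id": "Q_EXAMPLE",  # hypothetical identifier
    "claims": {
        "P625": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P625",
                    "datavalue": {
                        "type": "globecoordinate",
                        "value": {
                            "latitude": 59.0,
                            "longitude": 18.0,
                            "globe": EARTH_GLOBE,
                        },
                    },
                }
            }
        ]
    },
}

coords = extract_coordinates(sample)
```

Entries with no P625 statement, or with values outside the assumed ranges, return `None` and would be excluded from any distribution analysis built on this sketch.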