{"title":"Linguistic Archives and Language Communities Questionnaire","authors":"I. Khait, Leonore Lukschy, Mandana Seyfeddinipur","doi":"10.12794/langarc1851179","DOIUrl":"https://doi.org/10.12794/langarc1851179","url":null,"abstract":"Digital language archives hold vast amounts of materials in endangered or marginalised languages. However, due to limitations in technical infrastructure and the design of these archives, the materials are usually not easily accessible to speakers of the languages represented or their descendants. With the goal to establish best practices for researchers archiving linguistic data, this paper presents a questionnaire designed to assess how archival materials can be made more readily available to language communities.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"37 18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116789105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linguistic Repositories as Asset: Challenge for Sociolinguistic Approach in Brazil","authors":"Raquel Meister Ko. Freitag","doi":"10.12794/langarc1851177","DOIUrl":"https://doi.org/10.12794/langarc1851177","url":null,"abstract":"This paper provides remarks for a management plan for Brazilian linguistic documentation repositories in order to contribute to their conservation. The problems of depreciation, authorship, sharing, and financing are discussed, and possible solutions are pointed out.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124255200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Track to the Past: Tracking Workflows, Versions, and Citations of Legacy Language Data","authors":"Tobias Weber","doi":"10.12794/langarc1851185","DOIUrl":"https://doi.org/10.12794/langarc1851185","url":null,"abstract":"This paper discusses three issues encountered with legacy language data in archives: First, the provenance of an artefact containing the data may be unclear, as well as all procedures that shaped its form(at) or contents. Second, legacy language data are often orphan data with opaque links to other versions, or texts providing more information on them and their contents. Third, these data predate methods of data citation, thus requiring retroactive ways of citation tracking. With a few adjustments to their infrastructures, digital archives can be used as a platform to track workflows, versioning, and citations of legacy language data.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115307112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges in Heritage Language Documentations: BraPoRus, Spoken Corpus of Heritage Russian in Brazil","authors":"Anna Smirnova Henriques, Aleksandra S. Skorobogatova, Tatiana V. Kachkovskaia, P. Skrelin, S. Ruseishvili, S. Madureira, Irina A. Sekerina","doi":"10.12794/langarc1851178","DOIUrl":"https://doi.org/10.12794/langarc1851178","url":null,"abstract":"The Bolshevik revolution in 1917, followed by the Civil War, induced a big wave of emigration from the ex-Russian Empire. These emigrants created their “Russia Abroad”. Many Russians stayed in Europe or China, but, in the 1940s and 1950s, many of them went to the USA, Latin America and other destinations. The importance of preserving the memories and documents of the old waves of the Russian emigration is crucial. Our group is collecting a corpus of heritage Russian in Brazil, the BRAzilian POrtuguese RUSsian Corpus (BraPoRus). While the history of Russian immigration in Brazil is to some extent studied, their remarkably preserved Russian has not been described. Our current aim is to describe the BraPoRus, a corpus that consists of multiple speech samples of older Russian heritage speakers in Brazil, and to discuss the best ways to make these data available in the forms that satisfy the requirements both for the linguistic and sociological research.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114574790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Software Features and Linguistic Analyses Add Value to Orthographic Markup in Transcription of Multilingual Recordings for Digital Archives","authors":"E. Rodríguez, R. Vann","doi":"10.12794/langarc1851183","DOIUrl":"https://doi.org/10.12794/langarc1851183","url":null,"abstract":"This report discusses the importance of accounting for language contact and discourse circumstance in orthographic transcriptions of multilingual recordings of spoken language for deposit in digital language archives (DLAs). Our account provides a linguistically informed approach to the multilingual representation of spontaneous speech patterns, taking steps toward documenting ancestral and emergent codes. Our findings lead to portable lessons learned including (a) the conclusion that transcriptions can benefit from a bottom-up approach targeting particular linguistic features of sociocultural relevance to the community documented and (b) the implication (for researchers developing transcriptions for other DLAs) that the principled implementation of particular software features in tandem with systematic linguistic analysis can be helpful in finding and classifying such features, especially in multilingual recordings.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114911443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linguistic Analysis, Ethical Practice, and Quality Assurance in Anonymizing Recordings of Spoken Language for Deposit in Digital Archives","authors":"Diana Sofia Ovalle Lopez, R. Vann","doi":"10.12794/langarc1851180","DOIUrl":"https://doi.org/10.12794/langarc1851180","url":null,"abstract":"This report considers linguistic analyses as matters of ethical practice and quality assurance in the anonymization of recordings of spoken language for deposit in a digital language archive. Ethically, researchers must be committed to protecting the identities of primary data providers. Accordingly, conducting pragmatic analyses before initiating technical anonymization procedures can aid in determining exactly what discourse, in what contexts, might constitute identifying information in need of anonymization. Qualitatively, one of the main goals of language documentation is to preserve as much primary data as possible for future research. Accordingly, conducting phonotactic analyses with the help of computer software can aid in determining precise chronometer readings for each tonal insertion to excise as little primary data as possible during anonymizations. These findings warrant further research on anonymization protocols in digital language archive projects.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121318086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emerging Role of Libraries in Language Archiving in India: A Case Study of SiDHELA","authors":"Karthic Narayanan, Meriaba Takhellambam","doi":"10.12794/langarc1851181","DOIUrl":"https://doi.org/10.12794/langarc1851181","url":null,"abstract":"SiDHELA is a language archive developed by the Centre for Endangered Languages, Sikkim University in collaboration with the Central Library, Sikkim University. It is the first language archive developed in India. SiDHELA is a model attempt at digital archiving in collaboration with communities of the Sikkim and North Bengal region of India. The main highlight of the paper is the possibilities that emerge from a collaboration between under-resourced indigenous communities and an institutional library backed by a language documentation project to curate digital content for endangered and lesser-known languages from under-resourced regions like the Northeast of India.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125469736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Website Is a Website Is a Website: Why Trusted Repositories Are Needed More Than Ever","authors":"Vera Ferreira, Leonore Lukschy, Buachut Watyam, Siripen Ungsitipoonpor, Mandana Seyfeddinipur","doi":"10.12794/langarc1851176","DOIUrl":"https://doi.org/10.12794/langarc1851176","url":null,"abstract":"Over the last two decades there has been a surge in activists, linguists, anthropologists, and documenters digitally recording endangered language use. These unique records are often uploaded to corporate social media sites or to privately run websites. Despite popular belief, uploading these materials to a server does not mean they are archived and preserved for future generations. In this paper we discuss the differences between professional archiving systems and content management system (CMS) based approaches to making language materials accessible. Looking at the example of the Archive of Languages and Cultures of Ethnic Groups of Thailand, we discuss the benefits of a Mukurtu-based community website, and how linking it to a professional archive can ensure long-term preservation of precious and unique language materials.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127313491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges to Representing Personal Names and Language Names in Language Archives: Examples from Northeast India","authors":"Mary Burke, S. Chelliah","doi":"10.12794/langarc1851173","DOIUrl":"https://doi.org/10.12794/langarc1851173","url":null,"abstract":"Language archives are not only a valuable resource for language communities to tell their stories and to create lasting records of their ways of life, but also for those interested in anthropology, linguistics, agriculture, or art history. This recent emphasis on archiving primary datasets in linguistics has resulted in an abundance of datasets online; however, of the languages of South Asia, only a small percentage are represented in digital language archives or described thoroughly. Though several of these languages are being documented, this material is at risk of being lost or inaccessible without concerted attention paid to long-term preservation. There are several obstacles to documenting and archiving language materials from this area, including political instability and lack of access to infrastructure. This submission reviews one particular challenge to data management relevant to South Asia, which is the complexity of names (of individuals, groups, and languages). We provide examples from Northeast India and recommendations based on experience from CoRSAL (Computational Resource for South Asian Languages).","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116804373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Digital Library Infrastructure to Build a Language Archive","authors":"Mark Phillips, Mary Burke, H. Tarver, Oksana L. Zavalina","doi":"10.12794/langarc1851182","DOIUrl":"https://doi.org/10.12794/langarc1851182","url":null,"abstract":"Building a digital language archive requires a number of steps to ensure collecting, describing, preserving, and providing access to language data in effective and efficient ways. The Computational Resource for South Asian Languages (CoRSAL) group has partnered with the University of North Texas (UNT) Digital Library to build a series of interconnected digital collections that leverage existing UNT technical and metadata infrastructure to provide access to data from and for various language communities. This article introduces the reader to the background of this project and discusses some of the areas important for representing language materials where UNT metadata has needed flexibility to better fit the needs of intended audiences. These areas include a workflow for standardized language representation (the Language field), defining roles for persons related to the item (Creator and Contributor fields), and representing interconnections between related items (the Relation field). Although further work is needed to improve language data representation in the CoRSAL digital language archive, we believe the model adopted by our team and lessons learned could benefit others in the language archiving community.","PeriodicalId":315889,"journal":{"name":"Proceedings of the International Workshop on Digital Language Archives: LangArc 2021","volume":"316 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123060100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}