Mariana O. Silva, Clarisse Scofield, Luiza de Melo-Gomes, Mirella M. Moro
J. Inf. Data Manag. · Journal Article · Published 2022-08-15 · https://doi.org/10.5753/jidm.2022.2349
Cross-collection Dataset of Public Domain Portuguese-language Works
Many datasets are published in English to gain more engagement, popularity and reach within a research community. Indeed, most sciences are language-agnostic and thrive on publicly available data. However, this does not always hold for the Arts, where Literature and Music are two examples of fields that rely heavily on the language of the work. Especially in Literature, combining human expertise with book consumers’ data may generate what is needed to keep up with the constant changes experienced in the book publishing market. Therefore, we introduce PPORTAL, the first public domain Portuguese-language literature dataset, composed of a wide variety of book-related metadata. After introducing its building process and content, we present an exploratory data analysis with a quantitative description of its main features. We also show its usability as a resource in different research domains through examples of real-world applications, and point out other potential applications.