Marcos Gabriel Mendes Lauande, Geraldo Braz Júnior, João Dallyson Sousa de Almeida, Vandecia Rejane Monteiro Fernandes, Anselmo Cardoso de Paiva, Rui Miguel Gil da Costa, Amanda Mara Teles, Leandro Lima da Silva, Haissa Oliveira Brito, Flávia Castello Branco Vidal
Data in Brief, Volume 61, Article 111823. Published 2025-06-21. DOI: 10.1016/j.dib.2025.111823. URL: https://www.sciencedirect.com/science/article/pii/S2352340925005505
PCPAm - A dataset of histopathological images of penile cancer for classification tasks
Penile cancer incidence is strongly linked to sociocultural factors; the disease is more common in underdeveloped countries such as Brazil, where it represents approximately 2% of cancers affecting men. This dataset was created to address the scarcity of publicly available resources for classifying histopathological images in penile cancer research. The images were collected in 2021 from tissue samples obtained through biopsies of patients undergoing treatment for penile cancer. After staining with Hematoxylin and Eosin (H&E), the tissue samples were photographed using a Leica ICC50 HD camera attached to a bright-field microscope (Leica DM500). The dataset comprises 194 high-resolution images (2048 × 1536 pixels), categorized by magnification (40X and 100X) and pathological classification (Tumor or Non-Tumor). Metadata includes histological grade and, for some images, human papillomavirus (HPV) status. Although previous works have focused primarily on binary classification, these additional labels provide opportunities for multi-label classification and other predictive modelling tasks, such as HPV detection and histological grading. The dataset can also be used for model benchmarking and comparative studies in cancer research, contributing to the development of new diagnostic tools. The dataset and metadata are available for further research and model development.
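Because the images are categorized by both magnification and pathological class, a benchmarking split would plausibly keep each (magnification, class) combination represented in both partitions. The following is a minimal sketch of such a stratified split, not the authors' method. The record keys `file`, `magnification` and `label`, the function name `stratified_split`, and the 20% test fraction are all assumptions made for illustration; the actual metadata layout of the dataset may differ.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split image records into train/test lists, stratified by
    (magnification, label).

    `records` is a list of dicts with the hypothetical keys "file",
    "magnification" and "label"; these names are assumptions, not the
    dataset's documented schema.
    """
    # Group records by their (magnification, label) stratum.
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["magnification"], rec["label"])].append(rec)

    rng = random.Random(seed)  # local generator, so the split is reproducible
    train, test = [], []
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)
        n = len(members)
        # Strata with fewer than 2 images go entirely to train; otherwise
        # hold out at least one image and leave at least one for training.
        n_test = 0 if n < 2 else min(n - 1, max(1, round(n * test_fraction)))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

Calling `stratified_split(records)` on a list built from the metadata would return two disjoint lists. With only 194 images, a fixed seed and a per-stratum split help keep comparisons between models repeatable.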
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.