BRSET: A Brazilian Multilabel Ophthalmological Dataset of Retina Fundus Photos

Luis Filipe Nakayama, David Restrepo, Joao Matos, Lucas Zago Ribeiro, Fernando Korn Malerbi, Leo Anthony Celi, Caio Saito Regatieri
medRxiv - Ophthalmology, posted 2024-01-23. DOI: 10.1101/2024.01.23.24301660

Abstract

Introduction: The Brazilian Multilabel Ophthalmological Dataset (BRSET) addresses the scarcity of publicly available ophthalmological datasets in Latin America. BRSET comprises 16,266 color fundus retinal photos from 8,524 Brazilian patients. It aims to improve data representativeness and to serve as a research and teaching tool. It includes sociodemographic information, enabling investigation of differential model performance across demographic groups.

Methods: Demographic and medical information was collected from the electronic records of three outpatient centers in São Paulo, including nationality, age, sex, clinical history, insulin use, and duration of diabetes. A retinal specialist labeled the images for anatomical features (optic disc, blood vessels, macula), quality control (focus, illumination, image field, artifacts), and pathologies (e.g., diabetic retinopathy). Diabetic retinopathy was graded using both the International Clinical Diabetic Retinopathy (ICDR) scale and the Scottish Diabetic Retinopathy Grading scheme. For validation, features were extracted with DINOv2 Base, and the data were divided into 70% training and 30% testing subsets. Support Vector Machines (SVM) and Logistic Regression (LR) were trained with weighted training. Performance metrics included the area under the receiver operating characteristic curve (AUC) and the macro F1-score.

Results: Of the images, 65.1% were captured with a Canon CR2 and 34.9% with a Nikon NF5050. Of the patients, 61.8% are female, and the average age is 57.6 years. Diabetic retinopathy affected 15.8% of patients, across a range of severity levels. Anatomically, 20.2% showed abnormal optic discs, 4.9% abnormal blood vessels, and 28.8% an abnormal macula. Models were trained on BRSET for three prediction tasks: "diabetes diagnosis", "sex classification", and "diabetic retinopathy diagnosis".

Discussion: BRSET is the first multilabel ophthalmological dataset from Brazil and Latin America. It makes it possible to investigate model bias by evaluating performance across demographic groups. Model performance on the three prediction tasks demonstrates the dataset's value for external validation and for teaching medical computer vision to learners in Latin America using locally relevant data sources.
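The Methods describe a validation pipeline of DINOv2 Base features, SVM and LR classifiers with weighted training, a 70/30 split, and AUC plus macro F1-score. The sketch below illustrates only the evaluation plumbing in plain Python, under stated assumptions. It is not the authors' code. The feature extractor and classifiers are not reproduced, and all function names are hypothetical. The split is assumed to be stratified per image, although the abstract does not say whether splitting was done per image or per patient, and patients contribute multiple photos. The "weighted training" is assumed to use the common balanced heuristic n_samples / (n_classes * class_count), which is the formula scikit-learn uses for `class_weight="balanced"`.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_frac=0.3, seed=0):
    """Hypothetical helper: return (train_idx, test_idx) with per-class proportions kept.

    Assumption: a stratified per-image split; the abstract only states 70% / 30%.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class test count
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def balanced_class_weights(labels):
    """Assumed weighting: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as 1/2. Pairwise O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))            # (2/3 + 4/5) / 2
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Macro F1 averages per-class scores without weighting by class size. For an imbalanced label such as diabetic retinopathy (15.8% of patients), a minority class that the model ignores pulls the score down instead of being hidden by the majority class.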