Data in Brief: Latest Articles

Data of REEs (Ce, Nd, Th) analysis from Bangka tin tailing applying froth flotation method using sodium oleate and KClO3.
IF 1
Data in Brief Pub Date : 2024-11-23 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111157
Wiwik Dahan, Djoko Hartanto, Ratna Ediati, Rita Sundari, Subandrio, Irfan Marwanza
Abstract: This article presented the data of REEs (Rare Earth Elements) analysis from exploitation of Bangka tin tailing, Indonesia. Nowadays, REEs have broad applications in modern industry such as computer memory, DVDs, rechargeable batteries, cell phones, catalytic converters, fluorescent lighting, negative ion generators, and much more. A 30 min. and 400 rpm froth flotation method has utilized 0.06 M sodium oleate flotation agent, 0.07 M KClO3 depressant, and 2.0 M HCl for pH arrangement of 10.0 g sample to analyse REEs from 170 mesh Bangka tin tailing at 25 °C. The analysis is found to be cerium (1.60 %, pH 7.0), neodymium (0.70 %, pH 7.0), and thorium (0.95 %, pH 8.0) in the collector, while at the same time, the concentrations of cerium (9.45 %, pH 7.0), neodymium (3.15 %, pH 7.0), and thorium (3.90 %, pH 8.0) in the tailing (depressant) applying froth flotation method. Based on the variation of KClO3 concentrations at the given condition (0.06 M sodium oleate, pH 7.0, 30 min., 400 rpm, 25 °C), the recovery of REEs in the collector using froth flotation method is as follows: the highest concentrations of cerium (11.80 %) and neodymium (3.93 %) were obtained at 0.005 M KClO3 and pH 7.0, while the highest concentrations of thorium (4.50 %) obtained at KClO3 free (none KClO3) at the same pH. The utilization of sodium oleate flotation agent and KClO3 depressant in the flotation REE recovery from tin tailing can be viewed as a new contribution of this study since unpublished previous work applied palmitate collector and no any depressant for REE recovery from the same mining area. The contribution of REE analysis from Bangka tin tailing applying froth flotation method is valuable for mining industry.
Volume 57, article 111157. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11663978/pdf/
Citations: 0
A lung cancer diagnosis and treatment dataset with geno- and phenotypical characteristics of the patient.
IF 1
Data in Brief Pub Date : 2024-11-23 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111167
Belén Ríos-Sánchez, Guillermo Vigueras, Aaron García, Daniel Gómez-Bravo, Ernestina Menasalvas, María Torrente, Consuelo Parejo, Fotis Aisopos, Dimitrios Vogiatzis, Disha Purohit, Mariano Provencio, María-Esther Vidal, Alejandro Rodríguez-González
Abstract: This dataset comprises information about 1242 lung cancer patients collected by the Medical Oncology Department of the Puerta de Hierro University Hospital of Majadahonda in Madrid, Spain. It includes information about cancer diagnosis and treatment, as well as personal and medical data recorded during anamneses. The dataset could assist in data analysis with the aim of discovering relationships between the applied treatment(s), the evolution of the disease and the associated adverse effects. A greater understanding of treatment effects based on the particular conditions of the patient and the diagnosis could directly impact the healthcare system, helping to improve expectations about lung cancer as well as reducing treatment toxicities and adverse effects.
Volume 57, article 111167. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683279/pdf/
Citations: 0
Dataset from Scintrex CG-5 gravity meters acquired at the Zhetygen calibration line in Kazakhstan.
IF 1
Data in Brief Pub Date : 2024-11-23 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111151
Roman Sermiagin, Yeraly Kalen, Nurgan Kemerbayev, Khaini-Kamal Kassymkanova, Nikolay Kosarev, Guzyaliya Mussina, Assel Batalova
Abstract: The article provides a dataset derived from Scintrex CG-5 gravity meter observation files collected during five years of annual measurements along the Zhetygen calibration line utilizing three meters. Geoken, a Kazakhstani enterprise, routinely conducts these measurements to calibrate its meters necessary for manufacturing operations. Researchers can use this constantly updated dataset to study the behavior of the CG-5 gravity meters' calibration function in time and the measurement range. The measurement range is 100,900 µGal.
Volume 57, article 111151. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11664004/pdf/
Citations: 0
Blood smear imagery dataset for malaria parasite detection: A case of Tanzania.
IF 1
Data in Brief Pub Date : 2024-11-23 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111169
Beston Lufyagila, Bonny Mgawe, Anael Sam
Abstract: Malaria is a major public health issue in many regions of Africa, including Tanzania. The Tanzania Malaria National Strategic Plan (2021-2025) emphasizes on high-quality testing services availability, high coverage of timely diagnosis of malaria, and availability of innovative diagnostic systems for effective detection, treatment and control of malaria. This would be achieved by employing state of the art technologies like Machine learning. However, Machine learning requires diverse dataset to work effectively and efficiently. Therefore, this paper presents blood smear imagery dataset that can be used by researchers to develop computer vision systems for malaria parasite detection. The imagery dataset were acquired by setting up a 40X-2500X Real 4K compound microscope with a 4K SONY IMX334 sensor camera mounted to it in five health centres of Tanga region. Blood samples taken according to normal routine of diagnosing patients in health care, were stained using Giemsa reagent and examined under microscope. Following these procedures, the study collected and annotated Thick infected blood smear images (n = 1139); Thick uninfected blood smear images (n = 1071); Thin uninfected blood smear images (n = 270); and Thin infected blood smear images (n = 1064). Furthermore, the curated dataset have been uploaded in a public Harvard data verse repository. In summary, the dataset aims to support the creation of diagnostic tools that improve malaria detection, thereby advancing health outcomes and aiding malaria control initiatives in Tanzania and other regions impacted by the disease.
Volume 57, article 111169. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648091/pdf/
Citations: 0
Orthopantomogram teeth segmentation and numbering dataset.
IF 1
Data in Brief Pub Date : 2024-11-23 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111152
Niha Adnan, Fahad Umer
Abstract: With the digitization of radiographs, vast amounts of data have become accessible, enabling the curation and development of extensive datasets. Among radiographic modalities, Orthopantomograms (OPGs) are widely utilized in clinical practice. The integration of automated diagnostic processes into routine clinical practice holds great potential as an adjunct for dentists. Various OPG datasets exist, however their limitations affect the robustness of Artificial Intelligence (AI) models trained on them. This paper introduces an OPG dataset specifically designed for training AI algorithms in teeth segmentation and numbering tasks. A key feature of this dataset is its dual annotation, which allows for individual tooth segmentation by class, as well as numbering according to the Fédération Dentaire Internationale system. This dual-annotated dataset enhances the existing pool of OPG datasets and can be leveraged for further training of pre-trained algorithms or the development of new ones. Moreover, it offers researchers to carry out annotations tailored to their respective research objectives, thereby facilitating the development of AI models capable of addressing diverse diagnostic tasks.
Volume 57, article 111152. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648156/pdf/
Citations: 0
A comprehensive dataset of near infrared spectroscopy measurements to predict nitrogen and carbon contents in a wide range of tissues from Brassica napus plants grown under contrasted environments.
IF 1
Data in Brief Pub Date : 2024-11-23 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111163
Sophie Rolland, Françoise Leprince, Solenn Guichard, Françoise Le Cahérec, Anne Laperche, Nathalie Nesi
Abstract: Winter oilseed rape (WOSR, Brassica napus L.) is the third largest oil crop worldwide that also provides a source of high quality plant-based proteins. Nitrogen (N) and carbon (C) play a key role in plant growth. Determination of N and C contents of plant tissues throughout the growth cycle is crucial in assessing plant nutritional status and allowing precise input management. In the dataset presented in this article, 2427 WOSR samples arising from a large diversity of tissues collected on WOSR diversity were analyzed by near infrared spectroscopy from 4000 to 12,000 cm⁻¹. At the same time, reference chemical data for the N and C contents of the same samples were determined by elemental analysis using the Dumas method. Partial least squares regression has been used to develop predictive models linking spectral and chemical data, so that new samples can be characterized without the need for reference methods. This dataset could be used to test new calculation algorithms in order to enhance prediction performance or for training purposes. These models can be used as a rapid method for determining N and/or C content, adding to decision-support tools for fertilizer application throughout the plant developmental cycle.
Volume 57, article 111163. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11663971/pdf/
Citations: 0
A dataset of Roman Urdu text with spelling variations for sentence level sentiment analysis.
IF 1
Data in Brief Pub Date : 2024-11-23 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111170
Mudasar Ahmed Soomro, Rafia Naz Memon, Asghar Ali Chandio, Mehwish Leghari, Muhammad Hanif Soomro
Abstract: Roman Urdu text is very widespread on many websites. People mostly prefer to give their social comments or product reviews in Roman Urdu, and Roman Urdu is counted as non-standard language. The main reason for this is that there is no rule for word spellings within Roman Urdu words, so people create and post their own word spellings, like "2mro" is a nonstandard spelling for tomorrow. This paper aims to collect two Roman Urdu datasets: one is roman Urdu words with various spelling variations. This dataset contains 5244 Roman Urdu words, within which we have included variations in word spellings ranging from (one) to (five) different spellings for each word. The second dataset consists of Roman Urdu reviews, which were collected from (seven) different internet-based sources. This dataset contains multiclass reviews, namely "very positive," "positive," "very negative," "negative," and "neutral", respectively. We gathered a total of 28,090 reviews. The sentiments of the reviews were made by the domain experts who were familiar with the Urdu language.
Volume 57, article 111170. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683287/pdf/
Citations: 0
Dataset of Sentinel-1 SAR and Sentinel-2 RGB-NDVI imagery.
IF 1
Data in Brief Pub Date : 2024-11-20 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111160
Ahmed Alejandro Cardona-Mesa, Rubén Darío Vásquez-Salazar, Luis Gómez, Carlos M Travieso-González, Andrés F Garavito-González, Esteban Vásquez-Cano, Jean Pierre Díaz-Paz
Abstract: This article presents a comprehensive dataset combining Synthetic Aperture Radar (SAR) imagery from the Sentinel-1 mission with optical imagery, including RGB and Normalized Difference Vegetation Index (NDVI), from the Sentinel-2 mission. The dataset consists of 8800 images, organized into four folders (SAR_VV, SAR_VH, RGB, and NDVI), each containing 2200 images with dimensions of 512 × 512 pixels. These images were collected from various global locations using random geographic coordinates and strict criteria for cloud cover, snow presence, and water percentage, ensuring high-quality and diverse data. The primary motivation for creating this dataset is to address the limitations of optical sensors, which are often hindered by cloud cover and atmospheric conditions. By integrating SAR data, which is unaffected by these factors, the dataset offers a robust tool for a wide range of applications, including land cover classification, vegetation monitoring, and environmental change detection. The dataset is particularly valuable for training machine learning models that require multimodal inputs, such as translating SAR images to optical imagery or enhancing the quality of noisy data. Additionally, the structure of the dataset and the preprocessing steps applied make it readily usable for various research purposes. The SAR images are processed to Level-1 Ground Range Detected (GRD) format, including radiometric calibration and terrain correction, while the optical images are filtered to ensure minimal cloud interference.
Volume 57, article 111160. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648187/pdf/
Citations: 0
Dataset for the identification of a ultra-low frequency multidirectional energy harvester for wind turbines.
IF 1
Data in Brief Pub Date : 2024-11-20 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111126
Julen Bacaicoa, Mikel Hualde-Otamendi, Mikel Merino-Olagüe, Aitor Plaza, Xabier Iriarte, Carlos Castellano-Aldave, Alfonso Carlosena
Abstract: This paper presents a publicly available dataset designed to support the identification (characterization) and performance optimization of an ultra-low-frequency multidirectional vibration energy harvester. The dataset includes detailed measurements from experiments performed to fully characterize its dynamic behaviour. The experimental data encompasses both input (acceleration)-output (energy) relationships, as well as internal system dynamics, measured using a synchronized image processing and signal acquisition system. In addition to the raw input-output data, the dataset also provides post-processed information, such as the angular positions of the moving masses, their velocities and accelerations, derived from recorded high-speed videos at 240 Hz. The dataset also includes the measured power output generated in the coils. This dataset is intended to enable further research on vibration energy harvesters by providing experimental data for identification, model validation, and performance optimization, particularly in the context of energy harvesting in low-frequency and multidirectional environments, such as those encountered in wind turbines.
Volume 57, article 111126. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11647144/pdf/
Citations: 0
A dataset to train intrusion detection systems based on machine learning models for electrical substations.
IF 1
Data in Brief Pub Date : 2024-11-20 eCollection Date: 2024-12-01 DOI: 10.1016/j.dib.2024.111153
Esteban Damián Gutiérrez Mlot, Jose Saldana, Ricardo J Rodríguez, Igor Kotsiuba, Carlos Gañán
Abstract: The growing integration of Information and Communication Technology into Operational Technology environments in electrical substations exposes them to new cybersecurity threats. This paper presents a comprehensive dataset of substation traffic, aimed at improving the training and benchmarking of Intrusion Detection Systems (IDS) installed in these facilities that are based on machine learning techniques. The dataset includes raw network captures and flows from real substations, filtered and anonymized to ensure privacy. It covers the main protocols and standards used in substation environments: IEC61850, IEC104, NTP, and PTP. Additionally, the dataset includes traces obtained during several cyberattacks, which were simulated in a controlled laboratory environment, providing a rich resource for developing and testing machine learning models for cybersecurity applications in substations. A set of complementary tools for dataset creation and preprocessing are also included to standardize the methodology, ensuring consistency and reproducibility. In summary, the dataset addresses the critical need for high-quality, targeted data for tuning IDS at electrical substations and contributes to the advancement of secure and reliable power distribution networks.
Volume 57, article 111153. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11647109/pdf/
Citations: 0