Atomic Data and Nuclear Data Tables: Latest Articles, Page 5

Using Landsat-5 for Accurate Historical LULC Classification: A Comparison of Machine Learning Models

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-30 DOI: 10.3390/data8090138

D. Krivoguz, S. Chernyi, Elena Zinchenko, Artem Silkin, A. Zinchenko

Abstract: This study investigates the application of various machine learning models for land use and land cover (LULC) classification in the Kerch Peninsula. The study utilizes archival field data, cadastral data, and published scientific literature for model training and testing, using Landsat-5 imagery from 1990 as input data. Four machine learning models (deep neural network, Random Forest, support vector machine (SVM), and AdaBoost) are employed, and their hyperparameters are tuned using random search and grid search. Model performance is evaluated through cross-validation and confusion matrices. The deep neural network achieves the highest accuracy (96.2%) and performs well in classifying water, urban lands, open soils, and high vegetation. However, it faces challenges in classifying grasslands, bare lands, and agricultural areas. The Random Forest model achieves an accuracy of 90.5% but struggles with differentiating high vegetation from agricultural lands. The SVM model achieves an accuracy of 86.1%, while the AdaBoost model performs worst, with an accuracy of 58.4%. The novel contributions of this study include the comparison and evaluation of multiple machine learning models for land use classification in the Kerch Peninsula. The deep neural network and Random Forest models outperform SVM and AdaBoost in terms of accuracy. However, the use of limited data sources such as cadastral data and scientific articles may introduce limitations and potential errors. Future research should consider incorporating field studies and additional data sources for improved accuracy. This study provides valuable insights for land use classification, facilitating the assessment and management of natural resources in the Kerch Peninsula. The findings contribute to informed decision-making processes and lay the groundwork for further research in the field.
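The abstract above reports overall accuracy and mentions confusion matrices without showing how the two relate. A minimal sketch of that calculation follows; the class names and counts are invented for illustration and are not results from the study.

```python
# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted).
# All counts are invented for illustration; they are not taken from the paper.
CLASSES = ["water", "urban", "grassland"]
CONFUSION = [
    [50, 0, 0],
    [2, 45, 3],
    [0, 10, 40],
]


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each true class, the fraction of its samples predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


if __name__ == "__main__":
    print("overall accuracy:", overall_accuracy(CONFUSION))
    for name, recall in zip(CLASSES, per_class_recall(CONFUSION)):
        print(name, "recall:", recall)
```

Per-class recall is one way to see the pattern the abstract describes, where a model with high overall accuracy can still confuse specific classes.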

Citations: 1

Knowledge Graph Dataset for Semantic Enrichment of Picture Description in NAPS Database

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-24 DOI: 10.3390/data8090136

M. Horvat, G. Gledec, Tomislav Jagušt, Z. Kalafatić

Abstract: This data description introduces a comprehensive knowledge graph (KG) dataset with detailed information about the relevant high-level semantics of visual stimuli used to induce emotional states stored in the Nencki Affective Picture System (NAPS) repository. The dataset contains 6808 systematically manually assigned annotations for 1356 NAPS pictures in 5 categories, linked to WordNet synsets and Suggested Upper Merged Ontology (SUMO) concepts presented in a tabular format. Both knowledge databases provide an extensive and supervised taxonomy glossary suitable for describing picture semantics. The annotation glossary consists of 935 WordNet and 513 SUMO entities. A description of the dataset and the specific processes used to collect, process, review, and publish the dataset as open data are also provided. This dataset is unique in that it captures complex objects, scenes, actions, and the overall context of emotional stimuli with knowledge taxonomies at a high level of quality. It provides a valuable resource for a variety of projects investigating emotion, attention, and related phenomena. In addition, researchers can use this dataset to explore the relationship between emotions and high-level semantics or to develop data-retrieval tools to generate personalized stimuli sequences. The dataset is freely available in common formats (Excel and CSV).

Citations: 0

Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-23 DOI: 10.3390/data8090135

Winston Wang, Tun-Wen Pai

Abstract: This study addressed the challenge of training generative adversarial networks (GANs) on small tabular clinical trial datasets for data augmentation, which are known to pose difficulties in training due to limited sample sizes. To overcome this obstacle, a hybrid approach is proposed, combining the synthetic minority oversampling technique (SMOTE) to initially augment the original data to a more substantial size for improving the subsequent GAN training with a Wasserstein conditional generative adversarial network with gradient penalty (WCGAN-GP), proven for its state-of-the-art performance and enhanced stability. The ultimate objective of this research was to demonstrate that the quality of synthetic tabular data generated by the final WCGAN-GP model maintains the structural integrity and statistical representation of the original small dataset using this hybrid approach. This focus is particularly relevant for clinical trials, where limited data availability due to privacy concerns and restricted accessibility to subject enrollment pose common challenges. Despite the limitation of data, the findings demonstrate that the hybrid approach successfully generates synthetic data that closely preserves the characteristics of the original small dataset. By harnessing the power of this hybrid approach to generate faithful synthetic data, the potential for enhancing data-driven research in drug clinical trials becomes evident. This includes enabling a robust analysis on small datasets, supplementing the lack of clinical trial data, facilitating its utility in machine learning tasks, even extending to using the model for anomaly detection to ensure better quality control during clinical trial data collection, all while prioritizing data privacy and implementing strict data protection measures.
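The first stage of the hybrid approach, SMOTE, creates synthetic rows by interpolating between a real sample and one of its nearest neighbours. The sketch below illustrates only that interpolation idea in plain Python. It is not the authors' pipeline: the function name `smote_sketch`, the parameter `k`, and the toy data are assumptions, class labels are ignored, and the WCGAN-GP stage is not shown.

```python
import math
import random


def smote_sketch(samples, n_new, k=2, seed=0):
    """Generate n_new synthetic rows by SMOTE-style interpolation.

    Hypothetical helper: `samples` is a list of equal-length numeric tuples
    (at least two rows); each new row lies on the segment between a randomly
    chosen row and one of its k nearest neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of `base` by Euclidean distance, excluding itself.
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    # Toy data invented for illustration: four rows, two numeric features.
    rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for row in smote_sketch(rows, n_new=3):
        print(row)
```

Because every synthetic row is a convex combination of two real rows, it stays inside the range spanned by the original data, which is why SMOTE is a reasonable way to enlarge a small dataset before training a generative model.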

Citations: 0

Quantifying Webpage Performance: A Comparative Analysis of TCP/IP and QUIC Communication Protocols for Improved Efficiency

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-19 DOI: 10.3390/data8080134

T. C. C. Nepomuceno, K. Nepomuceno, Fabiano Carlos da Silva, Silas Garrido Teixeira de Carvalho Santos

Abstract: Browsing is a prevalent activity on the World Wide Web, and users usually demonstrate significant expectations for expeditious information retrieval and seamless transactions. This article presents a comprehensive performance evaluation of the most frequently accessed webpages in recent years using Data Envelopment Analysis (DEA) adapted to the context (inverse DEA), comparing their performance under two distinct communication protocols: TCP/IP and QUIC. To assess performance disparities, parametric and non-parametric hypothesis tests are employed to investigate the appropriateness of each website’s communication protocols. We provide data on the inputs, outputs, and efficiency scores for 82 out of the world’s top 100 most-accessed websites, describing how experiments and analyses were conducted. The evaluation yields quantitative metrics pertaining to the technical efficiency of the websites and efficient benchmarks for best practices. Nine websites are considered efficient from the point of view of at least one of the communication protocols. Under TCP/IP, about 80.5% of all units (66 webpages) need to reduce their page load time by more than 50% to be competitive, while under the QUIC communication protocol this figure is 28.05% (23 webpages). In addition, results suggest that the TCP/IP protocol has an unfavorable effect on the overall distribution of inefficiencies.

Citations: 0

VR Traffic Dataset on Broad Range of End-User Activities

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-17 DOI: 10.3390/data8080132

Marina Polupanova

Citations: 0

Leveraging Return Prediction Approaches for Improved Value-at-Risk Estimation

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-17 DOI: 10.3390/data8080133

F. Bagheri, Diego Reforgiato Recupero, Espen Sirnes

Abstract: Value at risk (VaR) is a statistic used to anticipate the largest possible losses over a specific time frame and within some level of confidence, usually 95% or 99%. For risk management and regulators, it offers a solution for trustworthy quantitative risk management tools. VaR has become the most widely used and accepted indicator of downside risk. Today, commercial banks and financial institutions utilize it as a tool to estimate the size and probability of upcoming losses in portfolios and, as a result, to estimate and manage the degree of risk exposure. The goal is to obtain the average number of VaR “failures” or “breaches” (losses that are more than the VaR) as near to the target rate as possible. It is also desired that the losses be as evenly distributed as possible. VaR can be modeled in a variety of ways. The simplest method is to estimate volatility based on prior returns according to the assumption that volatility is constant. Otherwise, the volatility process can be modeled using the GARCH model. Machine learning techniques have been used in recent years to carry out stock market forecasts based on historical time series. A machine learning system is often trained on an in-sample dataset, where it can adjust and improve specific hyperparameters in accordance with the underlying metric. The trained model is tested on an out-of-sample dataset. We compared the baselines for the VaR estimation of a day (d) according to different metrics (i) to their respective variants that included stock return forecast information of d and stock return data of the days before d and (ii) to a GARCH model that included return prediction information of d and stock return data of the days before d. Various strategies such as ARIMA and a proposed ensemble of regressors have been employed to predict stock returns. We observed that the versions of the univariate techniques and GARCH integrated with return predictions outperformed the baselines in four different marketplaces.
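The simplest method named in the abstract, constant volatility estimated from prior returns, can be sketched as a parametric VaR together with a breach-rate check. This is a minimal illustration that assumes normally distributed returns; the function names and the return series are hypothetical, and it does not reproduce the paper's baselines, GARCH model, or return-prediction variants.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical daily returns, invented for illustration only.
HYPOTHETICAL_RETURNS = [
    0.010, -0.020, 0.005, -0.015, 0.020,
    -0.030, 0.012, -0.004, 0.007, -0.010,
]


def parametric_var(returns, confidence=0.95):
    """One-day VaR as a positive loss, assuming normal returns and constant volatility."""
    mu = mean(returns)
    sigma = stdev(returns)  # sample standard deviation of prior returns
    z = NormalDist().inv_cdf(1.0 - confidence)  # negative lower-tail quantile
    return -(mu + z * sigma)


def breach_rate(returns, var):
    """Share of days whose loss exceeded the VaR level (a VaR 'breach')."""
    breaches = sum(1 for r in returns if r < -var)
    return breaches / len(returns)


if __name__ == "__main__":
    var_95 = parametric_var(HYPOTHETICAL_RETURNS, confidence=0.95)
    print("95% VaR:", var_95)
    print("breach rate:", breach_rate(HYPOTHETICAL_RETURNS, var_95))
```

For a 95% VaR the target breach rate is 5%, so a well-calibrated estimator evaluated out of sample should produce a breach rate close to 0.05.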

Citations: 0

Draft Genome Sequence Data of Streptomyces anulatus, Strain K-31

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-10 DOI: 10.3390/data8080131

A. Bogoyavlenskiy, M. Alexyuk, A. Sadanov, V. Berezin, L. Trenozhnikova, G. Baymakhanova

Citations: 0

Towards Action-State Process Model Discovery

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-09 DOI: 10.3390/data8080130

A. Bottrighi, Marco Guazzone, G. Leonardi, S. Montani, Manuel Striani, P. Terenziani

Citations: 0

Anomaly Detection in Student Activity in Solving Unique Programming Exercises: Motivated Students against Suspicious Ones

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-08 DOI: 10.3390/data8080129

Liliya A. Demidova, Peter N. Sovietov, E. Andrianova, Anna A. Demidova

Abstract: This article presents a dataset containing messages from the Digital Teaching Assistant (DTA) system, which records the results from the automatic verification of students’ solutions to unique programming exercises of 11 different types. These results are automatically generated by the system, which automates a massive Python programming course at MIREA – Russian Technological University (RTU MIREA). The DTA system is trained to distinguish between approaches to solving programming exercises and to identify correct and incorrect solutions. Its source-code analysis algorithms build vector representations of programs based on Markov chains, calculate pairwise Jensen–Shannon distances between programs, and apply a hierarchical clustering algorithm to detect the high-level approaches used by students in solving unique programming exercises. In the process of learning, each student must correctly solve 11 unique exercises in order to receive admission to the intermediate certification in the form of a test. In addition, a motivated student may try to find additional approaches to solve exercises they have already solved. At the same time, not all students are able or willing to solve the 11 unique exercises proposed to them; some will resort to outside help in solving all or part of the exercises. Since all information about the interactions of the students with the DTA system is recorded, it is possible to identify different types of students. First of all, the students can be classified into two classes: those who failed to solve 11 exercises and those who received admission to the intermediate certification in the form of a test, having solved the 11 unique exercises correctly. However, it is possible to identify classes of typical, motivated and suspicious students among the latter group based on the proposed dataset. The proposed dataset can be used to develop regression models that will predict outbursts of student activity when interacting with the DTA system, to solve clustering problems, to identify groups of students with a similar behavior model in the learning process and to develop intelligent data classifiers that predict the students’ behavior model and draw appropriate conclusions, not only at the end of the learning process but also during the course of it in order to motivate all students, even those who are classified as suspicious, to visualize the results of the learning process using various tools.
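The abstract mentions pairwise Jensen–Shannon distances between vector representations of programs. As a hedged illustration of the distance itself, the sketch below computes it for two discrete probability vectors using base-2 logarithms, so the result lies in [0, 1]. The example vectors are hypothetical stand-ins for normalized Markov-chain transition frequencies; how the DTA system actually builds its vectors is not specified here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence of two probability vectors."""
    # The mixture m is positive wherever p or q is, so the KL terms never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    # Hypothetical normalized transition-frequency vectors for two programs.
    program_a = [0.5, 0.3, 0.2, 0.0]
    program_b = [0.1, 0.4, 0.3, 0.2]
    print(jensen_shannon_distance(program_a, program_b))
```

A matrix of such pairwise distances is a natural input for hierarchical clustering, since the Jensen–Shannon distance is symmetric and bounded.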

Citations: 1

VEPL Dataset: A Vegetation Encroachment in Power Line Corridors Dataset for Semantic Segmentation of Drone Aerial Orthomosaics

IF 1.8, Zone 3 (Physics and Astrophysics)

Atomic Data and Nuclear Data Tables Pub Date : 2023-08-04 DOI: 10.3390/data8080128

Mateo Cano-Solis, J. Ballesteros, John W. Branch-Bedoya

Abstract: Vegetation encroachment in power line corridors poses multiple problems for modern energy-dependent societies. Failures due to the contact between power lines and vegetation can result in power outages and millions of dollars in losses. To address this problem, UAVs have emerged as a promising solution due to their ability to quickly and affordably monitor long corridors through autonomous flights or being remotely piloted. However, the extensive manual task of analyzing every image acquired by the UAVs when searching for vegetation encroachment has led many authors to propose the use of Deep Learning to automate the detection process. Despite the advantages of using a combination of UAV imagery and Deep Learning, there is currently a lack of datasets that help to train Deep Learning models for this specific problem. This paper presents a dataset for the semantic segmentation of vegetation encroachment in power line corridors. RGB orthomosaics were obtained for a rural road area using a commercial UAV. The dataset is composed of pairs of tessellated RGB images, coming from the orthomosaic, and corresponding multi-color masks representing three different classes: vegetation, power lines, and the background. A detailed description of the image acquisition process is provided, as well as the labeling task and the data augmentation techniques, among other relevant details to produce the dataset. Researchers would benefit from using the proposed dataset by developing and improving strategies for vegetation encroachment monitoring using UAVs and Deep Learning.

Citations: 3