Kacoutchy Jean Ayikpa, Diarra Mamadou, P. Gouton, Kablan Jérôme Adou
{"title":"Classification of Cocoa Pod Maturity Using Similarity Tools on an Image Database: Comparison of Feature Extractors and Color Spaces","authors":"Kacoutchy Jean Ayikpa, Diarra Mamadou, P. Gouton, Kablan Jérôme Adou","doi":"10.3390/data8060099","DOIUrl":"https://doi.org/10.3390/data8060099","url":null,"abstract":"Côte d’Ivoire, the world’s largest cocoa producer, faces the challenge of quality production. Immature or overripe pods cannot produce quality cocoa beans, resulting in losses and an unprofitable harvest. To help farmer cooperatives determine the maturity of cocoa pods in time, our study evaluates the use of automation tools based on similarity measures. Although standard techniques, such as visual inspection and weighing, are commonly used to identify the maturity of cocoa pods, the use of automation tools based on similarity measures can improve the efficiency and accuracy of this process. We set up a database of cocoa pod images and used two feature extractors: one based on convolutional neural networks (CNN), in particular, MobileNet, and the other based on texture analysis using a gray-level co-occurrence matrix (GLCM). We evaluated the impact of different color spaces and feature extraction methods on our database. We used mathematical similarity measurement tools, such as the Euclidean distance, correlation distance, and chi-square distance, to classify cocoa pod images. Our experiments showed that the chi-square distance measurement offered the best accuracy, with a score of 99.61%, when we used GLCM as a feature extractor and the Lab color space. Using automation tools based on similarity measures can improve the efficiency and accuracy of cocoa pod maturity determination. 
The results of our experiments prove that the chi-square distance is the most appropriate measure of similarity for this task.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"45 1","pages":"99"},"PeriodicalIF":1.8,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85680359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
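A minimal sketch of the similarity-based classification the abstract above describes: a query feature vector takes the label of the closest reference vector under the chi-square distance. The function names, the `eps` smoothing term, the 1/2 factor, and the toy two-bin vectors are illustrative assumptions, not the authors' implementation; the paper's exact distance definition and GLCM feature layout may differ.

```python
def chi_square_distance(p, q, eps=1e-10):
    """Chi-square distance between two non-negative feature vectors.

    Uses the common form 0.5 * sum((p_i - q_i)^2 / (p_i + q_i)).
    eps is an assumed smoothing term that avoids division by zero.
    """
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def classify_by_similarity(query, references):
    """Return the label of the reference vector closest to the query.

    references is a list of (label, feature_vector) pairs; the labels
    used in the usage note are hypothetical.
    """
    best_label, _ = min(
        references,
        key=lambda item: chi_square_distance(query, item[1]),
    )
    return best_label
```

For instance, `classify_by_similarity([0.9, 0.1], [("ripe", [1.0, 0.0]), ("unripe", [0.0, 1.0])])` picks the label whose reference vector lies nearest to the query.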
Carlos Miranda, G. Sanchez-Torres, Dixon Salcedo Morillo
{"title":"Exploring the Evolution of Sentiment in Spanish Pandemic Tweets: A Data Analysis Based on a Fine-Tuned BERT Architecture","authors":"Carlos Miranda, G. Sanchez-Torres, Dixon Salcedo Morillo","doi":"10.3390/data8060096","DOIUrl":"https://doi.org/10.3390/data8060096","url":null,"abstract":"The COVID-19 pandemic has had a significant impact on various aspects of society, including economic, health, political, and work-related domains. The pandemic has also caused an emotional effect on individuals, reflected in their opinions and comments on social media platforms, such as Twitter. This study explores the evolution of sentiment in Spanish pandemic tweets through a data analysis based on a fine-tuned BERT architecture. A total of six million tweets were collected using web scraping techniques, and pre-processing was applied to filter and clean the data. The fine-tuned BERT architecture was utilized to perform sentiment analysis, which allowed for a deep-learning approach to sentiment classification. The analysis results were graphically represented based on search criteria, such as “COVID-19” and “coronavirus”. This study reveals sentiment trends, significant concerns, relationship with announced news, public reactions, and information dissemination, among other aspects. 
These findings provide insight into the emotional impact of the COVID-19 pandemic on individuals and the corresponding impact on social media platforms.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"46 1","pages":"96"},"PeriodicalIF":1.8,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75047136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andreas Miltiadous, Katerina D. Tzimourta, Theodora Afrantou, P. Ioannidis, N. Grigoriadis, D. Tsalikakis, P. Angelidis, M. Tsipouras, E. Glavas, N. Giannakeas, A. Tzallas
{"title":"A Dataset of Scalp EEG Recordings of Alzheimer's Disease, Frontotemporal Dementia and Healthy Subjects from Routine EEG","authors":"Andreas Miltiadous, Katerina D. Tzimourta, Theodora Afrantou, P. Ioannidis, N. Grigoriadis, D. Tsalikakis, P. Angelidis, M. Tsipouras, E. Glavas, N. Giannakeas, A. Tzallas","doi":"10.3390/data8060095","DOIUrl":"https://doi.org/10.3390/data8060095","url":null,"abstract":"Recently, there has been a growing research interest in utilizing the electroencephalogram (EEG) as a non-invasive diagnostic tool for neurodegenerative diseases. This article provides a detailed description of a resting-state EEG dataset of individuals with Alzheimer’s disease and frontotemporal dementia, and healthy controls. The dataset was collected using a clinical EEG system with 19 scalp electrodes while participants were in a resting state with their eyes closed. The data collection process included rigorous quality control measures to ensure data accuracy and consistency. The dataset contains recordings of 36 Alzheimer’s patients, 23 frontotemporal dementia patients, and 29 healthy age-matched subjects. For each subject, the Mini-Mental State Examination score is reported. A monopolar montage was used to collect the signals. A raw and preprocessed EEG is included in the standard BIDS format. For the preprocessed signals, established methods such as artifact subspace reconstruction and an independent component analysis have been employed for denoising. The dataset has significant reuse potential since Alzheimer’s EEG Machine Learning studies are increasing in popularity and there is a lack of publicly available EEG datasets. The resting-state EEG data can be used to explore alterations in brain activity and connectivity in these conditions, and to develop new diagnostic and treatment approaches. 
Additionally, the dataset can be used to compare EEG characteristics between different types of dementia, which could provide insights into the underlying mechanisms of these conditions.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"1 1","pages":"95"},"PeriodicalIF":1.8,"publicationDate":"2023-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87165190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haadi Umer, Yuri Ralchenko, Igor Bray, Dmitry V. Fursa
{"title":"Electron scattering cross sections for the ground and excited states of tin","authors":"Haadi Umer, Yuri Ralchenko, Igor Bray, Dmitry V. Fursa","doi":"10.1016/j.adt.2023.101586","DOIUrl":"10.1016/j.adt.2023.101586","url":null,"abstract":"A comprehensive set of cross sections for electron scattering from the ground and first four excited states of tin has been calculated using the Relativistic Convergent Close-Coupling method. Elastic scattering, momentum transfer, total scattering, and total-inelastic scattering cross sections have been produced for the $5p^{2}$ $^{3}P_{0,1,2}$, $^{1}D_{2}$ and $^{1}S_{0}$ states of atomic tin over a projectile energy range of 0.1 eV to 1000 eV. Over the same projectile energy range, state-resolved cross sections for excitations to the $5p^{2}$, $5p6s$, $5p5d$ and $5p6p$ manifolds from the ground and first four excited states of tin are presented. 
Total single-ionisation cross sections have been calculated which account for the direct ionisation of electrons in the valence $5p$ and closed $5s$ shells, as well as indirect contributions from excitation auto-ionisation. These ionisation cross sections are presented for projectile energies up to 1000 eV. Maxwellian rate coefficients have been calculated for all studied transitions over electron temperatures ranging from 0.5 eV to 200 eV and fitted with simple formulas. The fit coefficients are tabulated for use in modelling applications.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"154 ","pages":"Article 101586"},"PeriodicalIF":1.8,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44142778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jihye Park, S. Na, Jung-Sook Yoon, Seoree Kim, S. Chun, Jae Jun Kim, Young-Du Kim, Young‐Ho Ahn, Keunsoo Kang, Y. Ko
{"title":"MicroRNA Profiling of Fresh Lung Adenocarcinoma and Adjacent Normal Tissues from Ten Korean Patients Using miRNA-Seq","authors":"Jihye Park, S. Na, Jung-Sook Yoon, Seoree Kim, S. Chun, Jae Jun Kim, Young-Du Kim, Young‐Ho Ahn, Keunsoo Kang, Y. Ko","doi":"10.3390/data8060094","DOIUrl":"https://doi.org/10.3390/data8060094","url":null,"abstract":"MicroRNA transcriptomes from fresh tumors and the adjacent normal tissues were profiled in 10 Korean patients diagnosed with lung adenocarcinoma using a next-generation sequencing (NGS) technique called miRNA-seq. The sequencing quality was assessed using FastQC, and low-quality or adapter-contaminated portions of the reads were removed using Trim Galore. Quality-assured reads were analyzed using miRDeep2 and Bowtie. The abundance of known miRNAs was estimated using the reads per million (RPM) normalization method. Subsequently, using DESeq2 and Wx, we identified differentially expressed miRNAs and potential miRNA biomarkers for lung adenocarcinoma tissues compared to adjacent normal tissues, respectively. We defined reliable miRNA biomarkers for lung adenocarcinoma as those detected by both methods. The miRNA-seq data are available in the Gene Expression Omnibus (GEO) database under accession number GSE196633, and all processed data can be accessed via the Mendeley data website.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"98 1","pages":"94"},"PeriodicalIF":1.8,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89110342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
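The reads per million (RPM) normalization named in the abstract above can be sketched as follows: each miRNA's raw read count is divided by the total number of reads in the sample and scaled by one million. The function name and the miRNA identifiers in the usage note are hypothetical, and the actual pipeline (miRDeep2 output handling, choice of which reads count toward the total) may differ.

```python
def reads_per_million(counts):
    """Scale raw read counts so that they sum to one million per sample.

    counts maps a miRNA name to its raw read count in one sample.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}
```

For a sample with counts `{"miR-a": 250, "miR-b": 750}`, the two RPM values keep the original 1:3 ratio and sum to one million, which makes abundances comparable across samples sequenced to different depths.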
P. Inostroza, Eric Carmona, Å. Arrhenius, M. Krauss, W. Brack, T. Backhaus
{"title":"Target Screening of Chemicals of Emerging Concern (CECs) in Surface Waters of the Swedish West Coast","authors":"P. Inostroza, Eric Carmona, Å. Arrhenius, M. Krauss, W. Brack, T. Backhaus","doi":"10.3390/data8060093","DOIUrl":"https://doi.org/10.3390/data8060093","url":null,"abstract":"The aquatic environment faces increasing threats from a variety of unregulated organic chemicals originating from human activities, collectively known as chemicals of emerging concern (CECs). These include pharmaceuticals, personal-care products, pesticides, surfactants, industrial chemicals, and their transformation products. CECs enter aquatic environments through various sources, including effluents from wastewater treatment plants, industrial facilities, runoff from agricultural and residential areas, as well as accidental spills. Data on the occurrence of CECs in the marine environment are scarce, and more information is needed to assess the chemical and ecological status of water bodies, and to prioritize toxic chemicals for further studies or risk assessment. In this study, we describe a monitoring campaign targeting CECs in surface waters at the Swedish west coast using, for the first time, an on-site large volume solid phase extraction (LVSPE) device. We detected up to 80 and 227 CECs in marine sites and the wastewater treatment plant (WWTP) effluent, respectively. 
The dataset will contribute to defining pollution fingerprints and assessing the chemical status of marine and freshwater systems affected by industrial hubs, agricultural areas, and the discharge of urban wastewater.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"1 1","pages":"93"},"PeriodicalIF":1.8,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90874411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jihye Park, Kyuho Kang, Y. Son, Kwang Seok Kim, Keunsoo Kang, Hae-June Lee
{"title":"Low-Dose Radiation-Induced Transcriptomic Changes in Diabetic Aortic Endothelial Cells","authors":"Jihye Park, Kyuho Kang, Y. Son, Kwang Seok Kim, Keunsoo Kang, Hae-June Lee","doi":"10.3390/data8050092","DOIUrl":"https://doi.org/10.3390/data8050092","url":null,"abstract":"Low-dose radiation refers to exposure to ionizing radiation at levels that are generally considered safe and not expected to cause immediate health effects. However, the effects of low-dose radiation are still not fully understood, and research in this area is ongoing. In this study, we investigated the alterations in gene expression profiles of human aortic endothelial cells (HAECs) and diabetic human aortic endothelial cells (T2D-HAECs) derived from patients with type 2 diabetes. To this end, we used RNA-seq to profile the transcriptomes of cells exposed to varying doses of low-dose radiation (0.1 Gy, 0.5 Gy, and 2.0 Gy) and compared them to a control group with no radiation exposure. Differentially expressed genes and enriched pathways were identified using the DESeq2 and gene set enrichment analysis (GSEA) methods, respectively. The data generated in this study are publicly available through the gene expression omnibus (GEO) database with the accession number GSE228572. 
This study provides a valuable resource for examining the effects of low-dose radiation on HAECs and T2D-HAECs, thereby contributing to a better understanding of the potential human health risks associated with low-dose radiation exposure.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"8 1","pages":"92"},"PeriodicalIF":1.8,"publicationDate":"2023-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80145840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Soloviev, A. Petrunin, Sofia Gvozdik, R. Sidorov
{"title":"A Set of Geophysical Fields for Modeling of the Lithosphere Structure and Dynamics in the Russian Arctic Zone","authors":"A. Soloviev, A. Petrunin, Sofia Gvozdik, R. Sidorov","doi":"10.3390/data8050091","DOIUrl":"https://doi.org/10.3390/data8050091","url":null,"abstract":"This paper presents a set of various geological and geophysical data for the Arctic zone, including some detailed models for the eastern part of the Russian Arctic zone. This hard-to-access territory has a complex geological structure, which is poorly studied by direct geophysical methods. Therefore, these data can be used in an integrative analysis for different purposes. These are the gravity field, heat flow, and various seismic tomography models. The gravity field data include several reductions calculated during our preceding studies, which are more appropriate for the study of the Earth’s interiors than the initial free air anomalies. Specifically, these are the Bouguer, isostatic, and decompensative gravity anomalies. A surface heat flow map included in the dataset is based on a joint inversion of multiple geophysical data constrained by the observations from the International Heat Flow Commission catalog. Available seismic tomography models were analyzed to select the best one for further investigation. We provide the models for the sedimentary cover and the Moho depth, which are significantly improved compared to the existing ones. 
The database provides a basis for qualitative and quantitative analysis of the region.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"1 1","pages":"91"},"PeriodicalIF":1.8,"publicationDate":"2023-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84657817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Deep Learning for Thai Sentiment Analysis","authors":"Nattawat Khamphakdee, Pusadee Seresangtakul","doi":"10.3390/data8050090","DOIUrl":"https://doi.org/10.3390/data8050090","url":null,"abstract":"The number of reviews from customers on travel websites and platforms is quickly increasing. They provide people with the ability to write reviews about their experience with respect to service quality, location, room, and cleanliness, thereby helping others before booking hotels. Many people fail to consider hotel bookings because the numerous reviews take a long time to read, and many are in a non-native language. Thus, hotel businesses need an efficient process to analyze and categorize the polarity of reviews as positive, negative, or neutral. In particular, low-resource languages such as Thai have greater limitations in terms of resources to classify sentiment polarity. In this paper, a sentiment analysis method is proposed for Thai sentiment classification in the hotel domain. Firstly, the Word2Vec technique (the continuous bag-of-words (CBOW) and skip-gram approaches) was applied to create word embeddings of different vector dimensions. Secondly, each word embedding model was combined with deep learning (DL) models to observe the impact of each word vector dimension result. We compared the performance of nine DL models (CNN, LSTM, Bi-LSTM, GRU, Bi-GRU, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-BiGRU) with different numbers of layers to evaluate their performance in polarity classification. The dataset was classified using the FastText and BERT pre-trained models to carry out the sentiment polarity classification. Finally, our experimental results show that the WangchanBERTa model slightly improved the accuracy, producing a value of 0.9225, and the skip-gram and CNN model combination outperformed other DL models, reaching an accuracy of 0.9170. 
From the experiments, we found that the word vector dimensions, hyperparameter values, and the number of layers of the DL models affected the performance of sentiment classification. Our research provides guidance for setting suitable hyperparameter values to improve the accuracy of sentiment classification for the Thai language in the hotel domain.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"08 1","pages":"90"},"PeriodicalIF":1.8,"publicationDate":"2023-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86201794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Dataset of Spelling Errors and Users' Corrections in Croatian Language","authors":"G. Gledec, M. Horvat, M. Mikuc, B. Blašković","doi":"10.3390/data8050089","DOIUrl":"https://doi.org/10.3390/data8050089","url":null,"abstract":"This paper presents a unique and extensive dataset containing over 33 million entries with pairs in the form “spelling error → correction” from ispravi.me, the most popular Croatian online spellchecking service, collected since 2008. The dataset, compiled from the contribution of nearly 900,000 users, is a valuable resource for researchers and developers in the field of natural language processing (NLP), improving spellcheck accuracy, and language learning applications. The dataset may be used to accomplish several goals: (1) improving spellchecking accuracy by incorporating common user corrections and reducing false positives and negatives; (2) helping language learners identify common errors and learn correct spelling through targeted feedback; (3) analyzing data trends and patterns to uncover the most common spelling errors and their underlying causes; (4) identifying and evaluating factors that influence typing input; (5) improving NLP applications such as text recognition and machine translation. Tasks specific to the Croatian language include the creation of a letter-level confusion matrix and the refinement of word suggestions based on historical usage of the service. 
This comprehensive dataset provides researchers and practitioners with a wealth of information, opening the path for advancements in spellchecking, language learning, and NLP applications in the Croatian language.","PeriodicalId":55580,"journal":{"name":"Atomic Data and Nuclear Data Tables","volume":"22 1","pages":"89"},"PeriodicalIF":1.8,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78999940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}