{"title":"Data from extensive monitoring of agricultural practices, soil health, and wheat grain production in 44 farms in Northwestern France from 2021 to 2023.","authors":"Lefèvre Clara, Husson Olivier, Dumora Bruno, Grudé Océane, Lugassy Léa, Sarthou Jean-Pierre","doi":"10.1016/j.dib.2024.111140","DOIUrl":"https://doi.org/10.1016/j.dib.2024.111140","url":null,"abstract":"<p><p>This article presents data measured in 44 farms covering a range of cropping practices, soil, and production parameters under contrasted types of crop management: conventional and conservation agriculture. Eighty-six winter wheat fields in Northwestern France were monitored for two growing seasons (2021-2023). The dataset encompasses data about cropping practices (tillage, soil cover, rotation, pesticide use, nutrition), soils (chemical, biological, and physical parameters, including texture), and grain production (nutritional, technological, and sanitary indicators). This article provides a detailed methodology of one of the first applications of a systemic on-farm study of the food production system, aiming to adopt a \"One Health\" perspective of the crop production system. <i>The data presented here can be accessed</i> at https://doi.org/10.18167/DVN1/SI026U.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111140"},"PeriodicalIF":1.0,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11699296/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142930895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-04 | eCollection Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111177
Juliette Aminian-Biquet, Claire Colegrove, Alex Driedger, Nicole Raudsepp, Jennifer Sletten, Timothé Vincent, Virgil Zetterlind, Julia Roessger, Anastasiya Laznya, Natașa Vaidianu, Joachim Claudet, Juliette Young, Barbara Horta E Costa
{"title":"Regulations of activities and protection levels in marine protected areas of the European Union: A dataset compiled from multiple data sources.","authors":"Juliette Aminian-Biquet, Claire Colegrove, Alex Driedger, Nicole Raudsepp, Jennifer Sletten, Timothé Vincent, Virgil Zetterlind, Julia Roessger, Anastasiya Laznya, Natașa Vaidianu, Joachim Claudet, Juliette Young, Barbara Horta E Costa","doi":"10.1016/j.dib.2024.111177","DOIUrl":"10.1016/j.dib.2024.111177","url":null,"abstract":"<p><p>The dataset gathers available regulations of human activities and protection levels of Marine Protected Areas (MPAs) of the European Union (EU). The MPA list and polygons were extracted from the MPA database of the European Environment Agency (EEA) and completed with available zoning systems (all were filtered for their marine area reported under the Marine Strategy Framework Directive). Fully-overlapping MPAs were merged. In the resulting dataset, MPA features are provided (gathered from EEA, WDPA, ProtectedSeas), including the year of designation, designation types (e.g., national, Natura 2000) and subtypes (e.g., reserves, national parks), database identifiers (WDPA, Natura 2000, OSPAR, etc.), IUCN categories, and main protection focus. We provide summarized data on maritime activities that overlap with MPA polygons from two types of datasets: activities-focused datasets (national marine spatial plans, and additional European and regional databases, like EMODnet) and MPA-focused datasets gathering data from management plans (ProtectedSeas, expert-based assessments about OSPAR and Portuguese MPAs). This dataset therefore compiles data that could be gathered from accessible legal frameworks regarding aquaculture, fisheries, anchoring, infrastructures (including harbors and renewable energy), mining, transport, coastal land-based uses (desalinization, sewage plants) and other non-extractive uses (e.g., recreational), making them readily accessible. 
Using the MPA Guide classification system, we computed two scenarios of potential impact for each activity, which were used to assess two scenarios of protection levels per MPA. Some MPAs could not be associated with any MPA features, regulations, or protection levels. Finally, we detail the protocol to match information from multiple databases (e.g., with MPA polygons formatted differently) and provide a quality check by comparing this dataset to previous assessments. This dataset was used to analyze MPAs' protection levels across countries, regions and MPA features (e.g., IUCN categories, designations). It was also used to investigate the sources of information available and the levels of regulations for each maritime activity in EU MPAs. This dataset can therefore be used for further analyses on the use of EU MPAs to regulate activities and to compare with future assessments or with additional data we did not have access to (e.g., gathered at national scale). Such research is crucial to plan and monitor the implementation of the EU 2030 Biodiversity Strategy, targeting 10% of strictly protected MPAs in each sea region.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111177"},"PeriodicalIF":1.0,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683248/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142906691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-04 | eCollection Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111199
Gerardo Roa Dabike, Trevor J Cox, Alex J Miller, Bruno M Fazenda, Simone Graetzer, Rebecca R Vos, Michael A Akeroyd, Jennifer Firth, William M Whitmer, Scott Bannister, Alinka Greasley, Jon P Barker
{"title":"The cadenza woodwind dataset: Synthesised quartets for music information retrieval and machine learning.","authors":"Gerardo Roa Dabike, Trevor J Cox, Alex J Miller, Bruno M Fazenda, Simone Graetzer, Rebecca R Vos, Michael A Akeroyd, Jennifer Firth, William M Whitmer, Scott Bannister, Alinka Greasley, Jon P Barker","doi":"10.1016/j.dib.2024.111199","DOIUrl":"10.1016/j.dib.2024.111199","url":null,"abstract":"<p><p>This paper presents the Cadenza Woodwind Dataset. This publicly available data is synthesised audio for woodwind quartets, including renderings of each instrument in isolation. The data was created to be used as training data within Cadenza's second open machine learning challenge (CAD2) for the task of rebalancing classical music ensembles. The dataset is also intended for developing other music information retrieval (MIR) algorithms using machine learning. It was created because of the lack of large-scale datasets of classical woodwind music with separate audio for each instrument and permissive license for reuse. Music scores were selected from the OpenScore String Quartet corpus. These were rendered for two woodwind ensembles of (i) flute, oboe, clarinet and bassoon; and (ii) flute, oboe, alto saxophone and bassoon. This was done by a professional music producer using industry-standard software. Virtual instruments were used to create the audio for each instrument using software that interpreted expression markings in the score. Convolution reverberation was used to simulate a performance space, and the ensembles were mixed. 
The dataset consists of the audio and associated metadata.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111199"},"PeriodicalIF":1.0,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683209/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142906697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-03 | eCollection Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111194
Lawrence McKnight, Chandra Jaiswal, Issa AlHmoud, Balakrishna Gokaraju
{"title":"A dataset of deep learning performance from cross-base data encoding on MNIST and MNIST-C.","authors":"Lawrence McKnight, Chandra Jaiswal, Issa AlHmoud, Balakrishna Gokaraju","doi":"10.1016/j.dib.2024.111194","DOIUrl":"https://doi.org/10.1016/j.dib.2024.111194","url":null,"abstract":"<p><p>Effective data representation in machine learning and deep learning is paramount. For an algorithm or neural network to capture patterns in data and be able to make reliable predictions, the data must appropriately describe the problem domain. Although there exists much literature on data preprocessing for machine learning and data science applications, novel data representation methods for enhancing machine learning model performance remain largely absent from the literature. This dataset is a compilation of convolutional neural network model performance trained and tested on a wide range of numerical base representations of the MNIST and MNIST-C datasets. This performance data can be further analysed by the research community to uncover trends in model performance against the numerical base of its data. This dataset can be used to produce more research of the same nature, testing cross-base data encoding on machine learning training and testing data for a wide range of real-world applications.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111194"},"PeriodicalIF":1.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11697575/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142930887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing a comprehensive BACnet attack dataset: A step towards improved cybersecurity in building automation systems.","authors":"Seyed Amirhossein Moosavi, Mojtaba Asgari, Seyed Reza Kamel","doi":"10.1016/j.dib.2024.111192","DOIUrl":"10.1016/j.dib.2024.111192","url":null,"abstract":"<p><p>With the development of smart buildings, the risks of cyber-attacks against them have also increased. One of the popular and evolving protocols used for communication between devices in smart buildings, especially HVAC systems, is the BACnet protocol. Machine learning algorithms and neural networks require datasets of normal traffic and real attacks to develop intrusion detection (IDS) and prevention (IPS) systems that can detect anomalies and prevent attacks. Real traffic datasets for these networks are often unavailable due to confidentiality reasons. To address this, we propose a framework that uses existing real datasets and converts them into BACnet protocol network traffic with detailed network behaviour. In this method, a virtual machine is prepared for each controller based on real scenarios, and by creating a simulator for the controller on the virtual machine, real data previously collected under real conditions from existing datasets is injected into the network with the same date and time during the simulation. We performed three types of attacks, including Falsifying, Modifying, and covert channel attacks on the network. For covert channel attacks, the message was modelled in three forms: Plain text, hashed using SHA3-256, and encrypted using AES-256. Network traffic was recorded using Wireshark software in pcap format. 
The advantage of the generated dataset is that since we used real data, the data behaviour aligns with real conditions.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111192"},"PeriodicalIF":1.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142906613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-03 | eCollection Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111193
Quan Huu Nguyen, Trinh Van Nguyen, Thuy Thi Xuan Vi, Thuy Thi Thu Vu, Lan Thi Ngoc Nguyen, Yen Thi Hai Nguyen, Hung Duc Nguyen, Tan Quang Tu, Mau Hoang Chu
{"title":"Dataset on ITS and some chloroplast DNA regions of <i>Boehmeria holosericea</i> Blume in Vietnam.","authors":"Quan Huu Nguyen, Trinh Van Nguyen, Thuy Thi Xuan Vi, Thuy Thi Thu Vu, Lan Thi Ngoc Nguyen, Yen Thi Hai Nguyen, Hung Duc Nguyen, Tan Quang Tu, Mau Hoang Chu","doi":"10.1016/j.dib.2024.111193","DOIUrl":"10.1016/j.dib.2024.111193","url":null,"abstract":"<p><p>Species of the <i>Boehmeria</i> genus have the potential to be natural medicines and have industrial fibre production uses. Many species of this genus are morphologically similar and are difficult to distinguish, especially when their morphology is distorted. This dataset includes sequence information of several DNA regions isolated from the genome of <i>Boehmeria holosericea</i>, namely ITS (from the nuclear genome), <i>matK</i>, trnL-trnF, trnH-psbA, and <i>rpoC1</i> (from the chloroplast genome) and phylogenetic analysis results based on the isolated sequences. In the phylogenetic tree based on the <i>matK</i> gene sequence, <i>B. holosericea</i> was grouped with <i>B. umbrosa, B. clidemioides, B. spicata, and B. macrophylla</i> with a bootstrap coefficient of 100%. In the phylogenetic tree based on the trnH-psbA spacer region sequences, <i>B. holosericea</i> was grouped with <i>B. clidemioides</i> (a bootstrap coefficient of 96%). In the phylogenetic tree based on the <i>rpoC1</i> gene sequences, <i>B. holosericea</i> was grouped with <i>B. spicata</i> (a bootstrap coefficient of 100%). In the phylogenetic tree based on the ITS region sequences, <i>B. holosericea</i> was grouped with <i>B. macrophylla</i> (a bootstrap coefficient of 73%), and based on the trnL-trnF spacer region, <i>B. holosericea</i> was grouped with <i>B. pilociuscula</i> (a bootstrap coefficient of 16%). Two genes, <i>matK</i> and <i>rpoC1</i>, and the trnH-psbA region from the chloroplast genome are potential DNA barcode candidates that could aid in the species identification of <i>B. holosericea</i>. 
This dataset is the first report on the ITS, <i>matK</i>, trnL-trnF, trnH-psbA, and <i>rpoC1</i> sequences and the phylogeny of <i>B. holosericea.</i></p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111193"},"PeriodicalIF":1.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683258/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142906590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-02 | eCollection Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111171
Zacharias Dahl, Aleksanteri Hämäläinen, Aku Karhinen, Jesse Miettinen, Andre Böhme, Samuel Lillqvist, Sampo Haikonen, Raine Viitala
{"title":"Aalto Gear Fault datasets for deep-learning based diagnosis.","authors":"Zacharias Dahl, Aleksanteri Hämäläinen, Aku Karhinen, Jesse Miettinen, Andre Böhme, Samuel Lillqvist, Sampo Haikonen, Raine Viitala","doi":"10.1016/j.dib.2024.111171","DOIUrl":"10.1016/j.dib.2024.111171","url":null,"abstract":"<p><p>Accurate system health state prediction through deep learning requires extensive and varied data. However, real-world data scarcity poses a challenge for developing robust fault diagnosis models. This study introduces two extensive datasets, Aalto Shim Dataset and Aalto Gear Fault Dataset, collected under controlled laboratory conditions, aimed at advancing deep learning-based fault diagnosis. The datasets encompass a wide range of gear faults, including synthetic and realistic failure modes, replicated on a downsized azimuth thruster testbench equipped with multiple sensors. The data features various fault types and severities under different operating conditions. The comprehensive data collected, along with the methodologies for creating synthetic faults and replicating common gear failures, provide valuable resources for developing and testing intelligent fault diagnosis models, enhancing their generalization and robustness across diverse scenarios.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111171"},"PeriodicalIF":1.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683272/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142906663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-02 | eCollection Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111184
Rotimi-Williams Bello, Pius A Owolawi, Etienne A van Wyk, Chunling Du
{"title":"Photovoltaic module dataset for automated fault detection and analysis in large photovoltaic systems using photovoltaic module fault detection.","authors":"Rotimi-Williams Bello, Pius A Owolawi, Etienne A van Wyk, Chunling Du","doi":"10.1016/j.dib.2024.111184","DOIUrl":"https://doi.org/10.1016/j.dib.2024.111184","url":null,"abstract":"<p><p>Solar energy has become the fastest-growing renewable and alternative source of energy. However, there are few or no open-source datasets to advance research in photovoltaic-related systems. The work presented in this article is a step towards deriving a Photo-Voltaic Module Dataset (PVMD) of thermal images and making it publicly available. The work provides a PVMD dataset comprising a total of 1000 self-acquired and augmented images. The dataset includes both permanent and temporal anomalies, namely Hotspots, Cracks, and Shadings. The dataset was collected on September 5, 2024 at the Soshanguve South Campus, Tshwane University of Technology, South Africa, using the DJI Mavic 3 Thermal's high-resolution thermal and visual imaging capabilities. The DJI Mavic 3 Thermal, coupled with its advanced flight features, is an excellent tool for precise and efficient inspections of PV systems. The laboratory experiment performed on the dataset lasted one week. The work aims to provide a supervised dataset good enough to support research methods that provide a comprehensive and efficient approach to monitoring and maintaining large PV systems. Extensive analysis of the thermal data reveals the anomalies as indicative of faults in the solar cells of PV modules, thereby opening up advances in solar energy research. Because the data come from a single-day collection and a one-week laboratory experiment, they are more suitable for testing algorithms designed for fault detection. 
The dataset is publicly and freely available to the scientific community at 10.17632/5ssmfpgrpc.1.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111184"},"PeriodicalIF":1.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683254/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142906689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-02 | eCollection Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111183
Jorge Garate-Quispe, Ramiro Canahuire-Robles, Marx Herrera-Machaca, Sufer Baez-Quispe, Gabriel Alarcón-Aguirre
{"title":"Field data on diversity and vegetation structure of natural regeneration in a chronosequence of abandoned gold-mining lands in a tropical Amazon forest.","authors":"Jorge Garate-Quispe, Ramiro Canahuire-Robles, Marx Herrera-Machaca, Sufer Baez-Quispe, Gabriel Alarcón-Aguirre","doi":"10.1016/j.dib.2024.111183","DOIUrl":"10.1016/j.dib.2024.111183","url":null,"abstract":"<p><p>Anthropogenic activities (e.g., logging, gold-mining, agriculture, and uncontrolled urban expansion) threaten the forests in the southeast of the Peruvian Amazon, one of the most diverse ecosystems worldwide. However, gold-mining generates the most severe impacts on ecosystems and limits their resilience. The natural regeneration of degraded areas in the southeastern Peruvian Amazon has not been studied in depth. The dataset contains floristic inventories of previously uncharacterized or poorly studied secondary forests degraded and abandoned by gold-mining activities and an intact forest in the Tres Islas indigenous community, Madre de Dios region, in southeastern Peru. The data presented was obtained from 12 plots (20 m × 60 m) established in three successional forests abandoned by gold mining and an intact forest (without mining impacts), where all trees with a stem diameter at breast height greater than 1 cm were inventoried. To the best of our knowledge, this is the only dataset in the southwest of the Peruvian Amazon that compares the natural colonization after gold-mining and intact forests. This dataset can be useful for long-term study and monitoring of structure and tree diversity in relatively understudied yet important secondary forests after gold-mining abandonment. Also, this dataset could be used to analyze the successional trajectory process of vegetation and the recovery of aboveground biomass. Furthermore, the data could be used to investigate the effects of functional traits and types of mining on vegetation recovery. 
Hence, understanding the successional processes will help to improve restoration, reforestation, or reclamation strategies for the recovery of degraded lands in the Amazon.</p>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"111183"},"PeriodicalIF":1.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11665693/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142881783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2024-12-01 | DOI: 10.1016/j.dib.2024.111125
Daniel Lee, Fernanda C C Oliveira, Richard T. Conant, Minjae Kim
{"title":"Microbial community assembly across agricultural soil mineral mesocosms revealed by 16S rRNA gene amplicon sequencing data","authors":"Daniel Lee, Fernanda C C Oliveira, Richard T. Conant, Minjae Kim","doi":"10.1016/j.dib.2024.111125","DOIUrl":"10.1016/j.dib.2024.111125","url":null,"abstract":"<div><div>Increasing atmospheric carbon dioxide (CO<sub>2</sub>) concentrations are impacting the global climate, resulting in significant interest in soil carbon sequestration as a mitigation strategy. While it is recognized that mineral-associated organic matter (MAOM) in soils is mainly formed through microbial activity, our understanding of microbial-derived MAOM formation processes remains limited due to the complexity of the soil environment. To gain insights into this issue, we incubated fresh soil samples for 45 days with one of three mineral additions: Sand, Kaolinite+Sand, or Illite+Sand. 16S rRNA V3/V4 gene amplicon sequencing was then conducted on samples using an Illumina NextSeq 2000 flow cell. The reads were analyzed and taxonomically assigned with QIIME2 v2023.5.1 and SILVA 138. The dataset has been made publicly available through NCBI GenBank under BioProject ID PRJNA1124235. This dataset is important and useful as it provides valuable insights into the interactions between soil minerals and microbial communities, which can inform strategies for enhancing soil carbon sequestration and mitigating climate change. 
Moreover, it serves as a crucial reference for future studies, offering a foundational understanding of microbial dynamics in soil systems and guiding further research in microbial ecology and carbon cycling.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"57 ","pages":"Article 111125"},"PeriodicalIF":1.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142745172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}