Samuel Jackson;Saiful Khan;Nathan Cummings;James Hodson;Shaun de Witt;Stanislas Pamela;Rob Akers;Jeyan Thiyagalingam;MAST Team
Journal: IEEE Transactions on Plasma Science, vol. 53, no. 9, pp. 2440-2449
Publication date: 2025-08-18
DOI: 10.1109/TPS.2025.3583419
URL: https://ieeexplore.ieee.org/document/11128905/
Impact factor: 1.5 (JCR Q3, Physics, Fluids & Plasmas)
An Open Data Service for Supporting Research in Machine Learning on Tokamak Data
The increasing complexity and volume of plasma fusion experimental data, coupled with the growing adoption of machine learning in fusion research, necessitate advanced and efficient data management solutions. We propose an open data service for fusion experiments operated by the UKAEA, designed to address the evolving needs of machine-learning-driven fusion research. Our system provides a framework to organize MAST, MAST upgrade (MAST-U), and Joint European Torus (JET) experimental data in accordance with findability, accessibility, interoperability, and reuse (FAIR) principles, using distributed object storage for scalability and a relational database for efficient metadata indexing. In addition, it offers simplified abstractions through an application programming interface (API), facilitating seamless data access and integration with data analysis and machine learning workflows. Performance evaluation of metrics such as data load time and throughput, across varying numbers of parallel workers, demonstrates the data pipeline’s optimization for efficient machine learning application development. Our solution significantly enhances support for data-driven research and machine learning applications in fusion by laying the groundwork for open, FAIR-compliant fusion data, which enables cross-machine analysis, prompts international collaboration, and potentially accelerates advancements in fusion energy research.
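The abstract describes an architecture in which a relational database indexes metadata while the bulk data lives in object storage, with load time and throughput evaluated across varying numbers of parallel workers. The following is a minimal sketch of that general pattern, not the service's actual API: the in-memory dictionary standing in for object storage, the table schema, the shot numbers, the machine assignments, and every function name are hypothetical and chosen only for illustration.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for distributed object storage: object key -> raw bytes.
# The shot numbers and payloads are made up for this sketch.
OBJECT_STORE = {f"shots/{shot}/signal.bin": bytes(1024) for shot in range(100, 110)}


def build_index():
    """Create an in-memory relational index mapping shot metadata to object keys."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE shots (shot_id INTEGER PRIMARY KEY, machine TEXT, object_key TEXT)"
    )
    # Arbitrary machine labels, assigned only to give the query something to filter on.
    rows = [
        (shot, "MAST" if shot % 2 == 0 else "MAST-U", f"shots/{shot}/signal.bin")
        for shot in range(100, 110)
    ]
    db.executemany("INSERT INTO shots VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def find_keys(db, machine):
    """Query the metadata index and return the object keys for one machine."""
    cur = db.execute(
        "SELECT object_key FROM shots WHERE machine = ? ORDER BY shot_id", (machine,)
    )
    return [row[0] for row in cur.fetchall()]


def load(key):
    """Fetch one object from the stand-in store."""
    return OBJECT_STORE[key]


def measure_throughput(keys, workers):
    """Load all keys with a pool of workers; return (total bytes, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        payloads = list(pool.map(load, keys))
    elapsed = time.perf_counter() - start
    return sum(len(p) for p in payloads), elapsed


if __name__ == "__main__":
    index = build_index()
    keys = find_keys(index, "MAST")
    for n_workers in (1, 2, 4):
        total, elapsed = measure_throughput(keys, n_workers)
        print(f"workers={n_workers} bytes={total} seconds={elapsed:.6f}")
```

The metadata query stays cheap because it touches only the small index; the worker pool then fetches the matching objects concurrently. With a real remote object store, network latency would dominate the timings, which is why the worker count is treated as the variable in this kind of measurement.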
About the journal:
The scope covers all aspects of the theory and application of plasma science. It includes the following areas: magnetohydrodynamics; thermionics and plasma diodes; basic plasma phenomena; gaseous electronics; microwave/plasma interaction; electron, ion, and plasma sources; space plasmas; intense electron and ion beams; laser-plasma interactions; plasma diagnostics; plasma chemistry and processing; solid-state plasmas; plasma heating; plasma for controlled fusion research; high energy density plasmas; industrial/commercial applications of plasma physics; plasma waves and instabilities; and high power microwave and submillimeter wave generation.