Designing graph neural networks training data with limited samples and small network sizes

Junior Momo Ziazet, Charles Boudreau, Oscar Delgado, Brigitte Jaumard

ITU Journal: ICT Discoveries, published 2023-09-12. DOI: 10.52953/afyw5455
Machine learning is a data-driven field: a learning model's performance depends on the availability of large volumes of training data. However, by improving data quality, effective machine learning models can be trained with little data. This paper demonstrates this possibility by proposing a methodology for generating high-quality data in the networking domain. We designed a dataset for training a given Graph Neural Network (GNN) that not only contains a small number of samples, but whose samples also feature network graphs of a reduced size (10-node networks). Our evaluations indicate that the dataset generated by the proposed pipeline can train a GNN model that scales well to larger networks of 50 to 300 nodes. The trained model compares favorably to the baseline, achieving a mean absolute percentage error of 5-6%, while its training set is far smaller: 90 samples in total, versus thousands for the baseline.
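For reference, the reported 5-6% figure is a mean absolute percentage error (MAPE). A minimal sketch of how that metric is computed, using hypothetical per-sample predictions (the values below are illustrative, not from the paper):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, expressed in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Average relative error across samples, scaled to a percentage.
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Toy example: hypothetical true vs. predicted network metrics.
actual = [10.0, 20.0, 40.0]
predicted = [10.5, 19.0, 42.0]
print(round(mape(actual, predicted), 2))  # 5.0
```

Each sample here is off by exactly 5% of its true value, so the mean is 5.0; an error in the 5-6% range means predictions deviate from ground truth by roughly that fraction on average.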