Characterizing I/O in Machine Learning with MLPerf Storage
Oana Balmau
ACM SIGMOD Record, published 2022-11-21
DOI: 10.1145/3572751.3572765 (https://doi.org/10.1145/3572751.3572765)
Citation count: 0
Abstract
Data is the driving force behind machine learning (ML) algorithms. The way we ingest, store, and serve data can significantly affect the performance of end-to-end training and inference [11]. However, efficient storage and pre-processing of training data have received far less attention in ML than the building of specialized software frameworks and hardware accelerators. The amount of data that we produce is growing exponentially, making it expensive and difficult to keep entire training datasets in main memory. Increasingly, ML algorithms will need to access data efficiently from persistent storage.