ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata
Mahmoud Ismail, Mikael Ronström, Seif Haridi, J. Dowling
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Published: 2019-05-14
DOI: 10.1109/CCGRID.2019.00020 (https://doi.org/10.1109/CCGRID.2019.00020)
Citations: 3
Abstract
Distributed OLTP databases are now used to manage metadata for distributed file systems, but they cannot also efficiently support complex queries or aggregations. To solve this problem, we introduce ePipe, a databus that both creates a consistent change stream for a distributed, hierarchical file system (HopsFS) and eventually delivers the correctly ordered stream with low latency to downstream clients. ePipe can be used to provide polyglot storage for file system metadata, allowing metadata queries to be handled by the most efficient engine for that query. For file system notifications, we show that ePipe achieves up to 56X throughput improvement over HDFS INotify and Trumpet with up to 3 orders of magnitude lower latency. For Spotify's Hadoop workload, we show that ePipe can replicate all file system changes from HopsFS to Elasticsearch with an average replication lag of only 330 ms.