High-Performance Distributed Indexing and Retrieval for Large Volume Traffic Log Datasets on the Cloud
Wen Yang, Yinan Dou
2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, 2013-08-26. DOI: 10.1109/IHMSC.2013.51 (https://doi.org/10.1109/IHMSC.2013.51)
In this paper, we present a high-performance distributed system for the storage, indexing, and retrieval of large-volume web traffic log datasets. The system is based on the open-source MapReduce framework Hadoop and extends its functionality. We focus on three aspects of our work: an approach to storing large datasets on the Hadoop Distributed File System (HDFS), an indexing algorithm suited to large distributed datasets, and a distributed retrieval architecture built on Hadoop. We show that the system is efficient and that, compared with HBase, a distributed, sparse NoSQL database, its query response latency approaches real time.
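The abstract does not specify the indexing algorithm or the retrieval protocol, so the following is only a minimal sketch of one generic way such a design could look, not the authors' method. It assumes that log records are split into blocks (in-memory lists standing in for HDFS blocks), that each block gets its own index on one field, and that a query is sent to every block index and the matches are gathered. The record fields `ip` and `url` and the function names `build_block_index` and `query_blocks` are hypothetical.

```python
from collections import defaultdict


def build_block_index(records, field):
    """Map each value of `field` to the positions of matching records in one block."""
    index = defaultdict(list)
    for pos, record in enumerate(records):
        index[record[field]].append(pos)
    return dict(index)


def query_blocks(blocks, indexes, value):
    """Look up `value` in every block's index and gather the matching records."""
    hits = []
    for block, index in zip(blocks, indexes):
        for pos in index.get(value, []):
            hits.append(block[pos])
    return hits


# Hypothetical log records, split into two blocks as a stand-in for HDFS blocks.
blocks = [
    [{"ip": "10.0.0.1", "url": "/a"}, {"ip": "10.0.0.2", "url": "/b"}],
    [{"ip": "10.0.0.1", "url": "/c"}],
]

# One index per block, built independently (this step could run in parallel).
indexes = [build_block_index(block, "ip") for block in blocks]

result = query_blocks(blocks, indexes, "10.0.0.1")
```

Because each block is indexed on its own, index construction needs no coordination between blocks, and a lookup touches only the records that the per-block indexes point to rather than scanning the full dataset.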