Authors: T. Phan, J. Küng, T. K. Dang
Venue: 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA)
Published: 2016-03-23
DOI: 10.1109/AINA.2016.37 (https://doi.org/10.1109/AINA.2016.37)
eHSim: An Efficient Hybrid Similarity Search with MapReduce
This paper addresses the scalability and performance of similarity search by proposing eHSim, an efficient hybrid similarity search built on MapReduce. We introduce clustering schemes that partition objects into groups by their length, and we equip these schemes with pruning strategies that quickly discard irrelevant objects before their similarity is computed. We also design a hybrid MapReduce architecture that addresses the challenges of big data, and we implement the proposed methods in MapReduce so that they are compatible with this architecture. Finally, we evaluate the proposed methods on real datasets. Experiments show that our approach is considerably more efficient than state-of-the-art methods in terms of query processing, batch processing, and data storage.
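The idea of grouping objects by length and pruning before computing similarity can be illustrated with a small single-process sketch. The abstract does not state which similarity measure or pruning rule eHSim uses, so everything below is an assumption made for illustration: objects are token sets, the measure is Jaccard similarity, and pruning uses the well-known length bound Jaccard(q, x) <= min(|q|, |x|) / max(|q|, |x|). The function names (`group_by_length`, `length_bound`, `search`) are hypothetical, and the sketch does not model the hybrid MapReduce architecture.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not taken from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_length(objects):
    """Partition objects (id -> token set) into groups keyed by set size."""
    groups = defaultdict(list)
    for obj_id, tokens in objects.items():
        groups[len(tokens)].append(obj_id)
    return groups


def length_bound(len_q, len_x):
    """Upper bound on Jaccard similarity given only the two set sizes."""
    longest = max(len_q, len_x)
    if longest == 0:
        return 1.0  # two empty sets are treated as identical
    return min(len_q, len_x) / longest


def search(query, objects, threshold):
    """Return (id, similarity) pairs with similarity >= threshold, best first."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    results = []
    for length, ids in group_by_length(objects).items():
        # Prune the whole group when even the best case cannot reach the threshold.
        if length_bound(len(query), length) < threshold:
            continue
        for obj_id in ids:
            sim = jaccard(query, objects[obj_id])
            if sim >= threshold:
                results.append((obj_id, sim))
    return sorted(results, key=lambda pair: -pair[1])


if __name__ == "__main__":
    data = {
        "a": {1, 2, 3},
        "b": {1, 2, 3, 4},
        "c": {1},
        "d": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    }
    print(search({1, 2, 3}, data, threshold=0.7))
```

In this toy run, the groups of length 1 and length 9 are skipped without a single similarity computation, since their length bound falls below the threshold. In a MapReduce setting, one plausible mapping (again an assumption) is to emit the object length as the key in the map phase so that each reducer handles one length group.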