Building a robust and compact search index
Vladislav Savchuk, Stanislav Protasov
2021 International Conference "Nonlinearity, Information and Robotics" (NIR), published 2021-08-26
DOI: 10.1109/NIR52917.2021.9666087 (https://doi.org/10.1109/NIR52917.2021.9666087)
With exponential data growth, search engines require more memory for storage and more time for search. Data is indexed to increase search speed, which in turn requires additional memory. In this study, we develop a fully functional search engine for Wikipedia articles and compare different indexing techniques. Using vector quantization for compression, we fit the index into a single machine’s RAM. Moreover, we show that using metadata and an additional search for out-of-vocabulary words improves the overall system’s quality.
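
The abstract does not say which vector quantization variant is used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes plain k-means quantization over toy random vectors that stand in for document embeddings; the function names (`train_codebook`, `encode`, `decode`) and all parameters are hypothetical and chosen for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vector):
    """Index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vector))


def train_codebook(vectors, k, iterations=10, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k codewords."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def encode(codebook, vectors):
    """Store each vector as a one-byte codeword index (needs k <= 256)."""
    return bytes(nearest(codebook, v) for v in vectors)


def decode(codebook, codes):
    """Approximate the original vectors by their codewords."""
    return [codebook[c] for c in codes]


# Toy data (an assumption): 200 random 8-dimensional vectors.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]

codebook = train_codebook(vectors, k=16)
codes = encode(codebook, vectors)   # 200 bytes instead of 200 * 8 floats
approx = decode(codebook, codes)    # lossy reconstruction
```

In this sketch each vector is stored as a single byte plus a shared codebook of 16 codewords, at the cost of replacing the vector with its nearest codeword. Practical systems commonly use refinements such as product quantization, which splits each vector into sub-vectors and quantizes them separately; whether the paper does so is not stated in the abstract.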