{"title":"基于词典的多语言分词与搜索引擎","authors":"Weng Yu, Chen Wenyi","doi":"10.1109/ICINIS.2012.85","DOIUrl":null,"url":null,"abstract":"Because of the current search engines are mostly based on Chinese and English, the engines can provide the minority language search services are very small and the accuracy is low. We present a Dictionary-based multi-language Analyzer and Search Engine which can be used to analyze and search information on the Internet in minority languages such as Uighur, Tibetan, Mongol, Manchu etc. After preprocessed the Corpus we use Lucene to index, Segmentation. Segmentation is based on our dictionary and it depends much on it.","PeriodicalId":302503,"journal":{"name":"2012 Fifth International Conference on Intelligent Networks and Intelligent Systems","volume":"47 39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Dictionary-based Multi-language Segmentation and Search Engine\",\"authors\":\"Weng Yu, Chen Wenyi\",\"doi\":\"10.1109/ICINIS.2012.85\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Because of the current search engines are mostly based on Chinese and English, the engines can provide the minority language search services are very small and the accuracy is low. We present a Dictionary-based multi-language Analyzer and Search Engine which can be used to analyze and search information on the Internet in minority languages such as Uighur, Tibetan, Mongol, Manchu etc. After preprocessed the Corpus we use Lucene to index, Segmentation. Segmentation is based on our dictionary and it depends much on it.\",\"PeriodicalId\":302503,\"journal\":{\"name\":\"2012 Fifth International Conference on Intelligent Networks and Intelligent Systems\",\"volume\":\"47 39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 Fifth International Conference on Intelligent Networks and Intelligent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICINIS.2012.85\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Fifth International Conference on Intelligent Networks and Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICINIS.2012.85","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Dictionary-based Multi-language Segmentation and Search Engine
Because most current search engines target Chinese and English, few engines offer search services for minority languages, and those that do suffer from low accuracy. We present a dictionary-based multi-language analyzer and search engine that can analyze and retrieve Internet content written in minority languages such as Uighur, Tibetan, Mongolian, and Manchu. After preprocessing the corpus, we use Lucene for indexing and retrieval; word segmentation is performed by our dictionary-based analyzer, so segmentation quality depends heavily on the coverage of the dictionary.
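The abstract does not give implementation details, but a common way to realize dictionary-based segmentation is forward maximum matching over a word list, with the resulting tokens then fed to Lucene's indexer through a custom analyzer. The sketch below is a minimal, hypothetical illustration of that matching step only; the class name `DictionarySegmenter`, the `maxWordLength` parameter, and the greedy matching strategy are assumptions, not details taken from the paper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Minimal dictionary-based segmenter using forward maximum matching.
 * Hypothetical sketch: the paper does not specify its matching strategy,
 * dictionary format, or class names.
 */
public class DictionarySegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    public DictionarySegmenter(Set<String> dictionary, int maxWordLength) {
        this.dictionary = dictionary;
        this.maxWordLength = maxWordLength;
    }

    /** Greedily matches the longest dictionary entry at each position. */
    public List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(pos + maxWordLength, text.length());
            String match = null;
            // Try the longest candidate first, shrinking until a dictionary hit.
            for (int len = end - pos; len >= 1; len--) {
                String candidate = text.substring(pos, pos + len);
                if (dictionary.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                // No dictionary entry starts here: emit a single-character token.
                match = text.substring(pos, pos + 1);
            }
            tokens.add(match);
            pos += match.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>();
        dict.add("信息");   // example entries only; a real dictionary would
        dict.add("检索");   // cover the target minority languages instead
        DictionarySegmenter seg = new DictionarySegmenter(dict, 4);
        System.out.println(seg.segment("信息检索"));  // [信息, 检索]
    }
}
```

In a Lucene-based pipeline such as the one described, these tokens would typically be wrapped in a custom Analyzer/Tokenizer so that the same segmentation is applied both at indexing time and at query time; as the abstract notes, the accuracy of the whole system then hinges on how well the dictionary covers each language.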