Internet Newborn Word Recognition Method under Conditional Random Field Model
J. Zhou
2018 International Conference on Virtual Reality and Intelligent Systems (ICVRIS), 2018-08-01
DOI: 10.1109/ICVRIS.2018.00136
This paper proposes an approach to the automatic detection of new words. It analyzes web pages collected from the Internet on a large scale and detects new words in them. The detection results are then filtered according to morphological rules to extract the new words that are present in the corpus. The scheme adopts a conditional random field (CRF) and puts forward two improvements. First, in the new word detection stage, a highly efficient left (right) entropy calculation method is proposed; it improves detection speed and effectively reduces the influence of unrelated characters on the calculation. Second, a quantitative model of unregistered (out-of-vocabulary) words is proposed; it extracts repeated strings on the basis of word segmentation and can be used to evaluate how many words are missed. According to the paper's experiments, the proposed CRF-based combination model is a very effective new word detection method, both in its open-test results and in the generalization ability of the model.
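The left (right) entropy mentioned in the abstract measures how varied the characters are that appear immediately before (after) a candidate string; a high value on both sides suggests that the string behaves as an independent word. The abstract does not describe the paper's faster calculation method, so the following is only a minimal sketch of the standard textbook definition. The function names `neighbor_entropy` and `_entropy` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (in bits) of a Counter of neighbor characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def neighbor_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) of `candidate` in `corpus`.

    Assumption: plain definition over adjacent characters, scanning the
    corpus once per candidate. This is NOT the paper's optimized method.
    """
    left, right = Counter(), Counter()
    start = corpus.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if start > 0:
            left[corpus[start - 1]] += 1   # character just before the match
        if end < len(corpus):
            right[corpus[end]] += 1        # character just after the match
        start = corpus.find(candidate, start + 1)  # allow overlapping matches
    return _entropy(left), _entropy(right)


# "葡萄" appears twice, each time with different neighbors on both sides.
print(neighbor_entropy("吃葡萄不吐葡萄皮", "葡萄"))  # (1.0, 1.0)
```

A candidate that always occurs with the same neighbor on one side scores 0 on that side, which is the usual signal that it is a fragment of a longer word rather than a word in its own right.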