Xuefeng Xi, Lei Wang, Encen Zou, Cheng Zeng, Baochuan Fu
2018 IEEE International Smart Cities Conference (ISC2), published 2018-09-01. DOI: 10.1109/ISC2.2018.8656953 (https://doi.org/10.1109/ISC2.2018.8656953)
Joint Learning for Non-standard Chinese Building Address Standardization*
Because China has no uniform specification for building address names, the same building address may have many different representations in Chinese natural language. The goal of the non-standard Chinese building address standardization task is to convert non-standard building addresses from different social institutions into the standard building address defined by the public security authority, so that the spatial location information corresponding to the standard address can be obtained. This plays an important role in the analysis and processing of big data in smart cities. Because of the large number of non-standard building addresses and the semantic ambiguity of addresses expressed in Chinese natural language, traditional string-matching methods struggle to meet the task requirements. To address these problems, we propose a joint learning approach based on the hash map principle and word frequency theory for standardizing non-standard Chinese building addresses. Experimental results on a dataset constructed via crowdsourcing show that the approach achieves high accuracy and adapts well to data from different sources.
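The abstract does not specify how the hash map and word frequency components are combined, so the following is only an illustrative sketch and not the authors' algorithm. It assumes one plausible reading: an exact hash map lookup for already-known variants, with a fallback that scores each standard address by frequency-weighted character bigram overlap, so that rare fragments such as a road name count more than common ones such as a city name. The class name `AddressStandardizer`, the bigram tokenization, the weighting formula, and the `threshold` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split a string into overlapping character bigrams (no word segmenter needed)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class AddressStandardizer:
    """Hypothetical sketch: hash map lookup plus a frequency-weighted fallback."""

    def __init__(self, standard_addresses, known_variants=None):
        self.standard = list(standard_addresses)
        # Hash map from an exact input string to its standard address.
        self.lookup = {addr: addr for addr in self.standard}
        self.lookup.update(known_variants or {})
        # Count in how many standard addresses each bigram occurs.
        doc_freq = Counter()
        for addr in self.standard:
            doc_freq.update(set(char_bigrams(addr)))
        n = len(self.standard)
        # Assumed weighting: smoothed inverse document frequency,
        # so bigrams shared by many addresses contribute less.
        self.weight = {
            gram: math.log((n + 1) / (count + 1)) + 1.0
            for gram, count in doc_freq.items()
        }

    def _score(self, query, candidate):
        """Fraction of the candidate's bigram weight that the query covers."""
        q = set(char_bigrams(query))
        c = set(char_bigrams(candidate))
        shared = sum(self.weight.get(g, 0.0) for g in q & c)
        total = sum(self.weight.get(g, 0.0) for g in c)
        return shared / total if total else 0.0

    def standardize(self, raw, threshold=0.5):
        """Return the matching standard address, or None if nothing is close enough."""
        raw = raw.strip()
        if raw in self.lookup:  # O(1) path for exact or previously seen forms
            return self.lookup[raw]
        best = max(self.standard, key=lambda s: self._score(raw, s), default=None)
        if best is not None and self._score(raw, best) >= threshold:
            return best
        return None
```

In this sketch, a caller would build the standardizer once from the list of standard addresses, optionally seed `known_variants` with confirmed mappings, and then call `standardize` on each incoming address. Returning `None` below the threshold leaves ambiguous inputs for manual review instead of forcing a match.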