Joint Learning for Non-standard Chinese Building Address Standardization*

Xuefeng Xi, Lei Wang, Encen Zou, Cheng Zeng, Baochuan Fu
Published in: 2018 IEEE International Smart Cities Conference (ISC2)
Publication date: 2018-09-01
DOI: 10.1109/ISC2.2018.8656953 (https://doi.org/10.1109/ISC2.2018.8656953)
Citations: 2

Abstract

Because China has no uniform specification for building address names, the same building address may have many different representations in natural-language Chinese. The goal of the non-standard Chinese building address standardization task is to convert non-standard building addresses from different social institutions into the standard building address defined by the public security organ, so that the spatial location information corresponding to the standard address can be obtained. This plays an important role in the analysis and processing of big data in smart cities. Because of the large number of non-standard building addresses and the semantic ambiguity of addresses expressed in natural-language Chinese, traditional methods based on string matching struggle to meet the task requirements. To address these problems, we propose a joint learning approach, based on the hash map principle and word frequency theory, for standardizing non-standard Chinese building addresses. Experimental results on a dataset constructed via crowdsourcing show that the approach achieves high accuracy and adapts well to data from different sources.
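The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of how a hash map and frequency-based weighting could be combined for address matching; it is not the authors' method. The class `AddressMatcher`, the character-bigram tokenization, the smoothed log weighting, and the sample addresses are all assumptions introduced here for illustration.

```python
import math
from collections import defaultdict


def bigrams(text):
    """Return the set of overlapping character bigrams in a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


class AddressMatcher:
    """Hypothetical matcher: hash-map index plus frequency-based weights."""

    def __init__(self, standard_addresses):
        self.standard = list(standard_addresses)
        # Hash map: bigram -> indices of the standard addresses containing it.
        self.index = defaultdict(set)
        for i, addr in enumerate(self.standard):
            for gram in bigrams(addr):
                self.index[gram].add(i)
        n = len(self.standard)
        # Bigrams that occur in few addresses are more discriminative,
        # so they get a larger weight (a smoothed inverse frequency).
        self.weight = {
            gram: math.log((1 + n) / (1 + len(ids))) + 1.0
            for gram, ids in self.index.items()
        }

    def standardize(self, raw):
        """Return the best-scoring standard address, or None if nothing overlaps."""
        scores = defaultdict(float)
        for gram in bigrams(raw):
            for i in self.index.get(gram, ()):
                scores[i] += self.weight[gram]
        if not scores:
            return None
        return self.standard[max(scores, key=scores.get)]


# Made-up sample data, for illustration only.
matcher = AddressMatcher([
    "苏州市姑苏区人民路100号1幢",
    "苏州市姑苏区人民路200号2幢",
    "苏州市虎丘区滨河路100号3幢",
])
print(matcher.standardize("人民路100号一幢"))
```

In this sketch the hash map keeps lookup cost proportional to the length of the input rather than to the number of standard addresses, and the frequency weights reduce the influence of fragments shared by almost every address (such as the city name).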