Binh T. Nguyen, Tung Tran Nguyen Doan, S. T. Huynh, An Tran-Hoai Le, An Trong Nguyen, K. Tran, N. Ho, Trung T. Nguyen, Dang T. Huynh
2022 14th International Conference on Knowledge and Systems Engineering (KSE), published 2022-10-19. DOI: 10.1109/KSE56063.2022.9953770 (https://doi.org/10.1109/KSE56063.2022.9953770)
ATPM-REAP: A Simple and Efficient Address Tracking and Parsing for Vietnamese Real Estate Advertisement Posts
Real estate is a large and economically important sector in many countries. Information extracted from real estate advertisement posts can help analysts understand market conditions and derive further insights, particularly for the Vietnamese market. The address, or location, is a required attribute of any real estate listing. However, addresses in Vietnam can be written in many different ways, so detecting the text that represents an address in an advertisement post is both essential and challenging. This paper investigates address detection and parsing for the Vietnamese language. First, we create a dataset of real estate advertisements annotated with 16 different attributes (entities) per property, assigning the correct label to each entity detected during the data annotation process. Then, we propose a practical approach that detects the locations of possible addresses inside a given advertisement post and parses the localized address text into four levels of address information: City/Province, District/Town, Ward, and Street. The experimental results indicate that the $\mathrm{PhoBERT}_{\mathrm{base}}$ model achieves the best performance, with an F1-score of 0.8195. Finally, we compare our proposed method with other approaches and achieve the highest accuracy at every level: City/Province (0.952), District/Town (0.9482), Ward (0.9225), and Street (0.8994); the combined accuracy of correctly detecting all four levels is 0.8367.
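To make the distinction between per-level accuracy and combined accuracy concrete, the following is a minimal sketch of how such scores could be computed. It is not the authors' evaluation code: the field names (`city_province`, `district_town`, `ward`, `street`), the function `level_accuracies`, the exact-string-match criterion, and the toy addresses are all assumptions made for illustration. The combined score counts a post as correct only when all four levels match.

```python
from typing import Dict, List

# Hypothetical field names for the four address levels named in the abstract.
LEVELS = ("city_province", "district_town", "ward", "street")

Address = Dict[str, str]


def level_accuracies(predicted: List[Address], gold: List[Address]) -> Dict[str, float]:
    """Return accuracy per address level plus a 'combined' accuracy.

    Assumption: a level counts as correct when the predicted string equals
    the gold string exactly; a real evaluation may normalize text first.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    n = len(gold)
    scores: Dict[str, float] = {}
    for level in LEVELS:
        correct = sum(
            p.get(level, "") == g.get(level, "") for p, g in zip(predicted, gold)
        )
        scores[level] = correct / n

    # Combined accuracy: every one of the four levels must match.
    all_correct = sum(
        all(p.get(level, "") == g.get(level, "") for level in LEVELS)
        for p, g in zip(predicted, gold)
    )
    scores["combined"] = all_correct / n
    return scores


# Toy data (invented for illustration, not taken from the paper's dataset).
gold = [
    {"city_province": "Ho Chi Minh", "district_town": "Quan 1",
     "ward": "Ben Nghe", "street": "Le Loi"},
    {"city_province": "Ha Noi", "district_town": "Ba Dinh",
     "ward": "Kim Ma", "street": "Kim Ma"},
]
predicted = [
    dict(gold[0]),                      # all four levels correct
    {**gold[1], "street": "Doi Can"},   # street level wrong
]

scores = level_accuracies(predicted, gold)
print(scores)
```

Because a single wrong level makes the whole address count as incorrect, the combined accuracy can never exceed the lowest per-level accuracy, which is consistent with the reported combined score (0.8367) falling below the Street score (0.8994).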