Jian Qu, Nattakarn Phaphoom, Chinorot Wangtragulsang, D. Tancharoen
Social Media Contact Information Extraction. 2018 International Conference on Information Technology (InCIT), published 2018-10-01. DOI: 10.23919/INCIT.2018.8584877 (https://doi.org/10.23919/INCIT.2018.8584877)
Abstract
Extracting personal information from unstructured data is challenging because the location and context of the information are unpredictable. This is especially true for the Thai language, which has no punctuation, capitalization, or ending character that separates specific names from the rest of the sentence. We propose a system that automatically extracts named entity information from website snippets, using Thai celebrities as the sample named entity group, and compare the system with popular celebrity websites. Tested on Thai celebrities, our method achieved better accuracy than MThai.