Extracting Geographic Knowledge from Wikipedia
D. Benhaddouche, Mohamed Tekkouk, Abdelghani Chernnouf Youcef
Proceedings of the 7th International Conference on Software Engineering and New Technologies, 2018-12-26
DOI: 10.1145/3330089.3330128
Citation count: 0
Abstract
GIS is becoming a necessity in a wide variety of application domains, and the extraction of geographic information has become an important topic in computer science. This work aims to extract geographic data from Wikipedia so that users can more easily obtain the information they want. One difficulty is processing the very large XML file; we apply text mining and machine learning techniques to address it. We present and evaluate an approach that extracts geographic data from a very large Wikipedia XML file and creates a geographic database. Our technique extracts infoboxes from geographic articles using a supervised machine learning method, the support vector machine (SVM). We then create tables containing geographic data (name, longitude, latitude, etc.) and join the different tables to structure the result.
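The abstract does not give implementation details, so the following is only a minimal sketch of how a large XML dump might be streamed and infobox fields pulled out, not the authors' method. The sample dump, the field names `latitude` and `longitude`, and the helpers `local_name`, `infobox_fields`, and `iter_geo_rows` are all hypothetical. A simple "has both coordinate fields" check stands in for the SVM step that the paper uses to select geographic articles, and the infobox is assumed to be flat (no nested templates).

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature dump; a real Wikipedia export is namespaced and far larger.
SAMPLE_DUMP = b"""<mediawiki>
  <page>
    <title>Oran</title>
    <revision><text>{{Infobox settlement
| name = Oran
| latitude = 35.6969
| longitude = -0.6331
}}
Oran is a coastal city.</text></revision>
  </page>
  <page>
    <title>Support vector machine</title>
    <revision><text>An article without an infobox.</text></revision>
  </page>
</mediawiki>"""

# One "| key = value" line of an infobox (assumed layout).
FIELD_RE = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def infobox_fields(wikitext):
    """Return key/value pairs of the first infobox, or {} if there is none.

    Assumes a flat infobox: the first '}}' after '{{Infobox' closes it.
    """
    start = wikitext.find("{{Infobox")
    if start == -1:
        return {}
    end = wikitext.find("}}", start)
    body = wikitext[start:end if end != -1 else len(wikitext)]
    return dict(FIELD_RE.findall(body))


def iter_geo_rows(source):
    """Stream pages and yield (title, latitude, longitude) tuples."""
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            fields = infobox_fields(text)
            # Stand-in for the SVM classifier: keep pages with both coordinates.
            if "latitude" in fields and "longitude" in fields:
                yield title, float(fields["latitude"]), float(fields["longitude"])
            title, text = None, ""
            elem.clear()  # drop the finished page's contents to keep memory low


rows = list(iter_geo_rows(io.BytesIO(SAMPLE_DUMP)))
print(rows)
```

Parsing incrementally with `iterparse` avoids loading the whole file into memory, which is the main concern with a dump of this size. The yielded tuples could then be written to database tables and joined, as the abstract describes.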