A data preprocessing framework of geoscience data sharing portal for user behavior mining
Mo Wang, Juanle Wang
2015 23rd International Conference on Geoinformatics, 2015-06-19
DOI: 10.1109/GEOINFORMATICS.2015.7378637
Science data sharing benefits both scientific research and education. Understanding the behavior of science data sharing participants supports informed decisions on data sharing policy and data sharing website design. Today, data sharing is carried out mainly through the Internet, and web usage mining offers a natural approach to uncovering the behavior of data sharing users. This paper presents a data preprocessing framework that prepares web usage data for user behavior mining on a geoscience data sharing portal (geodata.cn). The preprocessing steps comprise data cleaning, user identification, session identification, and data modeling. Web server logs served as the primary data source. Heuristic algorithms were employed for data cleaning and user identification, and several session identification methods were applied and compared. Users' geolocations were identified with an online Geo-IP lookup tool, which returns the geographical coordinates of an IP address. Building on these preprocessing steps, a web usage data model for science data sharing portals was proposed to support further user behavior mining, such as user classification and spatial association rule mining.
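The abstract does not state which heuristics or thresholds were used. The Python sketch below illustrates one common way to implement the cleaning, user identification, and session identification steps on web server logs. It assumes the Apache/Nginx "combined" log format, treats an (IP address, user agent) pair as the user key, and uses a 30-minute inactivity time-out to split sessions. These choices, the filter lists, and the function names `parse_and_clean` and `identify_sessions` are hypothetical illustrations, not the authors' method. The time-out rule is only one of the session identification methods that could be compared.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed Apache/Nginx "combined" log format; the portal's real format is not stated.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical cleaning heuristics: embedded static resources and crawler signatures.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")
BOT_MARKERS = ("bot", "spider", "crawler")


@dataclass
class Request:
    ip: str
    agent: str
    time: datetime
    url: str


def parse_and_clean(lines):
    """Parse raw log lines; keep successful GET page requests from non-crawlers."""
    kept = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # malformed line
        if m["method"] != "GET" or not m["status"].startswith("2"):
            continue  # non-GET or unsuccessful request
        path = m["url"].split("?", 1)[0].lower()
        if path.endswith(STATIC_SUFFIXES):
            continue  # embedded resource, not a deliberate user action
        agent = m["agent"]
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            continue  # likely a crawler
        t = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        kept.append(Request(m["ip"], agent, t, m["url"]))
    return kept


def identify_sessions(requests, timeout=timedelta(minutes=30)):
    """Group requests by (IP, user agent), then start a new session whenever
    the gap between consecutive requests of the same user exceeds `timeout`."""
    users = {}
    for r in sorted(requests, key=lambda r: r.time):
        users.setdefault((r.ip, r.agent), []).append(r)
    sessions = []
    for user_key, reqs in users.items():
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur.time - prev.time > timeout:
                sessions.append((user_key, current))
                current = []
            current.append(cur)
        sessions.append((user_key, current))
    return sessions


# Usage on a few made-up log lines.
sample = [
    '1.2.3.4 - - [19/Jun/2015:10:00:00 +0800] "GET /data/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [19/Jun/2015:10:00:05 +0800] "GET /logo.png HTTP/1.1" 200 99 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [19/Jun/2015:10:10:00 +0800] "GET /data/2 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [19/Jun/2015:11:00:00 +0800] "GET /data/3 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [19/Jun/2015:10:01:00 +0800] "GET /data/1 HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
requests = parse_and_clean(sample)
sessions = identify_sessions(requests)
print(len(requests), len(sessions))  # prints: 3 2
```

Here the image request and the crawler request are dropped during cleaning, and the 50-minute gap before `/data/3` opens a second session for the same user. Using the (IP, user agent) pair rather than the IP alone is a frequent compromise for separating users behind a shared proxy, but it remains a heuristic.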