{"title":"EasySDM: An integrated and easy to use Spatial Data Mining platform","authors":"Leila Hamdad, Amine Abdaoui, Nabila Belattar, Mohamed Al Chikha","doi":"10.5220/0005615903940401","DOIUrl":"https://doi.org/10.5220/0005615903940401","url":null,"abstract":"Spatial Data Mining allows users to extract implicit but valuable knowledge from spatially related data. Two main approaches have been used in the literature. The first one applies simple Data Mining algorithms after a spatial pre-processing step, while the second one consists of developing specific algorithms that consider the spatial relations inside the mining process. In this work, we first present a study of existing Spatial Data Mining tools according to the implemented tasks and specific characteristics. Then, we illustrate a new open source Spatial Data Mining platform (EasySDM) that integrates both approaches (pre-processing and dynamic mining). It proposes a set of algorithms belonging to clustering, classification and association rule mining tasks. Moreover, and more importantly, it allows geographic visualization of both the data and the results, either via an internal map display or using any external Geographic Information System.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130742078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed data replication and access optimization for LHCb storage system: A position paper","authors":"M. Hushchyn, P. Charpentier, A. Ustyuzhanin","doi":"10.5220/0005647105370540","DOIUrl":"https://doi.org/10.5220/0005647105370540","url":null,"abstract":"This paper presents how machine learning algorithms and methods of statistics can be applied to data management in hybrid data storage systems. Basically, two different storage types are used to store data in hybrid data storage systems. Keeping rarely used data on cheap and slow storage of type one and often used data on fast and expensive storage of type two helps to achieve an optimal performance/cost ratio for the system. We use classification algorithms to estimate the probability that the data will be used often in the future. Then, using risk analysis, we define where the data should be stored. We show how to estimate the optimal number of replicas of the data using regression algorithms and a Hidden Markov Model. Based on the probability, risks and the optimal number of data replicas, our system finds the optimal data distribution in the hybrid data storage system. We present the results of a simulation of our method for LHCb hybrid data storage.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125817931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic tag extraction from social media for visual labeling","authors":"Shuhua Liu, Thomas Forss","doi":"10.5220/0005638505040510","DOIUrl":"https://doi.org/10.5220/0005638505040510","url":null,"abstract":"Visual labeling or automated visual annotation is of great importance to the efficient access and management of multimedia content. Many methods and techniques have been proposed for image annotation in the last decade and they have shown reasonable performance on standard datasets. Great progress has been made especially in the past couple of years with the development of deep learning models for image content analysis and extraction of content-based concept labels. However, concept object labels are much more friendly to machines than to users. We consider that more relevant and user-friendly visual labels need to include “context” descriptors. In this study we explore the possibilities to leverage social media content as a resource for visual labeling. We developed a tag extraction system that applies heuristic rules and a term weighting method to extract image tags from the associated Tweet. The system retrieves tweet-image pairs from public Twitter accounts, analyzes the Tweet, and generates labels for the images. We elaborate on different visual labeling methods, tag analysis and tag refinement methods.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126172404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SERP-level disambiguation from search results","authors":"M. Alli","doi":"10.5220/0005628606270636","DOIUrl":"https://doi.org/10.5220/0005628606270636","url":null,"abstract":"The fast growth of search engines' popularity shows users' attraction to Web search engines. However, there is a chance of misinterpretation for ambiguous queries. At this point, we propose a more adherent user interface which consists of relevant visual content as well as a newly generated search snippet and title. Recent research toward this aim has focused on a whole-page thumbnail to assist users in remembering a recently visited Web page. However, it has not yet been discussed how specific visual content of a page can allow users to distinguish between a useful and a worthless page on the result page, especially in an ambiguous search task. Our study shows that the improvement in both the textual search snippet and title, as well as the additional thumbnail, helped users to clarify the Search Engine Result Page (SERP) in an ambiguous search task.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"59 Pt 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126389788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moving beyond the Twitter follow graph","authors":"G. Amati, Simone Angelini, M. Bianchi, Gianmarco Fusco, G. Gambosi, Giancarlo Gaudino, G. Marcone, Gianluca Rossi, P. Vocca","doi":"10.5220/0005616906120619","DOIUrl":"https://doi.org/10.5220/0005616906120619","url":null,"abstract":"The study of the topological properties of graphs derived from social network platforms is of great importance from both the social and the information point of view; furthermore, it has a big impact on designing new applications and on improving already existing services. Surprisingly, the research community seems to have mainly focused its efforts just on studying the most intuitive and explicit graphs, such as the follower graph of the Twitter platform or the Facebook friends' graph: consequently, a lot of valuable information is still hidden, waiting to be explored and exploited. In this paper we introduce a new type of graph modeling the behavior of Twitter users: the mention graph. Then we show how to easily build instances of this graph starting from the Twitter stream, and we report the results of an experiment aimed at comparing the proposed graph with other graphs already analyzed in the literature, using some standard social network analysis metrics.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116761827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"POP: A Parallel Optimized Preparation of data for data mining","authors":"Christian Ernst, Youssef Hmamouche, Alain Casali","doi":"10.5220/0005594700360045","DOIUrl":"https://doi.org/10.5220/0005594700360045","url":null,"abstract":"In light of the fact that data preparation has a substantial impact on data mining results, we provide an original framework for automatically preparing the data of any given database. Our research focuses, for each attribute of the database, on two points: (i) specifying an optimized outlier detection method, and (ii) identifying the most appropriate discretization method. Concerning the former, we illustrate that the detection of an outlier depends on whether or not the data distribution is normal. When attempting to discern the best discretization method, what is important is the shape followed by the density function of its distribution law. For this reason, we propose an automatic choice for finding the optimized discretization method based on a multi-criteria (Entropy, Variance, Stability) evaluation. Processing is performed in parallel using multicore capabilities. Conducted experiments validate our approach, showing that the same discretization method is not always the best.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"71 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128010189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data parsing using tier grammars","authors":"Alexander Sakharov, Timothy Sakharov","doi":"10.5220/0005632804630468","DOIUrl":"https://doi.org/10.5220/0005632804630468","url":null,"abstract":"Parsing turns unstructured data into structured data suitable for knowledge discovery and querying. The complexity of grammar notations and the difficulty of grammar debugging limit the availability of data parsers. Tier grammars are defined by simply dividing terminals into predefined classes and then splitting elements of some classes into multiple layered sub-groups. The set of predefined terminal classes can be easily extended. Tier grammars and their extensions are LL(1) grammars. Tier grammars are a tool for big data preprocessing.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129721141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When textual information becomes spatial information compatible with satellite images","authors":"E. Kergosien, Hugo Alatrista Salas, M. Gaio, Fabio Guttler, M. Roche, M. Teisseire","doi":"10.5220/0005606903010306","DOIUrl":"https://doi.org/10.5220/0005606903010306","url":null,"abstract":"With the amount of textual data available on the web, new methodologies for knowledge extraction are being proposed. Some original methods allow users to combine different types of data in order to extract relevant information. In this context, we present the cornerstone of manipulations on textual documents and their preparation for extracting spatial information compatible with that contained in satellite images. The term footprint is defined and its extraction is performed. In this paper, we describe the general process and some experiments conducted in the ANIMITEX project, which aims to match the information coming from texts with that of satellite images.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123857141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"News classifications with labeled LDA","authors":"Yiqi Bai, Jie Wang","doi":"10.5220/0005610600750083","DOIUrl":"https://doi.org/10.5220/0005610600750083","url":null,"abstract":"Automatically categorizing news articles with high accuracy is an important task in an automated quick news system. We present two classifiers to classify news articles based on Labeled Latent Dirichlet Allocation, called LLDA-C and SLLDA-C. To verify classification accuracy we compare classification results obtained by the classifiers with those by trained professionals. We show, through extensive experiments, that both LLDA-C and SLLDA-C outperform SVM (Support Vector Machine, our baseline classifier) in precision, particularly when only a small training dataset is available. SLLDA-C is also much more efficient than SVM. In terms of recall, we show that LLDA-C is better than SVM. In terms of average Macro-F1 and Micro-F1 scores, we show that LLDA classifiers are superior to SVM. To further explore classifications of news articles we introduce the notion of content complexity, and study how content complexity would affect classifications.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116338277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CIRA: A Competitive Intelligence reference Architecture for dynamic solutions","authors":"M. Spruit, A. Cepoi","doi":"10.5220/0005597602490258","DOIUrl":"https://doi.org/10.5220/0005597602490258","url":null,"abstract":"Competitive Intelligence (CI) solutions are the key to enabling companies to stay on top of the changes in today's competitive environment. It may come as a surprise, then, that although Competitive Intelligence solutions have already existed for a few decades, there is still little knowledge available regarding the implementation of an automated Competitive Intelligence solution. This research focuses on designing a Competitive Intelligence reference Architecture (CIRA) for dynamic systems. We start by collecting Key Intelligence Topics (KITs) and functional requirements based on an extensive literature review and expert interviews with companies and Competitive Intelligence professionals. Next, we design the architecture itself based on an attribute-driven design method. Then, a prototype is implemented for a company active in the maritime & offshore industry. Finally, the architecture is evaluated by industry experts and their suggestions are incorporated back in the artefact.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115600974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}