{"title":"Data-driven relation discovery from unstructured texts","authors":"Marilena Ditta, Fabrizio Milazzo, V. Ravì, G. Pilato, A. Augello","doi":"10.5220/0005614205970602","DOIUrl":"https://doi.org/10.5220/0005614205970602","url":null,"abstract":"This work proposes a data-driven methodology for the extraction of subject-verb-object triplets from a text corpus. Previous works in the field solved the problem by means of complex learning algorithms requiring hand-crafted examples; our proposal completely avoids learning triplets from a dataset and is built on top of a well-known baseline algorithm designed by Delia Rusu et al. The baseline algorithm uses only syntactic information for generating triplets and is characterized by very low precision, i.e., very few triplets are meaningful. Our idea is to integrate the semantics of the words with the aim of filtering out the wrong triplets, thus increasing the overall precision of the system. The algorithm has been tested over the Reuters Corpus and has shown good performance with respect to the baseline algorithm for triplet extraction.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123579014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of frequent itemset mining techniques to analyze business processes","authors":"Vladimír Bartík, Milan Pospísil","doi":"10.5220/0005598102730280","DOIUrl":"https://doi.org/10.5220/0005598102730280","url":null,"abstract":"Analysis of business process data can be used to discover the reasons for delays and other problems in a business process. This paper presents an approach that uses a simulator of production history. This simulator allows detecting problems at various production machines, e.g. extremely long queues of products waiting before a machine. After detection, data about products processed before the queue increased are collected. Frequent itemsets obtained from this dataset can be used to describe the problem and its causes. The whole process of frequent itemset mining is described in this paper. The paper also describes several necessary modifications of the basic methods usually used to discover frequent itemsets.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128526608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The reverse doubling construction","authors":"Jean-François Viaud, K. Bertet, C. Demko, R. Missaoui","doi":"10.5220/0005613203500357","DOIUrl":"https://doi.org/10.5220/0005613203500357","url":null,"abstract":"It is well known inside the Formal Concept Analysis (FCA) community that a concept lattice could have an exponential size in the data. Hence, the size of concept lattices is a critical issue in the presence of large real-life data sets. In this paper, we propose to investigate factor lattices as a tool to get meaningful parts of the whole lattice. These factor lattices have been widely studied from the early theory of lattices to more recent work in the FCA field. This paper contains two parts. The first one gives background about lattice theory and formal concept analysis, and mainly compatible sub-contexts, arrow-closed sub-contexts and congruence relations. The second part presents a new decomposition called “reverse doubling construction” that exploits the above three notions used for the doubling convex construction investigated by Day. Theoretical results and their proofs are given as well as an illustrative example.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132444748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early diagnosis of Alzheimer's disease using machine learning techniques: A review paper","authors":"Aunsia Khan, Muhammad Usman","doi":"10.5220/0005615203800387","DOIUrl":"https://doi.org/10.5220/0005615203800387","url":null,"abstract":"Alzheimer's disease (AD), an irreversible brain disease, impairs thinking and memory while overall brain size shrinks, which eventually leads to death. Early diagnosis of AD is essential for the development of more effective treatments. Machine learning (ML), a branch of artificial intelligence, employs a variety of probabilistic and optimization techniques that allow computers to learn from large and complex datasets. As a result, researchers frequently use machine learning for the diagnosis of early stages of AD. This paper presents a review, analysis and critical evaluation of recent work on the early detection of AD using ML techniques. Several methods achieved promising prediction accuracies; however, they were evaluated on different pathologically unproven data sets from different imaging modalities, making it difficult to make a fair comparison among them. Moreover, many other factors, such as pre-processing, the number of important attributes for feature selection, and class imbalance, distinctively affect the assessment of the prediction accuracy. To overcome these limitations, a model is proposed which comprises an initial pre-processing step followed by the selection of important attributes, with classification achieved using association rule mining. Furthermore, this proposed model-based approach gives the right direction for research in early diagnosis of AD and has the potential to distinguish AD from healthy controls.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126829687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bringing search engines to the cloud using open source components","authors":"K. Nagi","doi":"10.5220/0005632701160126","DOIUrl":"https://doi.org/10.5220/0005632701160126","url":null,"abstract":"The usage of search engines is nowadays extended to do intelligent analytics of petabytes of data. With Lucene being at the heart of the vast majority of information retrieval systems, several attempts have been made to bring it to the cloud in order to scale to big data. Efforts include implementing scalable distribution of the search indices over the file system, storing them in NoSQL databases, and porting them to inherently distributed ecosystems, such as Hadoop. We evaluate the existing efforts in terms of distribution, high availability, fault tolerance, manageability, and high performance. We believe that search indexing capabilities for big data can only be supported through the use of common open-source technology deployed on standard cloud platforms such as Amazon EC2, Microsoft Azure, etc. For each approach, we build a benchmarking system by indexing the whole Wikipedia content and submitting hundreds of simultaneous search requests. We measure the performance of both indexing and searching operations. We simulate node failures and monitor the recoverability of the system. We show that a system built on top of Solr and Hadoop has the best stability and manageability, while systems based on NoSQL databases present an attractive alternative in terms of performance.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116556357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic extraction of task statements from structured meeting content","authors":"K. Nagao, Keisuke Inoue, Naoya Morita, S. Matsubara","doi":"10.5220/0005609703070315","DOIUrl":"https://doi.org/10.5220/0005609703070315","url":null,"abstract":"We previously developed a discussion mining system that records face-to-face meetings in detail, analyzes their content, and conducts knowledge discovery. Looking back on past discussion content by browsing documents, such as minutes, is an effective means for conducting future activities. In meetings at which some research topics are regularly discussed, such as seminars in laboratories, the presenters are required to discuss future issues by checking urgent matters from the discussion records. We call statements including advice or requests proposed at previous meetings “task statements” and propose a method for automatically extracting them. With this method, a probabilistic model is created using the maximum entropy method, based on certain semantic attributes and linguistic characteristics of statements. Whether a statement is a task statement is judged according to its probability under this model. A seminar-based experiment validated the effectiveness of the proposed extraction method.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127612970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Arabic sentiment analysis using WEKA a hybrid learning approach","authors":"S. Alhumoud, Tarfa Albuhairi, Mawaheb Altuwaijri","doi":"10.5220/0005616004020408","DOIUrl":"https://doi.org/10.5220/0005616004020408","url":null,"abstract":"Data has become the currency of this era and it continues to increase massively in size and generation rate. Large data generated from organisations' e-transactions or by individuals through social networks can be of great value when analysed properly. This research presents an implementation of a sentiment analyser for Twitter's tweets, which are one of the biggest public and freely available big data sources. It analyses Arabic, Saudi dialect tweets to extract sentiments toward a specific topic. It used a dataset consisting of 3000 tweets collected from Twitter. The collected tweets were analysed using two machine learning approaches: supervised learning, which is trained with the collected dataset, and the proposed hybrid learning, which is trained on a single-word dictionary. Two algorithms are used, Support Vector Machine (SVM) and K-Nearest Neighbors (KNN). The results obtained by cross-validation on the same dataset clearly confirm the superiority of the hybrid learning approach over the supervised approach.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115562052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of sampling size estimation techniques for association rule mining","authors":"Tugba Halici, U. Ketenci","doi":"10.5220/0005589801950202","DOIUrl":"https://doi.org/10.5220/0005589801950202","url":null,"abstract":"Fast and complete retrieval of individual customer needs and “to the point” product offers are crucial aspects of customer satisfaction in today's highly competitive banking sector. The growing number of transactions and customers has greatly increased the time and memory needed for market basket analysis. In this paper, a sampling process is included in the analysis, aiming to increase the performance of a product offer system. The core logic of sampling is to find a smaller representative of the universe that still generates accurate association rules. A smaller sample of the universe reduces the elapsed time and the memory consumption devoted to market basket analysis. In this context, the sampling methods, the sampling size estimation techniques and the representativeness tests are examined. The technique that gives the complete set of association rules in a reduced amount of time is suggested for sampling retail banking data.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"abs/1606.08164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125243865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Piecewise Chebyshev factorization based nearest neighbour classification for time series","authors":"Qinglin Cai, Ling Chen, Jianling Sun","doi":"10.5220/0005611900840091","DOIUrl":"https://doi.org/10.5220/0005611900840091","url":null,"abstract":"In the research field of time series analysis and mining, the nearest neighbour classifier (1NN) based on dynamic time warping distance (DTW) is well known for its high accuracy. However, the high computational complexity of DTW can make classification time-consuming. An effective solution is to compute DTW in the piecewise approximation space (PA-DTW), which transforms the raw data into a feature space based on segmentation and extracts discriminatory features for the similarity measure. However, most existing piecewise approximation methods need to fix the segment length and focus on simple statistical features, which can affect the precision of PA-DTW. To address this problem, we propose a novel piecewise factorization model for time series, which uses an adaptive segmentation method and factorizes the subsequences with Chebyshev polynomials. The Chebyshev coefficients, which are able to capture the fluctuation information of time series, are extracted as features for the PA-DTW measure (ChebyDTW). The comprehensive experimental results show that ChebyDTW supports accurate and fast 1NN classification.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132110828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Sense Discrimination on tweets: A graph-based approach","authors":"F. M. Cecchini, E. Fersini, E. Messina","doi":"10.5220/0005640501380146","DOIUrl":"https://doi.org/10.5220/0005640501380146","url":null,"abstract":"In this paper we detail an unsupervised, graph-based approach for word sense discrimination on tweets. We deal with this problem by constructing a word graph of co-occurrences. By defining a distance on this graph, we obtain a word metric space, on which we can apply an aggregative algorithm for word clustering. As a result, we obtain word clusters representing contexts that discriminate the possible senses of a term. We present some experimental results both on a data set consisting of tweets we collected and on the data set of task 14 at SemEval-2010.","PeriodicalId":102743,"journal":{"name":"2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127928355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}