Oluwaseyi Feyisetan, E. Simperl, Ramine Tinati, Markus Luczak-Rösch, N. Shadbolt
{"title":"Quick-and-clean extraction of linked data entities from microblogs","authors":"Oluwaseyi Feyisetan, E. Simperl, Ramine Tinati, Markus Luczak-Rösch, N. Shadbolt","doi":"10.1145/2660517.2660527","DOIUrl":"https://doi.org/10.1145/2660517.2660527","url":null,"abstract":"In this paper, we address the problem of finding Named Entities in very large micropost datasets. We propose methods to generate a sample of representative microposts by discovering tweets that are likely to refer to new entities. Our approach is able to significantly speed up the semantic analysis process by discarding retweets, tweets without pre-identifiable entities, as well as similar and redundant tweets, while retaining information content. We apply the approach on a corpus of 1.4 billion microposts, using the IE services of AlchemyAPI, Calais, and Zemanta to identify more than 700,000 unique entities. For the evaluation we compare runtime and number of entities extracted based on the full and the downscaled version of a micropost set. We are able to demonstrate that for datasets of more than 10 million tweets we can achieve a reduction in size of more than 80% while maintaining up to 60% coverage on unique entities cumulatively discovered by the three IE tools. We publish the resulting Twitter metadata as Linked Data using SIOC and an extension of the NERD core ontology.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133834191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of supervised learning classifiers for link discovery","authors":"Tommaso Soru, A. N. Ngomo","doi":"10.1145/2660517.2660532","DOIUrl":"https://doi.org/10.1145/2660517.2660532","url":null,"abstract":"The detection of links between resources is intrinsic to the vision of the Linked Data Web. Due to the mere size of current knowledge bases, this task is commonly addressed by using tools. In particular, manifold link discovery frameworks have been developed. These frameworks implement several different machine-learning approaches to discovering links. In this paper, we investigate which of the commonly used supervised machine-learning classifiers performs best on the link discovery task. To this end, we first present our evaluation pipeline. Then, we compare ten different approaches on three artificial and three real-world benchmark data sets. The classification outcomes are subsequently compared with several state-of-the-art frameworks. Our results suggest that while several algorithms perform well, multilayer perceptrons perform best on average. Moreover, logistic regression seems best suited for noisy data.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125225731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Milos Jovanovik, M. Petrov, Bojan Najdenov, D. Trajanov
{"title":"Linked music data from global music charts","authors":"Milos Jovanovik, M. Petrov, Bojan Najdenov, D. Trajanov","doi":"10.1145/2660517.2660536","DOIUrl":"https://doi.org/10.1145/2660517.2660536","url":null,"abstract":"Accessing data on the Web in order to obtain useful information has been a challenge in the past decade. The technologies of the Semantic Web have enabled the creation of the Linked Data Cloud, as a concrete materialization of the idea to transform the Web from a web of documents into a web of data. The Linked Data concept has introduced new ways of publishing, interlinking and using data from various distributed data sources, over the existing Web infrastructure. On the other hand, music represents a big part of the everyday life for many people in the world, and therefore, understandably, the Web contains loads of data from the music domain. Given the fact that Linked Data enables new, advanced use-case scenarios, the music domain and its users can also benefit from this new data concept. Besides being provided with additional information about their favorite artists and songs, the users can also potentially get an overview of the dynamics of the global music playlists and charts, from the aspects of artists, countries, genres, etc. In this paper, we describe the process of transforming one- and two-star music playlists and charts data from various global radio stations, into five-star Linked Data, in order to demonstrate these benefits. We also present the design of our Playlist Ontology necessary for our data model. We then demonstrate -- via SPARQL queries and a web application -- some of the new use-case scenarios for the users over the published linked dataset, which are otherwise not available over the isolated datasets on the Web.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114722493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining expressive access policies for linked data using the ODRL ontology 2.0","authors":"Simon Steyskal, A. Polleres","doi":"10.1145/2660517.2660530","DOIUrl":"https://doi.org/10.1145/2660517.2660530","url":null,"abstract":"Together with the latest efforts in publishing Linked (Open) Data, legal issues around publishing and consuming such data are gaining increased interest. Particular areas of interest include (i) how to define more expressive access policies which go beyond common licenses, (ii) how to introduce pricing models for online datasets (for non-open data) and (iii) how to realize (i)+(ii) while providing descriptions of respective metadata that is both human readable and machine processable. In this paper, we show based on different examples that the Open Digital Rights Language (ODRL) Ontology 2.0 is able to address all previously mentioned issues, i.e. is suitable to express a large variety of different access policies for Linked Data. By defining policies as ODRL in RDF we aim for (i) higher flexibility and simplicity in usage, (ii) machine/human readability and (iii) fine-grained policy expressions for Linked (Open) Data.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"331 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122142318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Timm Heuss, B. Humm, Christian Henninger, Thomas Rippl
{"title":"A comparison of NER tools w.r.t. a domain-specific vocabulary","authors":"Timm Heuss, B. Humm, Christian Henninger, Thomas Rippl","doi":"10.1145/2660517.2660520","DOIUrl":"https://doi.org/10.1145/2660517.2660520","url":null,"abstract":"In this paper we compare several state-of-the-art Linked Data Knowledge Extraction tools, with regard to their ability to recognise entities of a controlled, domain-specific vocabulary. This includes tools that offer APIs as a Service, locally installed platforms as well as a UIMA-based approach as reference. We evaluate under realistic conditions, with natural language source texts from keywording experts of the Städel Museum Frankfurt. The goal is to find first hints which tool approach or strategy is more convincing in case of a domain-specific tagging/annotation, towards a working solution that is demanded by GLAMs world-wide.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132243847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SemCCM: course and competence management in learning management systems using semantic web technologies","authors":"Ana Gjorgjevik, Riste Stojanov, D. Trajanov","doi":"10.1145/2660517.2660535","DOIUrl":"https://doi.org/10.1145/2660517.2660535","url":null,"abstract":"The knowledge embedded into the Learning Management Systems (LMSs) contains great potential, but currently is not utilized well enough because the learning content is mainly tailored for human understanding and not for computer processing. The courses in the LMSs cover a certain set of topics that are usually exposed through a few general keywords and areas. In this paper the SemCCM system that utilizes the state-of-the-art Semantic Web tools, methods and datasets for automatic semantic annotation of LMS courses is presented. The SemCCM system complements the LMSs through extraction and ranking of the relevant DBpedia resources for each of the courses, and uses their Wikipedia categories for more general area determination. The extracted DBpedia resources, together with their categories represent the specific topics covered by the courses and provide more accurate course retrieval. Together with the users' completed courses, the extracted DBpedia resources are used for determination of the users' competencies. The SemCCM system presents the analysis results to the end users in several different perspectives, enabling semantically enhanced course and user search, graph-based course and competence overview, as well as user comparison.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126803599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards question answering on statistical linked data","authors":"Konrad Höffner, Jens Lehmann","doi":"10.1145/2660517.2660521","DOIUrl":"https://doi.org/10.1145/2660517.2660521","url":null,"abstract":"As an increasing amount of statistical data is published as linked data, intuitive ways of satisfying information needs and getting new insights out of the data become more and more important. Question answering systems provide such an intuitive interface by translating natural language queries into SPARQL, which is the native query language of RDF knowledge bases. Statistical data, however, is structurally very different from other data and cannot be queried using existing approaches. We analyze the particularities of statistical data represented in the RDF Data Cube Vocabulary in relation to question answering and sketch a new question answering algorithm on statistical data. In order to estimate typical user questions, a statistical question corpus is compiled and its elements are categorized.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"8 28","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113933131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representing dataset quality metadata using multi-dimensional views","authors":"Jeremy Debattista, C. Lange, S. Auer","doi":"10.1145/2660517.2660525","DOIUrl":"https://doi.org/10.1145/2660517.2660525","url":null,"abstract":"Data quality is commonly defined as fitness for use. The problem of identifying quality of data is faced by many data consumers. Data publishers often do not have the means to identify quality problems in their data. To make the task for both stakeholders easier, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples for extending daQ by custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. We finally discuss how data cube visualisation tools enable data publishers and consumers to better analyse the quality of their data.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133033468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Engineering and Middleware, 4th International Workshop, SEM 2004, Linz, Austria, September 20-21, 2004, Revised Selected Papers","authors":"T. Gschwind, C. Mascolo","doi":"10.1007/b107130","DOIUrl":"https://doi.org/10.1007/b107130","url":null,"abstract":"","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127088615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An extensible, lightweight architecture for adaptive J2EE applications","authors":"I. Gorton, Y. Liu, Nihar Trivedi","doi":"10.1145/1210525.1210537","DOIUrl":"https://doi.org/10.1145/1210525.1210537","url":null,"abstract":"Server applications with adaptive behaviors can adapt their functionality in response to environmental changes, and significantly reduce the on-going costs of system deployment and administration. However, developing adaptive server applications is challenging due to the complexity of server technologies and highly dynamic application environments. This paper presents an architecture framework, known as the Adaptive Server Framework (ASF). ASF provides a clear separation between the implementation of adaptive behaviors and the server application business logic. This means a server application can be cost effectively extended with programmable adaptive features through the definition and implementation of control components defined in ASF. Furthermore, ASF is a lightweight architecture in that it incurs low CPU overhead and memory usage. We demonstrate the effectiveness of ASF through a case study, in which a server application dynamically determines the resolution and quality to scale an image based on the load of the server and network connection speed. The experimental evaluation demonstrates the performance gains possible by adaptive behaviors and the low overhead introduced by ASF.","PeriodicalId":344435,"journal":{"name":"Joint Conference on Lexical and Computational Semantics","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121190593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}