Proceedings of the International Workshop on Semantic Big Data: Latest Articles

What is the schema of your knowledge graph?: leveraging knowledge graph embeddings and clustering for expressive taxonomy learning

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393637

A. Zouaq, Félix Martel

Citations: 12

Relaxing global-as-view in mediated data integration from linked data

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393635

A. Adamou, M. d’Aquin

Abstract: In scenarios where many different, independent and dynamic data sources need to be brought together, mediated data integration at runtime is rapidly gaining interest. In a global-as-view approach, schema mappings express how to get data from each data source according to the global schema of the mediator. Key issues include the effort required to include and map new data sources, and the fact that data sources need the global schema to be expressed in the first place. It has been argued that the principles of Linked Data can be used to spread the cost of adding new sources in a pay-as-you-go model. We contribute by describing a data integration framework able to mitigate these issues, by relating data sources under a global schema which is implicit and only partly known at the time a new data source joins. Mappings over a data source only require partial knowledge of it and of the part of the global schema that it will affect. Pay-as-you-go can then be employed to guarantee eventual schema compliance. This approach was adopted in a large-scale data integration system for Smart Cities, where it allowed short time-to-publish for new data and iterative schema refinements.

Citations: 2

Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393632

Mohamed Ragab, Riccardo Tommasini, Sadiq Eyvazov, S. Sakr

Abstract: Recently, a wide range of Web applications (e.g., DBpedia, UniProt, and Probase) have been built on top of vast RDF knowledge bases and use the SPARQL query language. The continuous growth of these knowledge bases has led to the investigation of new paradigms and technologies for storing, accessing, and querying RDF data. In practice, modern big data systems (e.g., Hadoop, Spark) can handle vast relational repositories; however, their application in the Semantic Web context is still limited. One possible reason is that such frameworks rely on distributed systems, which are good for relational data, but whose performance on graph data models like RDF has not yet been well studied. In this paper, we present a systematic evaluation of the performance of the SparkSQL engine for processing SPARQL queries. We conducted it using three relevant RDF relational schemas and two different storage backends, namely Hive and HDFS. In addition, we show the impact of using three different RDF-based partitioning techniques with our relational scenario. Additionally, we discuss the results of our experiments: (i) we present insights about the trade-offs that characterize different experimental configurations, and (ii) we identify the best and the worst ones for the SP2Bench benchmark scenario.

Citations: 5

Ten ways of leveraging ontologies for natural language processing and its enterprise applications

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393639

T. Erekhinskaya, D. Strebkov, Sujal Patel, Mithun Balakrishna, M. Tatu, D. Moldovan

Abstract: In recent years, Artificial Intelligence and Deep Learning have matured from a fascinating research area to real-world applications across multiple domains. Enterprises adopt data-driven approaches for various use cases. With the increased adoption, issues such as governance of the models, deployment, scalability, reusability and maintenance are widely addressed on the engineering side, but not so much on the knowledge side. In this paper, we demonstrate 10 ways of leveraging ontologies for Natural Language Processing. Specifically, we explore the usage of ontologies and related standards for labeling schema, configuration, providing lexical data, powering a rule engine and automated generation of rules, as well as providing a standard output format. Additionally, we discuss three NLP-based applications (semantic search, question answering, and natural language querying) and show how they can benefit from ontology usage. The paper summarizes our experience of using ontologies in a number of projects in the medical, enterprise, financial, legal and security domains.

Citations: 2

Triag, a framework based on triangles of RDF triples

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393634

Hubert Naacke, Olivier Curé

Citations: 3

Automated ontology-based annotation of scientific literature using deep learning

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393636

Prashanti Manda, S. SayedAhmed, S. Mohanty

Abstract: Representing scientific knowledge using ontologies enables data integration and consistent machine-readable data representation, and allows for large-scale computational analyses. Text mining approaches that can automatically process and annotate scientific literature with ontology concepts are necessary to keep up with the rapid pace of scientific publishing. Here, we present deep learning models (Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM)) combined with different input encoding formats for automated Named Entity Recognition (NER) of ontology concepts from text. The Colorado Richly Annotated Full Text (CRAFT) gold standard corpus was used to train and test our models. Precision, Recall, F-1, and Jaccard semantic similarity were used to evaluate the performance of the models. We found that GRU-based models outperform LSTM models across all evaluation metrics. Surprisingly, considering the top two probabilistic predictions of the model for each instance instead of the top one resulted in a substantial increase in accuracy. Inclusion of ontology semantics via subsumption reasoning yielded modest performance improvement.

Citations: 4

QSGG

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393638

S. Böttcher, Rita Hartel, S. Peeters

Citations: 0

SustainOnt: an ontology for defining an index of neighborhood sustainability across domains

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393640

Vatricia Edgar, Cecilia La Place, Julia Schmidt, A. Bansal, S. Bansal

Abstract: Massive amounts of data, both structured and unstructured, are available to be harvested for competitive business advantage, sound government policies, and new insights in a broad array of applications. This paper specifically focuses on extraction, integration, and querying of open data available about environmental sustainability. The global trend toward urbanization has created a need for residents of urban neighborhoods to better understand the factors impacting the social, environmental, and economic sustainability of an area. To date, there is no concise representation of all aspects of sustainability. This paper aims to fill this gap. A version of sustainability resting on economic, societal, and environmental development as the three main indicators was chosen to inform an ontology called SustainOnt used to organize and analyze relevant data from various sources. The newly-linked data is made available through a dual-platform application aimed at reaching a wide array of audiences. An initial prototype has been designed, using data for a small region, to provide a sustainability index of each city and/or neighborhood area that can be more accessible to people without the means to directly analyze the available data.

Citations: 1

Data placement strategies that speed-up distributed graph query processing

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-04-05 DOI: 10.1145/3391274.3393633

Daniel Janke, Steffen Staab, Martin Leinberger

Abstract: We consider the problem of how to optimize the data distribution to improve query performance in distributed RDF stores running on compute node clusters. When hash-based data distribution strategies are used, the query workload tends to be equally balanced among all compute nodes, whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may be able to combine the advantages of both strategies. To investigate this hypothesis, we analyze two such data distribution strategies: (1) overpartitioned minimal edge-cut cover and (2) our novel molecule hash cover. Our analysis substantiates our hypothesis by explaining the causes for their good performance. Both strategies reduce query execution time on our set of test queries (between 5% and 98%). While overpartitioned minimal edge-cut cover fares best when it can be computed, it may lack scalability for large datasets. Our novel molecule hash cover combines scalability and major improvements of query execution time against various baseline strategies.

Citations: 4

Proceedings of the International Workshop on Semantic Big Data

Proceedings of the International Workshop on Semantic Big Data Pub Date : 2018-06-10 DOI: 10.1145/3208352

Citations: 0