Proceedings of the 24th Symposium on International Database Engineering & Applications: Latest Publications

A practical application for sentiment analysis on social media textual data
Colton Aarts, Fan Jiang, Liang Chen
DOI: 10.1145/3410566.3410594 (published 2020-08-12)
Abstract: With the amount of textual data available today, it is essential to extract as much useful information from it as possible. While some textual documents are easy to understand, others require extra processing to uncover the information hidden within them, for instance how the author was feeling while writing a piece of text, or which emotions the author is expressing in it. Discovering which emotions are expressed in a textual document is known as sentiment analysis. Interest in sentiment analysis has grown steadily over the past decade: being able to accurately detect and measure the different emotions present in a text has become increasingly useful as the availability of online resources has increased. These resources range from product reviews to social media content, and each presents its own distinct challenges while sharing the same core techniques and procedures. In this paper, we introduce an application that can detect four distinct emotions in social media posts. We first outline the techniques we used and our outcomes, then discuss the challenges we faced, and finally present our proposed solutions for the continuation of this project.
Citations: 3
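The abstract does not detail the classification pipeline, so the following is only a minimal sketch of a generic four-class emotion classifier for short social-media texts (TF-IDF features plus logistic regression); the emotion label set and the training samples are hypothetical placeholders, not the authors' data or method.

```python
# Minimal sketch: four-class emotion classification for short texts.
# Labels and examples are hypothetical; the paper's actual features,
# models, and training data are not described in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I love this, best day ever!",               # joy
    "This is infuriating, total scam.",          # anger
    "I miss how things used to be.",             # sadness
    "I'm terrified something will go wrong.",    # fear
]
train_labels = ["joy", "anger", "sadness", "fear"]

# TF-IDF on word n-grams feeds a multinomial logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

print(model.predict(["so excited about the concert tonight"]))
```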
Organizing and compressing collections of files using differences
S. Chawathe
DOI: 10.1145/3410566.3410584 (published 2020-08-12)
Abstract: A collection of related files often exhibits strong similarities among its constituents. These similarities, and the dual differences, may be used both for compressing the collection and for organizing it in a manner that reveals human-readable structure and relationships. This paper motivates and studies methods for such organization and compression of file collections using inter-file differences. It presents an algorithm based on computing a minimum-weight spanning tree of a graph whose vertices correspond to files and whose edge weights correspond to the size of the difference between the documents of the incident vertices. It describes the design and implementation of a prototype system called diboc (difference-based organization and compression) that uses these methods to enable compression as well as graphical organization and interactive exploration of a file collection. It illustrates the benefits of this system with examples of its operation on a widely deployed and publicly available corpus of file collections (the PPD files used to configure the CUPS printing system, as packaged by the Debian GNU/Linux distribution). In addition to these qualitative measures, quantitative experimental results of applying the methods to the same corpus are also presented.
Citations: 0
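As an illustration of the minimum-weight spanning tree idea described above, here is a small sketch (not the diboc implementation): it weights each edge by the size of a unified diff between two text files and extracts a spanning tree with Kruskal's algorithm; the file names and contents are hypothetical.

```python
# Sketch of difference-based organization: build a complete graph over files,
# weight edges by diff size, and keep a minimum-weight spanning tree.
# This illustrates the general technique; it is not the paper's diboc system.
import difflib
from itertools import combinations

files = {  # hypothetical file contents
    "a.txt": "alpha\nbeta\ngamma\n",
    "b.txt": "alpha\nbeta\ngamma\ndelta\n",
    "c.txt": "alpha\nepsilon\n",
}

def diff_size(x: str, y: str) -> int:
    """Number of lines in a unified diff, used as an edge weight."""
    return sum(1 for _ in difflib.unified_diff(x.splitlines(), y.splitlines()))

edges = sorted(
    (diff_size(files[u], files[v]), u, v) for u, v in combinations(files, 2)
)

# Kruskal's algorithm with a tiny union-find structure.
parent = {name: name for name in files}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

mst = []
for w, u, v in edges:
    ru, rv = find(u), find(v)
    if ru != rv:
        parent[ru] = rv
        mst.append((u, v, w))

print(mst)  # spanning-tree edges: store one full file plus these diffs
```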
Empowering big data analytics with polystore and strongly typed functional queries
Annabelle Gillet, É. Leclercq, M. Savonnet, N. Cullot
DOI: 10.1145/3410566.3410591 (published 2020-08-12)
Abstract: Polystores are of primary importance for tackling the diversity and volume of Big Data, as they store data according to specific use cases. Nevertheless, analytics frameworks often lack a uniform interface that allows the various models offered by the polystore to be fully accessed and exploited. It should also be ensured that the typing of algebraic expressions built with data manipulation operators can be checked, and that schemas can be inferred, before the operators start to execute (type safety). Tensors are good candidates for a pivot data model: they are powerful abstract mathematical objects that can embed complex relationships between entities, and they are used in major analytics frameworks. However, they are far removed from data models and lack high-level operators for manipulating their content, which leads to bad coding habits, reduced maintainability, and sometimes poor performance. With TDM (Tensor Data Model), we propose to join the best of both worlds and take advantage of the modeling capabilities of tensors by adding schemas and data manipulation operators to them. We developed an implementation in Scala using Spark that provides users with type safety and a schema inference mechanism, guaranteeing the technical and functional correctness of composed expressions on tensors at compile time. We show that this extension does not induce overhead and allows us to outperform the Spark query optimizer using bind join.
Citations: 2
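The paper's implementation is in Scala with compile-time type checking; as a rough, runtime-checked analogue of the idea of attaching a schema of named dimensions to a tensor, here is a small Python sketch. The class, dimension names, and data are hypothetical and do not reproduce TDM's operators.

```python
# Rough analogue of a schema-carrying tensor: dimensions are named, and
# operations validate that the dimensions they touch exist. The real TDM
# enforces this at compile time in Scala; here the checks happen at runtime.
import numpy as np

class NamedTensor:
    def __init__(self, data: np.ndarray, dims: list[str]):
        if data.ndim != len(dims):
            raise ValueError("schema mismatch: one name per tensor axis")
        self.data, self.dims = data, dims

    def sum_over(self, dim: str) -> "NamedTensor":
        """Aggregate one named dimension, keeping the rest of the schema."""
        if dim not in self.dims:
            raise KeyError(f"unknown dimension: {dim}")
        axis = self.dims.index(dim)
        return NamedTensor(self.data.sum(axis=axis),
                           [d for d in self.dims if d != dim])

# user x hashtag interaction counts (hypothetical data)
t = NamedTensor(np.array([[1, 0, 2], [0, 3, 1]]), ["user", "hashtag"])
per_user = t.sum_over("hashtag")
print(per_user.dims, per_user.data)   # ['user'] [3 4]
```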
Towards a universal approach for semantic interpretation of spreadsheets data
N. Dorodnykh, A. Y. Yurin
DOI: 10.1145/3410566.3410609 (published 2020-08-12)
Abstract: Spreadsheets are a popular way to represent and structure data and knowledge; consequently, semantic interpretation of spreadsheet data has become an active area of research. In this paper, we propose a new approach for the semantic interpretation of data extracted from spreadsheets with arbitrary layouts and styles. The analyzed spreadsheets are in the MS Excel format. Our approach comprises two stages: analyzing and transforming source spreadsheets into a relational canonicalized form, and annotating the canonical spreadsheets with entities from a knowledge graph. In the first stage we use a rule-based approach implemented as a domain-specific language called Cells Rule Language (CRL), together with an original canonical table form. In the second stage we use an aggregated method for measuring similarity between candidate entities and cell values, which applies five metrics in sequence and combines the ranks obtained by each metric. The algorithms of the two stages are implemented in dedicated software, TabbyXL and TabbyLD respectively, and DBpedia is used as the knowledge graph. Experimental evaluations on the T2Dv2 and Troy200 corpora demonstrate the applicability of our approach and software for semantic spreadsheet data interpretation. A distinctive feature of the approach is its universality, owing to the language for describing spreadsheet transformation rules and to the original canonical form; this enables processing large volumes of heterogeneous spreadsheets across various domains. This work is part of the Tabby research project on software for recognizing, extracting, transforming, and interpreting data from spreadsheet tables with arbitrary layouts and styles.
Citations: 3
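The abstract mentions ranking candidate knowledge-graph entities by applying several similarity metrics and combining their ranks; the sketch below illustrates that general idea with two stand-in metrics and a simple rank-sum combination. The actual five metrics used by TabbyLD are not reproduced, and the candidate labels are hypothetical.

```python
# Illustration of combining ranks from several similarity metrics to pick
# the best knowledge-graph entity for a cell value. The two metrics and the
# rank-sum rule are stand-ins; the paper's five metrics are not reproduced.
from difflib import SequenceMatcher

def edit_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def rank_candidates(cell_value: str, candidates: list[str]) -> list[str]:
    metrics = [edit_similarity, token_jaccard]
    total_rank = {c: 0 for c in candidates}
    for metric in metrics:
        # rank 0 = most similar under this metric
        ordered = sorted(candidates, key=lambda c: -metric(cell_value, c))
        for rank, c in enumerate(ordered):
            total_rank[c] += rank
    return sorted(candidates, key=lambda c: total_rank[c])

# hypothetical candidate entity labels for a cell containing "New York"
print(rank_candidates("New York", ["New York City", "York", "New Jersey"]))
```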
Lifting preferences to the semantic web: PreferenceSPARQL
M. Endres, Stefan Schödel, Klaus Emathinger
DOI: 10.1145/3410566.3410590 (published 2020-08-12)
Abstract: PreferenceSQL is an SQL extension for standard relational databases that supports soft constraints and is used to find relevant data intuitively. Meanwhile, the Semantic Web has interoperability advantages and helps to retrieve information with machine-readable data. We use the benefits of both technologies by combining preferences from SQL with SPARQL, the query language of the Semantic Web. This work provides implementation details in Apache Jena for the new composite called "PreferenceSPARQL". Furthermore, we contribute comprehensive benchmarks that show which preference algorithm is best suited for our approach.
Citations: 1
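Preference queries of this kind typically retain only results that are not Pareto-dominated by any other result (a skyline). As a plain illustration of that evaluation step, decoupled from Jena and SPARQL, here is a block-nested-loop style sketch over hypothetical result rows, assuming lower price and lower distance are preferred.

```python
# Illustration of Pareto-preference (skyline) evaluation over query results:
# keep a row only if no other row is at least as good in every attribute and
# strictly better in one. Independent of the paper's Jena/SPARQL integration.
from dataclasses import dataclass

@dataclass
class Hotel:          # hypothetical query result rows
    name: str
    price: float      # lower is better
    distance: float   # lower is better

def dominates(a: Hotel, b: Hotel) -> bool:
    return (a.price <= b.price and a.distance <= b.distance and
            (a.price < b.price or a.distance < b.distance))

def skyline(rows: list[Hotel]) -> list[Hotel]:
    return [r for r in rows if not any(dominates(o, r) for o in rows)]

rows = [Hotel("A", 80, 2.0), Hotel("B", 120, 0.5), Hotel("C", 150, 2.5)]
print([h.name for h in skyline(rows)])   # ['A', 'B']; C is dominated by A
```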
Implementation of dynamic page generation for stream data by SuperSQL
Keita Terui, Kento Goto, Motomichi Toyama
DOI: 10.1145/3410566.3410607 (published 2020-08-12)
Abstract: SuperSQL is an extension of SQL that lets users structure the output of relational databases with their own queries and express a variety of layouts. However, this approach is not well suited to data with a high update frequency, such as stream data, because the generated page reflects only the state of the database at the time the SuperSQL query is executed. In this study, we propose an implementation of a web page generation function that asynchronously updates a web page with the latest information for frequently updated data, using PipelineDB and SuperSQL, both of which can process streams. The dynamic part of the page is specified by referencing the stream in a "decorator", a feature of SuperSQL. In addition, "pull" or "push" can be specified in the stream decorator to select how the dynamic part is updated. This makes it possible, for example, to create a page that lists stock prices and always displays the latest values.
Citations: 0
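The exact SuperSQL decorator syntax is not given in the abstract. As a language-agnostic illustration of the pull versus push update styles it mentions, here is a small sketch in which a page fragment is refreshed either by periodic re-querying (pull) or by a callback fired when new stream data arrives (push); all names and data are hypothetical and unrelated to PipelineDB or SuperSQL.

```python
# Sketch of the two update styles for a dynamic page fragment:
#   pull - the client periodically re-runs the query and re-renders,
#   push - the data source notifies subscribers whenever a new row arrives.
# This shows only the general pattern, not SuperSQL/PipelineDB syntax.
import time
from typing import Callable

latest_prices = {"ACME": 100.0}        # stands in for a stream-backed view

def render(prices: dict) -> str:
    return " | ".join(f"{k}: {v:.2f}" for k, v in prices.items())

# --- pull: re-query on a timer --------------------------------------------
def pull_loop(iterations: int = 3, interval: float = 0.1) -> None:
    for _ in range(iterations):
        print("pull:", render(latest_prices))
        time.sleep(interval)

# --- push: subscribers are called when the stream delivers a new value ----
subscribers: list[Callable[[dict], None]] = []

def on_new_tick(symbol: str, price: float) -> None:
    latest_prices[symbol] = price
    for callback in subscribers:
        callback(latest_prices)

subscribers.append(lambda prices: print("push:", render(prices)))

pull_loop()
on_new_tick("ACME", 101.5)   # push-style update triggers an immediate re-render
```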
Hierarchical embedding for DAG reachability queries
Giacomo Bergami, Flavio Bertini, D. Montesi
DOI: 10.1145/3410566.3410583 (published 2020-08-12)
Abstract: Current hierarchical embeddings are inaccurate both in reconstructing the original taxonomy and in answering reachability queries over directed acyclic graphs. In this paper, we propose a new hierarchical embedding, the Euclidean Embedding (EE), which is correct by design thanks to its mathematical formulation and associated lemmas. The embedding can be constructed during a visit of the taxonomy, making it faster to generate than other learning-based embeddings. After proposing a novel set of metrics for determining embedding accuracy with respect to reachability queries, we compare our embedding with state-of-the-art approaches using full trees from 3 to 1555 nodes and a real-world directed acyclic graph of 1170 nodes. The benchmark shows that EE outperforms its competitors in both accuracy and efficiency.
Citations: 3
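The mathematical formulation of the Euclidean Embedding is not reproduced in the abstract. As a point of reference, the sketch below shows the classic pre/post-order interval labeling that answers reachability on trees in constant time after one DFS, which is the kind of construct-during-a-visit labeling such approaches are usually compared against; the taxonomy below is hypothetical, and this is not the paper's EE.

```python
# Classic interval labeling for tree reachability: assign each node the
# interval [pre, post] from one DFS; u reaches v iff v's interval is nested
# inside u's. A well-known baseline, not the paper's Euclidean Embedding.
children = {            # hypothetical taxonomy (a tree)
    "root": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree"],
    "dog": [], "cat": [], "tree": [],
}

intervals: dict[str, tuple[int, int]] = {}

def dfs(node: str, counter: list[int]) -> None:
    start = counter[0]; counter[0] += 1
    for child in children[node]:
        dfs(child, counter)
    intervals[node] = (start, counter[0])
    counter[0] += 1

dfs("root", [0])

def reaches(u: str, v: str) -> bool:
    (us, ue), (vs, ve) = intervals[u], intervals[v]
    return us <= vs and ve <= ue

print(reaches("root", "cat"), reaches("animal", "tree"))   # True False
```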
Detecting fake news by image analysis
E. Masciari, V. Moscato, A. Picariello, Giancarlo Sperlí
DOI: 10.1145/3410566.3410599 (published 2020-08-12)
Abstract: The uncontrolled growth in the creation and dissemination of fake news observed in recent years poses a continuous threat to democracy, justice, and public trust. This problem has significantly driven the efforts of both academia and industry to develop more accurate fake news detection strategies. Early detection of fake news is crucial, yet the information available about news propagation is limited. Moreover, it has been shown that people tend to believe fake news more readily because of its features [10]. In this paper, we present our framework for fake news detection and discuss in detail an approach based on deep learning that we implemented using Google BERT features. Our experiments, conducted on two well-known and widely used real-world datasets, suggest that our method can outperform state-of-the-art approaches and detect fake news accurately, even when content information is limited.
Citations: 9
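The abstract says the approach builds on BERT features. A minimal sketch of that general pipeline, using the Hugging Face transformers library to extract a sentence embedding and a logistic-regression classifier on top, is shown below; the model name, the tiny label set, and the training texts are placeholders, not the paper's datasets or architecture.

```python
# Sketch of a BERT-features pipeline: embed each article with a pretrained
# BERT encoder, then train a simple classifier on the embeddings.
# Training texts/labels below are hypothetical, not the paper's datasets.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts: list[str]):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    # use the [CLS] token representation as a fixed-size document feature
    return out.last_hidden_state[:, 0, :].numpy()

train_texts = [
    "Government confirms new vaccination schedule for fall.",   # real
    "Local council approves funding for road repairs.",         # real
    "Aliens endorse mayoral candidate in secret meeting.",      # fake
    "Drinking bleach cures all known diseases, experts say.",   # fake
]
train_labels = [0, 0, 1, 1]   # 0 = real, 1 = fake

clf = LogisticRegression(max_iter=1000).fit(embed(train_texts), train_labels)
print(clf.predict(embed(["Scientists confirm the moon is made of cheese."])))
```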
Pandemic and big tech
B. Desai
DOI: 10.1145/3410566.3410585 (published 2020-08-12)
Abstract: Having been an observer and user of computing devices from slide rules, analog computers, and the early monstrous digital machines to today's sleek hand-held ones, and having seen the computing and data "ownership" paradigms shift over the last six decades, one wonders at the enormous size, power, and market capitalization of a handful of companies that have existed for only a couple of decades. Now the world is groaning under a coronavirus pandemic mismanaged by most governments, health officers, and organizations. Are these not perfect examples, ad infinitum, of the Peter principle? At the same time, big tech is benefiting from the pandemic and preparing to take a central role in harvesting more data, to be mined in the future for new revenue streams. This paper looks at the recent push by big tech to extend its agenda into all aspects of human life. The opportunity presented by the COVID-19 pandemic, and the fear of future pandemics, is being seized to lay that groundwork at the public's expense and at the cost of their privacy.
Citations: 5
A novel spatio-temporal interpolation algorithm and its application to the COVID-19 pandemic
Junzhe Cai, P. Revesz
DOI: 10.1145/3410566.3410602 (published 2020-08-12)
Abstract: This paper describes several interpolation methods for predicting the number of cases in the COVID-19 pandemic. They include well-known temporal interpolation algorithms such as Lagrange interpolation, cubic spline interpolation, and exponential decay interpolation. These temporal algorithms interpolate the COVID-19 cases at locations where measurements from prior days are available. However, pandemics are not purely temporal but spatio-temporal phenomena; neighboring locations must therefore also be considered to derive accurate interpolation values for future days. This paper introduces a novel spatio-temporal interpolation algorithm that is shown to outperform any purely temporal interpolation algorithm in predicting COVID-19 cases in the continental United States. In particular, the novel spatio-temporal interpolation method achieves a mean absolute error of 8.44 cases per million people when predicting the number of COVID-19 cases two days ahead.
Citations: 2
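The paper's specific spatio-temporal algorithm is not given in the abstract; the sketch below only illustrates the ingredients it names, namely a cubic-spline temporal extrapolation per location (via scipy) blended with a simple average over neighboring locations. The case series, neighbor lists, and the 50/50 blend weight are hypothetical.

```python
# Sketch of spatio-temporal interpolation in the spirit described above:
# extrapolate each location's case series with a cubic spline, then blend
# the temporal estimate with the mean estimate of its spatial neighbors.
# Data, neighbor graph, and blend weight are hypothetical, not the paper's.
import numpy as np
from scipy.interpolate import CubicSpline

days = np.arange(7)                       # observed days 0..6
cases = {                                 # cases per million, per location
    "A": np.array([1, 2, 4, 7, 11, 16, 22], dtype=float),
    "B": np.array([2, 3, 5, 8, 12, 17, 23], dtype=float),
    "C": np.array([0, 1, 1, 2, 4, 6, 9], dtype=float),
}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

def temporal_estimate(series: np.ndarray, day: float) -> float:
    """Cubic-spline extrapolation of one location's series to a future day."""
    return float(CubicSpline(days, series)(day))

def spatio_temporal_estimate(loc: str, day: float, alpha: float = 0.5) -> float:
    own = temporal_estimate(cases[loc], day)
    nbr = np.mean([temporal_estimate(cases[n], day) for n in neighbors[loc]])
    return alpha * own + (1 - alpha) * nbr   # blend own trend with neighbors

print(round(spatio_temporal_estimate("A", day=8), 2))  # two days past day 6
```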