Proceedings of the 24th Symposium on International Database Engineering & Applications: Latest Publications

A practical application for sentiment analysis on social media textual data
Colton Aarts, Fan Jiang, Liang Chen
DOI: 10.1145/3410566.3410594 (published 2020-08-12)
Abstract: With the amount of textual data available today, it is essential to extract as much useful information from it as possible. While some textual documents are easy to understand, others require extra processing to uncover the information hidden within them, for instance how the author was feeling while writing a piece of text, or which emotions the author is expressing in it. Discovering which emotions are expressed in a textual document is known as sentiment analysis. Interest in sentiment analysis has grown steadily over the past decade: being able to accurately detect and measure the different emotions present in a text has become increasingly useful as the availability of online resources has increased. These resources range from product reviews to social media content, and each presents its own distinct challenges while sharing the same core techniques and procedures. In this paper, we introduce an application that can detect four distinct emotions in social media posts. We first outline the techniques we used and our outcomes, then discuss the challenges we faced, and finally present our proposed solutions for the continuation of this project.
Citations: 3
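The abstract does not detail the classification pipeline, so the following is only a minimal sketch of a generic four-class emotion classifier for short social-media texts (TF-IDF features plus logistic regression); the emotion label set and the training samples are hypothetical placeholders, not the authors' data or method.

```python
# Minimal sketch: four-class emotion classification for short texts.
# Labels and examples are hypothetical; the paper's actual features,
# models, and training data are not described in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I love this, best day ever!",               # joy
    "This is infuriating, total scam.",          # anger
    "I miss how things used to be.",             # sadness
    "I'm terrified something will go wrong.",    # fear
]
train_labels = ["joy", "anger", "sadness", "fear"]

# TF-IDF on word n-grams feeds a multinomial logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

print(model.predict(["so excited about the concert tonight"]))
```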
Organizing and compressing collections of files using differences
S. Chawathe
DOI: 10.1145/3410566.3410584 (published 2020-08-12)
Abstract: A collection of related files often exhibits strong similarities among its constituents. These similarities, and the dual differences, may be used both for compressing the collection and for organizing it in a manner that reveals human-readable structure and relationships. This paper motivates and studies methods for such organization and compression of file collections using inter-file differences. It presents an algorithm based on computing a minimum-weight spanning tree of a graph whose vertices correspond to files and whose edge weights correspond to the size of the difference between the documents of the incident vertices. It describes the design and implementation of a prototype system called diboc (difference-based organization and compression) that uses these methods to enable compression as well as graphical organization and interactive exploration of a file collection. It illustrates the benefits of this system with examples of its operation on a widely deployed and publicly available corpus of file collections (the PPD files used to configure the CUPS printing system, as packaged by the Debian GNU/Linux distribution). In addition to these qualitative measures, quantitative experimental results of applying the methods to the same corpus are also presented.
Citations: 0
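As an illustration of the minimum-weight spanning tree idea described above, here is a small sketch (not the diboc implementation): it weights each edge by the size of a unified diff between two text files and extracts a spanning tree with Kruskal's algorithm; the file names and contents are hypothetical.

```python
# Sketch of difference-based organization: build a complete graph over files,
# weight edges by diff size, and keep a minimum-weight spanning tree.
# This illustrates the general technique; it is not the paper's diboc system.
import difflib
from itertools import combinations

files = {  # hypothetical file contents
    "a.txt": "alpha\nbeta\ngamma\n",
    "b.txt": "alpha\nbeta\ngamma\ndelta\n",
    "c.txt": "alpha\nepsilon\n",
}

def diff_size(x: str, y: str) -> int:
    """Number of lines in a unified diff, used as an edge weight."""
    return sum(1 for _ in difflib.unified_diff(x.splitlines(), y.splitlines()))

edges = sorted(
    (diff_size(files[u], files[v]), u, v) for u, v in combinations(files, 2)
)

# Kruskal's algorithm with a tiny union-find structure.
parent = {name: name for name in files}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

mst = []
for w, u, v in edges:
    ru, rv = find(u), find(v)
    if ru != rv:
        parent[ru] = rv
        mst.append((u, v, w))

print(mst)  # spanning-tree edges: store one full file plus these diffs
```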
Empowering big data analytics with polystore and strongly typed functional queries
Annabelle Gillet, É. Leclercq, M. Savonnet, N. Cullot
DOI: 10.1145/3410566.3410591 (published 2020-08-12)
Abstract: Polystores are of primary importance for tackling the diversity and volume of Big Data, as they store data according to specific use cases. Nevertheless, analytics frameworks often lack a uniform interface that allows the various models offered by the polystore to be fully accessed and exploited. It should also be ensured that the typing of algebraic expressions built with data manipulation operators can be checked, and that schemas can be inferred, before the operators start to execute (type safety). Tensors are good candidates for a pivot data model: they are powerful abstract mathematical objects that can embed complex relationships between entities, and they are used in major analytics frameworks. However, they are far removed from data models and lack high-level operators for manipulating their content, which leads to bad coding habits, reduced maintainability, and sometimes poor performance. With TDM (Tensor Data Model), we propose to join the best of both worlds and take advantage of the modeling capabilities of tensors by adding schemas and data manipulation operators to them. We developed an implementation in Scala using Spark that provides users with type safety and a schema inference mechanism, guaranteeing the technical and functional correctness of composed expressions on tensors at compile time. We show that this extension does not induce overhead and allows us to outperform the Spark query optimizer using bind join.
Citations: 2
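The paper's implementation is in Scala with compile-time type checking; as a rough, runtime-checked analogue of the idea of attaching a schema of named dimensions to a tensor, here is a small Python sketch. The class, dimension names, and data are hypothetical and do not reproduce TDM's operators.

```python
# Rough analogue of a schema-carrying tensor: dimensions are named, and
# operations validate that the dimensions they touch exist. The real TDM
# enforces this at compile time in Scala; here the checks happen at runtime.
import numpy as np

class NamedTensor:
    def __init__(self, data: np.ndarray, dims: list[str]):
        if data.ndim != len(dims):
            raise ValueError("schema mismatch: one name per tensor axis")
        self.data, self.dims = data, dims

    def sum_over(self, dim: str) -> "NamedTensor":
        """Aggregate one named dimension, keeping the rest of the schema."""
        if dim not in self.dims:
            raise KeyError(f"unknown dimension: {dim}")
        axis = self.dims.index(dim)
        return NamedTensor(self.data.sum(axis=axis),
                           [d for d in self.dims if d != dim])

# user x hashtag interaction counts (hypothetical data)
t = NamedTensor(np.array([[1, 0, 2], [0, 3, 1]]), ["user", "hashtag"])
per_user = t.sum_over("hashtag")
print(per_user.dims, per_user.data)   # ['user'] [3 4]
```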
Towards a universal approach for semantic interpretation of spreadsheets data
N. Dorodnykh, A. Y. Yurin
DOI: 10.1145/3410566.3410609 (published 2020-08-12)
Abstract: Spreadsheets are a popular way to represent and structure data and knowledge; consequently, semantic interpretation of spreadsheet data has become an active area of research. In this paper, we propose a new approach for the semantic interpretation of data extracted from spreadsheets with arbitrary layouts and styles. The analyzed spreadsheets are in the MS Excel format. Our approach comprises two stages: analyzing and transforming source spreadsheets into a relational canonicalized form, and annotating the canonical spreadsheets with entities from a knowledge graph. In the first stage we use a rule-based approach implemented as a domain-specific language called Cells Rule Language (CRL), together with an original canonical table form. In the second stage we use an aggregated method for measuring similarity between candidate entities and cell values, which applies five metrics in sequence and combines the ranks obtained by each metric. The algorithms of the two stages are implemented in dedicated software, TabbyXL and TabbyLD respectively, and DBpedia is used as the knowledge graph. Experimental evaluations on the T2Dv2 and Troy200 corpora demonstrate the applicability of our approach and software for semantic spreadsheet data interpretation. A distinctive feature of the approach is its universality, owing to the language for describing spreadsheet transformation rules and to the original canonical form; this enables processing large volumes of heterogeneous spreadsheets across various domains. This work is part of the Tabby research project on software for recognizing, extracting, transforming, and interpreting data from spreadsheet tables with arbitrary layouts and styles.
Citations: 3
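The abstract mentions ranking candidate knowledge-graph entities by applying several similarity metrics and combining their ranks; the sketch below illustrates that general idea with two stand-in metrics and a simple rank-sum combination. The actual five metrics used by TabbyLD are not reproduced, and the candidate labels are hypothetical.

```python
# Illustration of combining ranks from several similarity metrics to pick
# the best knowledge-graph entity for a cell value. The two metrics and the
# rank-sum rule are stand-ins; the paper's five metrics are not reproduced.
from difflib import SequenceMatcher

def edit_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def rank_candidates(cell_value: str, candidates: list[str]) -> list[str]:
    metrics = [edit_similarity, token_jaccard]
    total_rank = {c: 0 for c in candidates}
    for metric in metrics:
        # rank 0 = most similar under this metric
        ordered = sorted(candidates, key=lambda c: -metric(cell_value, c))
        for rank, c in enumerate(ordered):
            total_rank[c] += rank
    return sorted(candidates, key=lambda c: total_rank[c])

# hypothetical candidate entity labels for a cell containing "New York"
print(rank_candidates("New York", ["New York City", "York", "New Jersey"]))
```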
Lifting preferences to the semantic web: PreferenceSPARQL
M. Endres, Stefan Schödel, Klaus Emathinger
DOI: 10.1145/3410566.3410590 (published 2020-08-12)
Abstract: PreferenceSQL is an SQL extension for standard relational databases that supports soft constraints and is used to find relevant data intuitively. Meanwhile, the Semantic Web has interoperability advantages and helps to retrieve information with machine-readable data. We use the benefits of both technologies by combining preferences from SQL with SPARQL, the query language of the Semantic Web. This work provides implementation details in Apache Jena for the new composite called "PreferenceSPARQL". Furthermore, we contribute comprehensive benchmarks that show which preference algorithm is best suited for our approach.
Citations: 1
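Preference queries of this kind typically retain only results that are not Pareto-dominated by any other result (a skyline). As a plain illustration of that evaluation step, decoupled from Jena and SPARQL, here is a block-nested-loop style sketch over hypothetical result rows, assuming lower price and lower distance are preferred.

```python
# Illustration of Pareto-preference (skyline) evaluation over query results:
# keep a row only if no other row is at least as good in every attribute and
# strictly better in one. Independent of the paper's Jena/SPARQL integration.
from dataclasses import dataclass

@dataclass
class Hotel:          # hypothetical query result rows
    name: str
    price: float      # lower is better
    distance: float   # lower is better

def dominates(a: Hotel, b: Hotel) -> bool:
    return (a.price <= b.price and a.distance <= b.distance and
            (a.price < b.price or a.distance < b.distance))

def skyline(rows: list[Hotel]) -> list[Hotel]:
    return [r for r in rows if not any(dominates(o, r) for o in rows)]

rows = [Hotel("A", 80, 2.0), Hotel("B", 120, 0.5), Hotel("C", 150, 2.5)]
print([h.name for h in skyline(rows)])   # ['A', 'B']; C is dominated by A
```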
Implementation of dynamic page generation for stream data by SuperSQL
Keita Terui, Kento Goto, Motomichi Toyama
DOI: 10.1145/3410566.3410607 (published 2020-08-12)
Abstract: SuperSQL is an extension of SQL that lets users structure the output of relational databases with their own queries and express a variety of layouts. However, this approach is not well suited to data with a high update frequency, such as stream data, because the generated page reflects only the state of the database at the time the SuperSQL query is executed. In this study, we propose an implementation of a web page generation function that asynchronously updates a web page with the latest information for frequently updated data, using PipelineDB and SuperSQL, both of which can process streams. The dynamic part of the page is specified by referencing the stream in a "decorator", a feature of SuperSQL. In addition, "pull" or "push" can be specified in the stream decorator to select how the dynamic part is updated. This makes it possible, for example, to create a page that lists stock prices and always displays the latest values.
Citations: 0
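The exact SuperSQL decorator syntax is not given in the abstract. As a language-agnostic illustration of the pull versus push update styles it mentions, here is a small sketch in which a page fragment is refreshed either by periodic re-querying (pull) or by a callback fired when new stream data arrives (push); all names and data are hypothetical and unrelated to PipelineDB or SuperSQL.

```python
# Sketch of the two update styles for a dynamic page fragment:
#   pull - the client periodically re-runs the query and re-renders,
#   push - the data source notifies subscribers whenever a new row arrives.
# This shows only the general pattern, not SuperSQL/PipelineDB syntax.
import time
from typing import Callable

latest_prices = {"ACME": 100.0}        # stands in for a stream-backed view

def render(prices: dict) -> str:
    return " | ".join(f"{k}: {v:.2f}" for k, v in prices.items())

# --- pull: re-query on a timer --------------------------------------------
def pull_loop(iterations: int = 3, interval: float = 0.1) -> None:
    for _ in range(iterations):
        print("pull:", render(latest_prices))
        time.sleep(interval)

# --- push: subscribers are called when the stream delivers a new value ----
subscribers: list[Callable[[dict], None]] = []

def on_new_tick(symbol: str, price: float) -> None:
    latest_prices[symbol] = price
    for callback in subscribers:
        callback(latest_prices)

subscribers.append(lambda prices: print("push:", render(prices)))

pull_loop()
on_new_tick("ACME", 101.5)   # push-style update triggers an immediate re-render
```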
Hierarchical embedding for DAG reachability queries
Giacomo Bergami, Flavio Bertini, D. Montesi
DOI: 10.1145/3410566.3410583 (published 2020-08-12)
Abstract: Current hierarchical embeddings are inaccurate both in reconstructing the original taxonomy and in answering reachability queries over directed acyclic graphs. In this paper, we propose a new hierarchical embedding, the Euclidean Embedding (EE), which is correct by design thanks to its mathematical formulation and associated lemmas. The embedding can be constructed during a visit of the taxonomy, making it faster to generate than other learning-based embeddings. After proposing a novel set of metrics for determining embedding accuracy with respect to reachability queries, we compare our embedding with state-of-the-art approaches using full trees from 3 to 1555 nodes and a real-world directed acyclic graph of 1170 nodes. The benchmark shows that EE outperforms its competitors in both accuracy and efficiency.
Citations: 3
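The mathematical formulation of the Euclidean Embedding is not reproduced in the abstract. As a point of reference, the sketch below shows the classic pre/post-order interval labeling that answers reachability on trees in constant time after one DFS, which is the kind of construct-during-a-visit labeling such approaches are usually compared against; the taxonomy below is hypothetical, and this is not the paper's EE.

```python
# Classic interval labeling for tree reachability: assign each node the
# interval [pre, post] from one DFS; u reaches v iff v's interval is nested
# inside u's. A well-known baseline, not the paper's Euclidean Embedding.
children = {            # hypothetical taxonomy (a tree)
    "root": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree"],
    "dog": [], "cat": [], "tree": [],
}

intervals: dict[str, tuple[int, int]] = {}

def dfs(node: str, counter: list[int]) -> None:
    start = counter[0]; counter[0] += 1
    for child in children[node]:
        dfs(child, counter)
    intervals[node] = (start, counter[0])
    counter[0] += 1

dfs("root", [0])

def reaches(u: str, v: str) -> bool:
    (us, ue), (vs, ve) = intervals[u], intervals[v]
    return us <= vs and ve <= ue

print(reaches("root", "cat"), reaches("animal", "tree"))   # True False
```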
Detecting fake news by image analysis
E. Masciari, V. Moscato, A. Picariello, Giancarlo Sperlí
DOI: 10.1145/3410566.3410599 (published 2020-08-12)
Abstract: The uncontrolled growth in the creation and dissemination of fake news observed in recent years poses a continuous threat to democracy, justice, and public trust. This problem has significantly driven the efforts of both academia and industry to develop more accurate fake news detection strategies. Early detection of fake news is crucial, yet the information available about news propagation is limited. Moreover, it has been shown that people tend to believe fake news more readily because of its features [10]. In this paper, we present our framework for fake news detection and discuss in detail an approach based on deep learning that we implemented using Google BERT features. Our experiments, conducted on two well-known and widely used real-world datasets, suggest that our method can outperform state-of-the-art approaches and detect fake news accurately, even when content information is limited.
Citations: 9
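The abstract says the approach builds on BERT features. A minimal sketch of that general pipeline, using the Hugging Face transformers library to extract a sentence embedding and a logistic-regression classifier on top, is shown below; the model name, the tiny label set, and the training texts are placeholders, not the paper's datasets or architecture.

```python
# Sketch of a BERT-features pipeline: embed each article with a pretrained
# BERT encoder, then train a simple classifier on the embeddings.
# Training texts/labels below are hypothetical, not the paper's datasets.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts: list[str]):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    # use the [CLS] token representation as a fixed-size document feature
    return out.last_hidden_state[:, 0, :].numpy()

train_texts = [
    "Government confirms new vaccination schedule for fall.",   # real
    "Local council approves funding for road repairs.",         # real
    "Aliens endorse mayoral candidate in secret meeting.",      # fake
    "Drinking bleach cures all known diseases, experts say.",   # fake
]
train_labels = [0, 0, 1, 1]   # 0 = real, 1 = fake

clf = LogisticRegression(max_iter=1000).fit(embed(train_texts), train_labels)
print(clf.predict(embed(["Scientists confirm the moon is made of cheese."])))
```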
Pandemic and big tech
B. Desai
DOI: 10.1145/3410566.3410585 (published 2020-08-12)
Abstract: Having been an observer and user of computing devices from slide rules, analog computers, and the early monstrous digital machines to today's sleek hand-held ones, and having seen the computing and data "ownership" paradigms shift over the last six decades, one wonders at the enormous size, power, and market capitalization of a handful of companies that have existed for only a couple of decades. Now the world is groaning under a coronavirus pandemic mismanaged by most governments, health officers, and organizations. Are these not perfect examples, ad infinitum, of the Peter principle? At the same time, big tech is benefiting from the pandemic and preparing to take a central role in harvesting more data, to be mined in the future for new revenue streams. This paper looks at the recent push by big tech to extend its agenda into all aspects of human life. The opportunity presented by the COVID-19 pandemic, and the fear of future pandemics, is being seized to lay that groundwork at the public's expense and at the cost of their privacy.
Citations: 5
A novel spatio-temporal interpolation algorithm and its application to the COVID-19 pandemic
Junzhe Cai, P. Revesz
DOI: 10.1145/3410566.3410602 (published 2020-08-12)
Abstract: This paper describes several interpolation methods for predicting the number of cases in the COVID-19 pandemic. They include well-known temporal interpolation algorithms such as Lagrange interpolation, cubic spline interpolation, and exponential decay interpolation. These temporal algorithms interpolate the COVID-19 cases at locations where measurements from prior days are available. However, pandemics are not purely temporal but spatio-temporal phenomena; neighboring locations must therefore also be considered to derive accurate interpolation values for future days. This paper introduces a novel spatio-temporal interpolation algorithm that is shown to outperform any purely temporal interpolation algorithm in predicting COVID-19 cases in the continental United States. In particular, the novel spatio-temporal interpolation method achieves a mean absolute error of 8.44 cases per million people when predicting the number of COVID-19 cases two days ahead.
Citations: 2
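The paper's specific spatio-temporal algorithm is not given in the abstract; the sketch below only illustrates the ingredients it names, namely a cubic-spline temporal extrapolation per location (via scipy) blended with a simple average over neighboring locations. The case series, neighbor lists, and the 50/50 blend weight are hypothetical.

```python
# Sketch of spatio-temporal interpolation in the spirit described above:
# extrapolate each location's case series with a cubic spline, then blend
# the temporal estimate with the mean estimate of its spatial neighbors.
# Data, neighbor graph, and blend weight are hypothetical, not the paper's.
import numpy as np
from scipy.interpolate import CubicSpline

days = np.arange(7)                       # observed days 0..6
cases = {                                 # cases per million, per location
    "A": np.array([1, 2, 4, 7, 11, 16, 22], dtype=float),
    "B": np.array([2, 3, 5, 8, 12, 17, 23], dtype=float),
    "C": np.array([0, 1, 1, 2, 4, 6, 9], dtype=float),
}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

def temporal_estimate(series: np.ndarray, day: float) -> float:
    """Cubic-spline extrapolation of one location's series to a future day."""
    return float(CubicSpline(days, series)(day))

def spatio_temporal_estimate(loc: str, day: float, alpha: float = 0.5) -> float:
    own = temporal_estimate(cases[loc], day)
    nbr = np.mean([temporal_estimate(cases[n], day) for n in neighbors[loc]])
    return alpha * own + (1 - alpha) * nbr   # blend own trend with neighbors

print(round(spatio_temporal_estimate("A", day=8), 2))  # two days past day 6
```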