Proceedings of the 25th International Database Engineering & Applications Symposium: Latest Papers

Viral pneumonia images classification by Multiple Instance Learning: preliminary results
E. Zumpano, A. Fuduli, E. Vocaturo, Matteo Avolio
Abstract: At the end of 2019, the World Health Organization (WHO) stated that the Public Health Commission of Hubei Province, China, had reported cases of severe and unknown pneumonia, characterized by fever, malaise, dry cough, dyspnoea and respiratory failure, which occurred in the urban area of Wuhan. A new coronavirus, SARS-CoV-2, was identified as responsible for the lung infection, now called COVID-19 (coronavirus disease 2019). Since then there has been an exponential growth of infections, and at the beginning of March 2020 the WHO declared the epidemic a global emergency. An early diagnosis of those carrying the virus becomes crucial to contain the spread, morbidity and mortality of the pandemic. The definitive diagnosis is made through specific tests, among which imaging tests play an important role in the care path of the patient with suspected or confirmed COVID-19. Patients with serious COVID-19 typically experience viral pneumonia. In this paper we propose using the Multiple Instance Learning paradigm to classify pneumonia X-ray images, considering three different classes: radiographs of healthy people, radiographs of people with bacterial pneumonia and radiographs of people with viral pneumonia. The proposed algorithms, which are very fast in practice, appear promising, especially if we take into account that no preprocessing technique has been used.
DOI: https://doi.org/10.1145/3472163.3472170
Published: 2021-07-14
Citations: 20
Rigorous Measurement Model for Validity of Big Data: MEGA Approach
Dave Bhardwaj, O. Ormandjieva
Abstract: Big Data is becoming a substantial part of the decision-making processes in both industry and academia, especially in areas where Big Data may have a profound impact on businesses and society. However, as more data is being processed, data quality is becoming a genuine issue that negatively affects the credibility of the systems we build because of the lack of visibility and transparency of the underlying data. Therefore, Big Data quality measurement is becoming increasingly necessary in assessing whether data can serve its purpose in a particular context (such as Big Data analytics, for example). This research addresses Big Data quality measurement modelling and automation by proposing a novel quality measurement framework for Big Data (MEGA) that objectively assesses the underlying quality characteristics of Big Data (also known as the V's of Big Data) at each step of the Big Data Pipelines. Five of the Big Data V's (Volume, Variety, Velocity, Veracity and Validity) are currently automated by the MEGA framework. In this paper, a new theoretically valid quality measurement model is proposed for an essential quality characteristic of Big Data, called Validity. The proposed measurement information model for Validity of Big Data is a hierarchy of 4 derived measures / indicators and 5 base measures. Validity measurement is illustrated on a running example.
DOI: https://doi.org/10.1145/3472163.3472171
Published: 2021-07-14
Citations: 0
Measuring Quality of Workers by Goodness-of-Fit of Machine Learning Model in Crowdsourcing
Yumiko Suzuki
Abstract: In this paper, we propose a method for predicting the quality of crowdsourcing workers using the goodness-of-fit (GoF) of machine learning models. We assume a relationship between the quality of workers and the quality of machine-learning models using the outcomes of the workers as training data. This assumption means that if worker quality is high, a machine-learning classifier constructed using the worker’s outcomes can easily predict the outcomes of the worker. If this assumption is confirmed, we can measure the worker quality without using the correct answer sets, and then the requesters can reduce the time and effort. However, if the outcomes by workers are low quality, the input tweet does not correspond to the outcomes. Therefore, if we construct a tweet classifier using input tweets and the classified results by the worker, the prediction of the outcomes by the classifier and that by the workers should differ. We assume that the GoF scores, such as accuracy and F1 scores of the test set using this classifier, correlate with worker quality. Therefore, we can predict worker quality using the GoF scores. In our experiment, we ran a tweet classification task using crowdsourcing. We confirmed that the GoF scores and the quality of workers correlate. These results show that we can predict the quality of workers using the GoF scores.
DOI: https://doi.org/10.1145/3472163.3472279
Published: 2021-07-14
Citations: 0
Categorical Management of Multi-Model Data
I. Holubová, Pavel Contos, M. Svoboda
Abstract: In this vision paper, we introduce an idea of a framework that would enable us to model, represent, and manage multi-model data in a unified and abstract way. Its core idea exploits constructs provided by category theory, which is sufficiently general but still simple enough to cover any of the logical data models used in contemporary databases. Focusing on promising features and taking into account mature and verified principles, we overview the key parts of the framework and outline open questions and research directions that need to be further investigated. The ultimate objective is to pursue the idea of a self-tuning system that would permit us to collapse the traditionally understood conceptual and logical layers into just a single model allowing for unified handling of schemas, data instances, as well as queries.
DOI: https://doi.org/10.1145/3472163.3472166
Published: 2021-07-14
Citations: 2
Colonization of the Internet
B. Desai
Abstract: The internet was introduced to connect computers and allow communication between these computers. It evolved to provide applications such as email, talk and file sharing with the associated system to search. The files were made available, freely, by users. However, the internet was out of the reach of most people since it required equipment and know-how as well as connection to a computer on the internet. One method of connection used an acoustic coupler and an analog phone. With the introduction of the personal computer and higher speed modems, accessing the internet became easier. The introduction of user-friendly graphical interfaces, as well as the convenience and portability of laptops and smartphones, made the internet much more widely accessible for a broad swath of users. A small number of newly established companies, supported by a large amount of venture capital and a lack of regulation, have since established a stranglehold on the internet with billions of people using these applications. Their monopolistic practices and exploitation of the open nature of the internet have created a need in the ordinary person to replace the traditional way of communication with what they provide: in exchange for giving up personal information these persons have become dependent on the service provided. Due to the regulatory desert around privacy and ownership of personal electronic data, a handful of massive corporations have expropriated and exploited aggregated and disaggregated personal information. This amounts, we argue, to the colonization of the internet.
DOI: https://doi.org/10.1145/3472163.3472179
Published: 2021-07-14
Citations: 0
Looking for Jobs? Matching Adults with Autism with Potential Employers for Job Opportunities
Joseph Thomas Bills, Yiu-Kai Ng
Abstract: Adults with autism face many difficulties when finding employment, such as struggling with interviews and needing accommodating environments for sensory issues. Autistic adults, however, also have unique skills to contribute to the workplace that companies have recently started to seek, such as loyalty, close attention to detail, and trustworthiness. To work around these difficulties and help companies find the talent they are looking for, we have developed a job-matching system. Our system is based around the stable matching of the Gale-Shapley algorithm to match autistic adults with employers after estimating how both adults with autism and employers would rank the other group. The system also uses filtering to approximate a stable matching even with a changing pool of users and employers, meaning the results are resistant to change as the result of competition. Such a system would be of benefit to both adults with autism and employers and would advance knowledge in recommender systems that match two parties.
DOI: https://doi.org/10.1145/3472163.3472270
Published: 2021-07-14
Citations: 3
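The abstract above names the Gale-Shapley algorithm as the basis of its matching step. As a rough illustration only, the textbook version can be sketched as follows. This is not the authors' system: the rank estimation and filtering they describe are not shown, and the function name `gale_shapley`, the candidate names and the employer names are all hypothetical. The sketch assumes both sides are the same size and every preference list is complete.

```python
from collections import deque


def gale_shapley(proposer_prefs, receiver_prefs):
    """Textbook Gale-Shapley: return a stable matching {proposer: receiver}.

    Assumes equal-sized sides and complete preference lists
    (most preferred first). Names are illustrative, not from the paper.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to propose to
    engaged_to = {}                               # receiver -> current proposer
    free = deque(proposer_prefs)                  # proposers without a partner

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # receiver was free: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # receiver prefers p: swap
            free.append(current)
        else:
            free.append(p)                        # rejected: p tries again later

    return {p: r for r, p in engaged_to.items()}


# Hypothetical preference lists for two candidates and two employers.
candidates = {"ana": ["acme", "globex"], "ben": ["acme", "globex"]}
employers = {"acme": ["ben", "ana"], "globex": ["ana", "ben"]}
matching = gale_shapley(candidates, employers)
```

In this textbook form the proposing side gets its best achievable stable partner, so which group proposes is a design choice; the abstract does not say which side proposes in the authors' system.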
Analysis-oriented Metadata for Data Lakes
Yan Zhao, F. Ravat, Julien Aligon, C. Soulé-Dupuy, Gabriel Ferrettini, I. Megdiche
Abstract: Data lakes are supposed to enable analysts to perform more efficient and efficacious data analysis by crossing multiple existing data sources, processes and analyses. However, it is impossible to achieve that when a data lake does not have a metadata governance system that progressively capitalizes on all the performed analysis experiments. The objective of this paper is to have an easily accessible, reusable data lake that capitalizes on all user experiences. To meet this need, we propose an analysis-oriented metadata model for data lakes. This model includes the descriptive information of datasets and their attributes, as well as all metadata related to the machine learning analyses performed on these datasets. To illustrate our metadata solution, we implemented an application of data lake metadata management. This application allows users to find and use existing data, processes and analyses by searching relevant metadata stored in a NoSQL data store within the data lake. To demonstrate how to easily discover metadata with the application, we present two use cases, with real data, including dataset similarity detection and machine learning guidance.
DOI: https://doi.org/10.1145/3472163.3472273
Published: 2021-07-14
Citations: 2
Customized Eager-Lazy Data Cleansing for Satisfactory Big Data Veracity
S. Sahri, Rim Moussa
Abstract: Big data systems are becoming mainstream for big data management, either for batch processing or real-time processing. In order to extract insights from data, it is particularly important to address quality issues. A veracity assessment model is consequently needed. In this paper, we propose a model which ties quality of datasets and quality of query resultsets. We particularly examine quality issues raised by a given dataset, order attributes by their fitness for use and correlate veracity metrics to business queries. We validate our work using the open dataset of NYC taxi trips.
DOI: https://doi.org/10.1145/3472163.3472195
Published: 2021-07-14
Citations: 3
COVID-19 Concerns in US: Topic Detection in Twitter
C. Comito
Abstract: The COVID-19 pandemic is affecting the lives of citizens worldwide. Epidemiologists, policy makers and clinicians need to understand public concerns and sentiment to make informed decisions and adopt preventive and corrective measures to avoid critical situations. In the last few years, social media has become a tool for spreading the news, discussing ideas and commenting on world events. In this context, social media plays a key role since it represents one of the main sources from which to extract insight into public opinion and sentiment. In particular, Twitter has already been recognized as an important source of health-related information, given the amount of news, opinions and information that is shared by both citizens and official sources. However, identifying interesting and useful content in large and noisy text streams is challenging. The study proposed in the paper aims to extract insight from Twitter by detecting the most discussed topics regarding COVID-19. The proposed approach combines peak detection and clustering techniques. Tweet features are first modeled as time series. After that, peaks are detected from the time series, and peaks of textual features are clustered based on their co-occurrence in the tweets. Experiments performed over real-world datasets of tweets related to COVID-19 in the US show that the proposed approach is able to accurately detect several relevant topics of interest, spanning from health status and symptoms, to government policy, economic crisis, COVID-19-related updates, prevention, vaccines and treatments.
DOI: https://doi.org/10.1145/3472163.3472169
Published: 2021-07-14
Citations: 1
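The abstract above outlines a pipeline of term time series, peak detection and co-occurrence clustering without giving details. The following is a loose sketch of that general idea under assumptions of our own, not the paper's method: terms are whitespace-separated tokens, a peak is a bucket whose count exceeds the series mean plus `k` standard deviations, and peaking terms are grouped when they appear together in a tweet from the peak bucket. All function names, parameters and sample tweets are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def term_series(tweets, n_buckets):
    """tweets: list of (bucket_index, text). Returns {term: counts per bucket}."""
    series = defaultdict(lambda: [0] * n_buckets)
    for bucket, text in tweets:
        for term in set(text.lower().split()):
            series[term][bucket] += 1
    return dict(series)


def peaks(counts, k=1.5, min_count=2):
    """Buckets whose count is at least min_count and above mean + k * std."""
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts)
            if c >= min_count and c > mu + k * sigma]


def cluster_peaking_terms(tweets, n_buckets, k=1.5, min_count=2):
    """Group terms that peak in the same bucket and co-occur in tweets there."""
    peaking = defaultdict(set)  # bucket -> terms that peak in it
    for term, counts in term_series(tweets, n_buckets).items():
        for b in peaks(counts, k, min_count):
            peaking[b].add(term)

    clusters = []
    for b, terms in sorted(peaking.items()):
        groups = []  # disjoint sets of co-occurring peaking terms
        for bucket, text in tweets:
            if bucket != b:
                continue
            merged = terms & set(text.lower().split())
            if not merged:
                continue
            rest = []
            for g in groups:
                if g & merged:
                    merged = merged | g   # tweet links this group to the others
                else:
                    rest.append(g)
            groups = rest + [merged]
        clusters.extend((b, frozenset(g)) for g in groups)
    return clusters


# Hypothetical sample: four time buckets, activity concentrated in bucket 3.
sample = [
    (0, "good morning"), (1, "good coffee"), (2, "good day"),
    (3, "vaccine rollout good"), (3, "vaccine rollout starts"),
    (3, "lockdown rules"), (3, "lockdown rules"),
]
clusters = cluster_peaking_terms(sample, n_buckets=4)
```

The mean-plus-deviation rule is only one simple way to flag peaks; on real tweet streams the threshold, the tokenization and the grouping rule would all need tuning.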
Explainable Data Analytics for Disease and Healthcare Informatics
C. Leung, Daryl L. X. Fung, Daniel Mai, Qi Wen, Jason Tran, Joglas Souza
Abstract: With advancements in technology, huge volumes of valuable data have been generated and collected at a rapid velocity from a wide variety of rich data sources. Examples of these valuable data include healthcare and disease data such as privacy-preserving statistics on patients who suffered from diseases like the coronavirus disease 2019 (COVID-19). Analyzing these data can be for social good. For instance, data analytics on the healthcare and disease data often leads to the discovery of useful information and knowledge about the disease. Explainable artificial intelligence (XAI) further enhances the interpretability of the discovered knowledge. Consequently, the explainable data analytics helps people to get a better understanding of the disease, which may inspire them to take part in preventing, detecting, controlling and combating the disease. In this paper, we present an explainable data analytics system for disease and healthcare informatics. Our system consists of two key components. The predictor component analyzes and mines historical disease and healthcare data for making predictions on future data. Although huge volumes of disease and healthcare data have been generated, volumes of available data may vary partially due to privacy concerns. So, the predictor makes predictions with different methods. It uses random forest with sufficient data and neural network-based few-shot learning (FSL) with limited data. The explainer component provides the general model reasoning and a meaningful explanation for specific predictions. As a database engineering application, we evaluate our system by applying it to real-life COVID-19 data. Evaluation results show the practicality of our system in explainable data analytics for disease and healthcare informatics.
DOI: https://doi.org/10.1145/3472163.3472175
Published: 2021-07-14
Citations: 13