{"title":"Viral pneumonia images classification by Multiple Instance Learning: preliminary results","authors":"E. Zumpano, A. Fuduli, E. Vocaturo, Matteo Avolio","doi":"10.1145/3472163.3472170","DOIUrl":"https://doi.org/10.1145/3472163.3472170","url":null,"abstract":"At the end of 2019, the World Health Organization (WHO) referred that the Public Health Commission of Hubei Province, China, reported cases of severe and unknown pneumonia, characterized by fever, malaise, dry cough, dyspnoea and respiratory failure, which occurred in the urban area of Wuhan. A new coronavirus, SARS-CoV-2, was identified as responsible for the lung infection, now called COVID-19 (coronavirus disease 2019). Since then there has been an exponential growth of infections and at the beginning of March 2020 the WHO declared the epidemic a global emergency. An early diagnosis of those carrying the virus becomes crucial to contain the spread, morbidity and mortality of the pandemic. The definitive diagnosis is made through specific tests, among which imaging tests play an important role in the care path of the patient with suspected or confirmed COVID-19. Patients with serious COVID-19 typically experience viral pneumonia. In this paper we launch the idea to use the Multiple Instance Learning paradigm to classify pneumonia X-ray images, considering three different classes: radiographies of healthy people, radiographies of people with bacterial pneumonia and of people with viral pneumonia. The proposed algorithms, which are very fast in practice, appear promising especially if we take into account that no preprocessing technique has been used.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124945308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rigorous Measurement Model for Validity of Big Data: MEGA Approach","authors":"Dave Bhardwaj, O. Ormandjieva","doi":"10.1145/3472163.3472171","DOIUrl":"https://doi.org/10.1145/3472163.3472171","url":null,"abstract":"Big Data is becoming a substantial part of the decision-making processes in both industry and academia, especially in areas where Big Data may have a profound impact on businesses and society. However, as more data is being processed, data quality is becoming a genuine issue that negatively affects credibility of the systems we build because of the lack of visibility and transparency of the underlying data. Therefore, Big Data quality measurement is becoming increasingly necessary in assessing whether data can serve its purpose in a particular context (such as Big Data analytics, for example). This research addresses Big Data quality measurement modelling and automation by proposing a novel quality measurement framework for Big Data (MEGA) that objectively assesses the underlying quality characteristics of Big Data (also known as the V's of Big Data) at each step of the Big Data Pipelines. Five of the Big Data V's (Volume, Variety, Velocity, Veracity and Validity) are currently automated by the MEGA framework. In this paper, a new theoretically valid quality measurement model is proposed for an essential quality characteristic of Big Data, called Validity. The proposed measurement information model for Validity of Big Data is a hierarchy of 4 derived measures / indicators and 5 based measures. Validity measurement is illustrated on a running example.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114934789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Quality of Workers by Goodness-of-Fit of Machine Learning Model in Crowdsourcing","authors":"Yumiko Suzuki","doi":"10.1145/3472163.3472279","DOIUrl":"https://doi.org/10.1145/3472163.3472279","url":null,"abstract":"In this paper, we propose a method for predicting the quality of crowdsourcing workers using the goodness-of-fit (GoF) of machine learning models. We assume a relationship between the quality of workers and the quality of machine-learning models using the outcomes of the workers as training data. This assumption means that if worker quality is high, a machine-learning classifier constructed using the worker’s outcomes can easily predict the outcomes of the worker. If this assumption is confirmed, we can measure the worker quality without using the correct answer sets, and then the requesters can reduce the time and effort. However, if the outcomes by workers are low quality, the input tweet does not correspond to the outcomes. Therefore, if we construct a tweet classifier using input tweets and the classified results by the worker, the prediction of the outcomes by the classifier and that by the workers should differ. We assume that the GoF scores, such as accuracy and F1 scores of the test set using this classifier, correlates to worker quality. Therefore, we can predict worker quality using the GoF scores. In our experiment, we did the tweet classification task using crowdsourcing. We confirmed that the GoF scores and the quality of workers correlate. These results show that we can predict the quality of workers using the GoF scores.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121529149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Categorical Management of Multi-Model Data","authors":"I. Holubová, Pavel Contos, M. Svoboda","doi":"10.1145/3472163.3472166","DOIUrl":"https://doi.org/10.1145/3472163.3472166","url":null,"abstract":"In this vision paper, we introduce an idea of a framework that would enable us to model, represent, and manage multi-model data in a unified and abstract way. Its core idea exploits constructs provided by category theory, which is sufficiently general but still simple enough to cover any of the logical data models used in contemporary databases. Focusing on promising features and taking into account mature and verified principles, we overview the key parts of the framework and outline open questions and research directions that need to be further investigated. The ultimate objective is to pursue the idea of a self-tuning system that would permit us to collapse the traditionally understood conceptual and logical layers into just a single model allowing for unified handling of schemas, data instances, as well as queries.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130784257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Colonization of the Internet","authors":"B. Desai","doi":"10.1145/3472163.3472179","DOIUrl":"https://doi.org/10.1145/3472163.3472179","url":null,"abstract":"The internet was introduced to connect computers and allow communication between these computers. It evolved to provide applications such as email, talk and file sharing with the associated system to search. The files were made available, freely, by users. However, the internet was out of the reach of most people since it required equipment and know-how as well as connection to a computer on the internet. One method of connection used an acoustic coupler and an analog phone. With the introduction of the personal computer and higher speed modems, accessing the internet became easier. The introduction of user-friendly graphical interfaces, as well as the convenience and portablility of laptops and smartphones made the internet much more widely accessible for a broad swath of users. A small number of newly established companies, supported by a large amount of venture capital and a lack of regulation have since established a stranglehold on the internet with billions of people using these applications. Their monopolistic practices and exploitation of the open nature of the internet has created a need in the ordinary person to replace the traditional way of communication with what they provide: in exchange for giving up personal information these persons have become dependent on the service provided. Due to the regulatory desert around privacy and ownership of personal electornic data, a handful of massive corporations have expropriated and exploited aggregated and disaggregated personal information. This amounts, we argue, to the colonization of the internet.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"29 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116349038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Looking for Jobs? Matching Adults with Autism with Potential Employers for Job Opportunities","authors":"Joseph Thomas Bills, Yiu-Kai Ng","doi":"10.1145/3472163.3472270","DOIUrl":"https://doi.org/10.1145/3472163.3472270","url":null,"abstract":"Adults with autism face many difficulties when finding employment, such as struggling with interviews and needing accommodating environments for sensory issues. Autistic adults, however, also have unique skills to contribute to the workplace that companies have recently started to seek after, such as loyalty, close attention to detail, and trustworthiness. To work around these difficulties and help companies find the talent they are looking for we have developed a job-matching system. Our system is based around the stable matching of the Gale-Shapley algorithm to match autistic adults with employers after estimating how both adults with autism and employers would rank the other group. The system also uses filtering to approximate a stable matching even with a changing pool of users and employers, meaning the results are resistant to change as the result of competition. Such a system would be of benefit to both adults with autism and employers and would advance knowledge in recommender systems that match two parties.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127969115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis-oriented Metadata for Data Lakes","authors":"Yan Zhao, F. Ravat, Julien Aligon, C. Soulé-Dupuy, Gabriel Ferrettini, I. Megdiche","doi":"10.1145/3472163.3472273","DOIUrl":"https://doi.org/10.1145/3472163.3472273","url":null,"abstract":"Data lakes are supposed to enable analysts to perform more efficient and efficacious data analysis by crossing multiple existing data sources, processes and analyses. However, it is impossible to achieve that when a data lake does not have a metadata governance system that progressively capitalizes on all the performed analysis experiments. The objective of this paper is to have an easily accessible, reusable data lake that capitalizes on all user experiences. To meet this need, we propose an analysis-oriented metadata model for data lakes. This model includes the descriptive information of datasets and their attributes, as well as all metadata related to the machine learning analyzes performed on these datasets. To illustrate our metadata solution, we implemented an application of data lake metadata management. This application allows users to find and use existing data, processes and analyses by searching relevant metadata stored in a NoSQL data store within the data lake. To demonstrate how to easily discover metadata with the application, we present two use cases, with real data, including datasets similarity detection and machine learning guidance.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121057433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customized Eager-Lazy Data Cleansing for Satisfactory Big Data Veracity","authors":"S. Sahri, Rim Moussa","doi":"10.1145/3472163.3472195","DOIUrl":"https://doi.org/10.1145/3472163.3472195","url":null,"abstract":"Big data systems are becoming mainstream for big data management either for batch processing or real-time processing. In order to extract insights from data, quality issues are very important to address, particularly. A veracity assessment model is consequently needed. In this paper, we propose a model which ties quality of datasets and quality of query resultsets. We particularly examine quality issues raised by a given dataset, order attributes along their fitness for use and correlate veracity metrics to business queries. We validate our work using the open dataset NYC taxi’ trips.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123778167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COVID-19 Concerns in US: Topic Detection in Twitter","authors":"C. Comito","doi":"10.1145/3472163.3472169","DOIUrl":"https://doi.org/10.1145/3472163.3472169","url":null,"abstract":"COVID-19 pandemic is affecting the lives of the citizens worldwide. Epidemiologists, policy makers and clinicians need to understand public concerns and sentiment to make informed decisions and adopt preventive and corrective measures to avoid critical situations. In the last few years, social media become a tool for spreading the news, discussing ideas and comments on world events. In this context, social media plays a key role since represents one of the main source to extract insight into public opinion and sentiment. In particular, Twitter has been already recognized as an important source of health-related information, given the amount of news, opinions and information that is shared by both citizens and official sources. However, it is a challenging issue identifying interesting and useful content from large and noisy text-streams. The study proposed in the paper aims to extract insight from Twitter by detecting the most discussed topics regarding COVID-19. The proposed approach combines peak detection and clustering techniques. Tweets features are first modeled as time series. After that, peaks are detected from the time series, and peaks of textual features are clustered based on the co-occurrence in the tweets. Results, performed over real-world datasets of tweets related to COVID-19 in US, show that the proposed approach is able to accurately detect several relevant topics of interest, spanning from health status and symptoms, to government policy, economic crisis, COVID-19-related updates, prevention, vaccines and treatments.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131487415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable Data Analytics for Disease and Healthcare Informatics","authors":"C. Leung, Daryl L. X. Fung, Daniel Mai, Qi Wen, Jason Tran, Joglas Souza","doi":"10.1145/3472163.3472175","DOIUrl":"https://doi.org/10.1145/3472163.3472175","url":null,"abstract":"With advancements in technology, huge volumes of valuable data have been generated and collected at a rapid velocity from a wide variety of rich data sources. Examples of these valuable data include healthcare and disease data such as privacy-preserving statistics on patients who suffered from diseases like the coronavirus disease 2019 (COVID-19). Analyzing these data can be for social good. For instance, data analytics on the healthcare and disease data often leads to the discovery of useful information and knowledge about the disease. Explainable artificial intelligence (XAI) further enhances the interpretability of the discovered knowledge. Consequently, the explainable data analytics helps people to get a better understanding of the disease, which may inspire them to take part in preventing, detecting, controlling and combating the disease. In this paper, we present an explainable data analytics system for disease and healthcare informatics. Our system consists of two key components. The predictor component analyzes and mines historical disease and healthcare data for making predictions on future data. Although huge volumes of disease and healthcare data have been generated, volumes of available data may vary partially due to privacy concerns. So, the predictor makes predictions with different methods. It uses random forest With sufficient data and neural network-based few-shot learning (FSL) with limited data. The explainer component provides the general model reasoning and a meaningful explanation for specific predictions. As a database engineering application, we evaluate our system by applying it to real-life COVID-19 data. Evaluation results show the practicality of our system in explainable data analytics for disease and healthcare informatics.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133687269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}