Proceedings of the 7th ACM IKDD CoDS and 25th COMAD: Latest Papers, Page 5

Solar Energy Forecasting Using Machine Learning

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371212

Karan Kumar, Nipun Batra

Abstract:

Motivation: From 2010 to 2040, the world's total energy requirement will increase by 56% [1]. Solar energy is among the largest sources of renewable energy in the world. At the current rate, by 2050, solar energy will contribute approximately 20% of the world's total energy requirement [2]. One drawback of solar energy is its strong dependence on meteorological conditions such as temperature, humidity, and cloud cover, which makes the energy produced highly volatile and intermittent. Accurately forecasting solar energy production is an important step towards reducing reliance on non-renewable resources.

Problem Statement: Our aim is to accurately forecast the solar production $y_{t+K}$, $K$ timestamps in the future, given the historical solar production $\{y_1, y_2, \ldots, y_t\}$ and historical and forecasted meteorological data $\{M_1, M_2, \ldots, M_t, \ldots, M_{t+K}\}$, where each $M_i \in \mathbb{R}^d$ holds $d$ meteorological features.

Related Work: Most existing solar forecasting models require physical information about the solar site, such as the azimuth and zenith angle [4]. Given these parameters and the meteorological conditions of that particular site, one can forecast solar production. Collecting all these physical parameters manually for a given site is not easy, so the feasibility of such approaches is a concern.

Approach: There are primarily three time-series forecasting methods: (1) prediction using only the historical values of the variable to be forecasted, i.e. predicting $y_{t+K}$ from $\{y_1, y_2, \ldots, y_t\}$ alone; (2) forecasting using the external features, i.e. predicting $y_{t+K} = f(M^1_{t+K}, \ldots, M^d_{t+K})$; (3) a combination of both approaches, called a dynamic regression model [3]. For forecasting solar production, we use the first and third approaches.

Evaluation: 1. Dataset: We use solar energy data sampled every 20 minutes from 10 different stations (four of capacity 25 kWh and six of capacity 15 kWh) inside the IIT Gandhinagar campus. Using the Dark Sky API, we collected meteorological conditions of the site such as temperature, humidity, wind speed, cloud cover, wind bearing, and dew point. 2. Evaluation Metric: We use the Root Mean Squared Error (RMSE) between the observed production $y_t$ and the forecast $\hat{y}_t$ at time $t$ as our evaluation metric. Although most prior work uses Mean Absolute Percentage Error (MAPE), we do not, since MAPE is undefined when energy production is zero (at night).
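The preference for RMSE over MAPE can be seen in a small sketch. RMSE is taken here in its standard form, $\sqrt{\tfrac{1}{N}\sum_t (y_t - \hat{y}_t)^2}$; the function names and the sample readings are illustrative assumptions and do not come from the paper.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error over paired observations."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Returns None when any actual value is zero, because the per-sample
    term |y - y_hat| / |y| is undefined there.
    """
    if any(y == 0 for y in actual):
        return None
    terms = [abs(y - y_hat) / abs(y) for y, y_hat in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

# Made-up readings: the first two samples are night-time (zero production).
actual = [0.0, 0.0, 3.0, 5.0]
forecast = [0.0, 1.0, 2.0, 5.0]

night_rmse = rmse(actual, forecast)   # well defined despite the zeros
night_mape = mape(actual, forecast)   # None: undefined on zero production
```

RMSE stays well defined on the night-time samples, while MAPE has no value as soon as one observed reading is zero.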

Citations: 1

A System for Analysis, Visualization and Retrieval of Crime Documents

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371405

Rupsa Saha, Abir Naskar, Tirthankar Dasgupta, Lipika Dey

Citations: 1

Fairness in Algorithmic Decision Making

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371234

Abhijnan Chakraborty, K. Gummadi

Abstract: Algorithmic (data-driven) decision making is increasingly being used to assist or replace human decision making in domains with high societal impact, such as banking (estimating creditworthiness), recruiting (ranking applicants), judiciary (offender profiling) and journalism (recommending news stories). Consequently, in recent times, multiple research works have attempted to identify (measure) bias or unfairness in algorithmic decisions and propose mechanisms to control (mitigate) such biases. In this tutorial, we introduce the related literature to the CoDS-COMAD community. Moreover, going over the more prevalent works on fairness in classification or regression tasks, we explore fairness issues in decision-making scenarios where we need to account for the preferences of multiple stakeholders. Specifically, we cover our own past and ongoing works on fairness in recommendation and matching systems. We discuss the notions of fairness in these contexts and propose techniques to achieve them. Additionally, we briefly touch upon the possibility of utilizing the user interface of platforms (choice architecture) to achieve fair outcomes in certain scenarios. We conclude the tutorial with a list of open questions and directions for future work.

Citations: 3

Harnessing Deep Cross-lingual Word Embeddings to Infer Accurate Phylogenetic Trees

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371210

Yashasvi Mantha, Diptesh Kanojia, Abhijeet Dubey, P. Bhattacharyya, Malhar A. Kulkarni

Citations: 0

Towards Designing Accurate FISH Probe Detection using 3D U-Nets on Microscopic Blood Cell Images

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371201

Chinmay Savadikar, S. Tahvilian, L. Baden, R. Reed, D. Leventon, P. Pagano, Bhushan Garware

Citations: 2

Forecasting the Future: Leveraging RNN based Feature Concatenation for Tweet Outbreak Prediction

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371190

Saswata Roy, B. Suman, Joydeep Chandra, Sourav Kumar Dandapat

Abstract: Cascade outbreak is a common phenomenon observed across different social networking platforms. It can have severe implications in different scenarios: fake news or a rumour can spread to a significant number of people, or hateful news can be propagated, which may incite violence. Early prediction of cascade outbreaks would help in taking proper remedial action and is hence an important research direction. Most existing approaches predict the popularity of a social networking post either with machine learning techniques or with statistical models. Simple machine learning based approaches may miss important features, while statistical models use hard-coded functions that might not be suitable in a different scenario. With the availability of huge amounts of data, deep learning based models have recently also been applied to the prediction of cascade outbreaks. This study identifies the limitations of existing deep learning based approaches and proposes a Recurrent Neural Network based Hybrid Model with Feature Concatenation (RNN-HMFC). RNN-HMFC captures important latent features of the textual aspect and of the retweet information with an LSTM and a GRU respectively, and also uses a set of handcrafted features, such as additional tweet information and user social information, for the prediction of virality. We achieve 2.7% to 6.45% higher accuracy than the state-of-the-art methods on different datasets.
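The feature-concatenation step described in the abstract can be sketched as follows. This is only a structural illustration: the two sequence encoders are replaced by a simple mean-pooling stand-in (the abstract names an LSTM for text and a GRU for retweet information), and every function name, feature, and value below is a hypothetical placeholder rather than the authors' implementation.

```python
import math

def mean_pool(sequence):
    """Stand-in encoder: average a sequence of equal-length vectors.

    The abstract names an LSTM (text) and a GRU (retweets); this is neither,
    it only produces a fixed-size vector so the fusion step can be shown.
    """
    dim = len(sequence[0])
    return [sum(step[i] for step in sequence) / len(sequence) for i in range(dim)]

def fuse_and_score(text_vec, retweet_vec, handcrafted, weights, bias=0.0):
    """Concatenate the three feature groups and apply a logistic output unit."""
    fused = list(text_vec) + list(retweet_vec) + list(handcrafted)
    if len(weights) != len(fused):
        raise ValueError("one weight per fused feature is required")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return fused, 1.0 / (1.0 + math.exp(-logit))

# Toy inputs; all values are made up for illustration.
text_steps = [[0.2, 0.4], [0.6, 0.0]]    # per-token embeddings
retweet_steps = [[1.0], [3.0], [5.0]]    # per-interval retweet counts
handcrafted = [0.0, 1.0]                 # hypothetical: has_url, verified_user

text_vec = mean_pool(text_steps)
retweet_vec = mean_pool(retweet_steps)
# Untrained (all-zero) weights, so the score carries no information yet.
fused, p_viral = fuse_and_score(text_vec, retweet_vec, handcrafted,
                                weights=[0.0] * 5)
```

The point of the sketch is the shape of the fused vector: learned sequence features and handcrafted features sit side by side in one input to the final classifier.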

Citations: 5

Estimation of PM2.5 using satellite and meteorological data

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371216

Souvik Roy, Nipun Batra, Pawan Gupta

Abstract:

Motivation: Air pollution is measured by the amount of PM2.5 the air contains. These are fine particles with a diameter of less than 2.5 micrometres that can penetrate deep into the lungs and trigger severe respiratory diseases. The concentration of PM2.5 in the air can be measured using ground-based monitoring stations, but there is a considerable deficit in the number of stations required for reliable measurements, as air quality varies spatially and temporally across a given region. Given the non-trivial costs of installing and maintaining ground-based PM2.5 sensors, previous research has looked at using satellite retrievals to estimate PM2.5 from visual features.

Problem Statement: The goal is to predict PM2.5 from aerosol optical thickness (AOT), which is a measure of how much light is attenuated by aerosols (e.g. haze, smoke particles, desert dust) as it passes through the atmosphere. Previous studies have shown that a higher amount of PM2.5 reduces light transmission and increases attenuation, and thereby causes higher AOT [2]. We further examine the addition of meteorological factors as predictor variables and its effect on the correlation with PM2.5.
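The comparison described in the problem statement, AOT alone versus AOT plus meteorological predictors, can be sketched with an ordinary least-squares fit. The abstract does not name the regression model, so plain linear regression is an assumption here, as are the helper names, the choice of humidity as the meteorological factor, and the made-up sample values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols_fitted(features, target):
    """Fit least squares with an intercept; return the in-sample predictions."""
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    coef = solve(xtx, xty)
    return [sum(c * v for c, v in zip(coef, d)) for d in design]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / (vx * vy) ** 0.5

# Made-up observations: AOT (unitless), relative humidity (%), PM2.5 (ug/m^3).
aot = [0.2, 0.5, 0.3, 0.8, 0.6, 0.4]
humidity = [30.0, 60.0, 80.0, 40.0, 70.0, 50.0]
pm25 = [18.0, 37.0, 36.5, 43.5, 46.0, 30.0]

r_aot_only = pearson(ols_fitted([[a] for a in aot], pm25), pm25)
r_with_met = pearson(ols_fitted([[a, h] for a, h in zip(aot, humidity)], pm25), pm25)
```

Because these correlations are computed in-sample, adding a predictor cannot lower the value; a fair comparison on real data would use held-out stations or time periods.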

Citations: 0

Feature Selection for High-Dimensional Data Through Instance Vote Combining

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371177

Lily Chamakura, G. Saha

Abstract: Supervised feature selection (FS) is used to select a discriminative and non-redundant subset of features in classification problems dealing with high-dimensional inputs. In this paper, feature selection is posed akin to the set-covering problem, where the goal is to select a subset of features such that they cover the instances. To solve this formulation, we quantify the local relevance (i.e., votes assigned by instances) of each feature, which captures the extent to which a given feature is useful to classify the individual instances correctly. In this work, we propose to combine the instance votes across features to infer their joint local relevance. The votes are combined on the basis of geometric principles underlying classification and feature spaces. Further, we show how such instance vote combining may be employed to derive a heuristic search strategy for selecting a relevant and non-redundant subset of features. We illustrate the effectiveness of our approach by evaluating the classification performance and robustness to data variations on publicly available benchmark datasets. We observed that the proposed method outperforms state-of-the-art mutual information based FS techniques and performs comparably to other heuristic approaches that solve the set-covering formulation of feature selection.
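The set-covering view of feature selection can be sketched with the classic greedy set-cover heuristic. This is a generic baseline and not the paper's geometric vote-combining method; the binary votes, the feature names, and the function name are hypothetical.

```python
def greedy_cover(votes, max_features=None):
    """Greedy set cover over instance votes.

    votes maps each feature name to the set of instance ids that "vote" for
    it, i.e. instances the feature helps classify correctly (assumed binary).
    Returns the selected features and any instances left uncovered.
    """
    uncovered = set().union(*votes.values())
    selected = []
    while uncovered and (max_features is None or len(selected) < max_features):
        # Pick the feature covering the most still-uncovered instances;
        # ties go to the feature listed first.
        best = max(votes, key=lambda f: len(votes[f] & uncovered))
        if not votes[best] & uncovered:
            break
        selected.append(best)
        uncovered -= votes[best]
    return selected, uncovered

# Toy votes for six instances (ids 0..5); all values are made up.
votes = {
    "f1": {0, 1, 2},
    "f2": {2, 3},
    "f3": {3, 4, 5},
    "f4": {0, 1},      # redundant once f1 is chosen
}
selected, uncovered = greedy_cover(votes)
```

In this toy case the redundant feature `f4` is never picked, because every instance it covers is already covered by `f1`; that is the non-redundancy property the set-covering formulation aims for.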

Citations: 0

CLUST

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371159

S. Vengadeswaran, S. Balasundaram

Abstract: Most applications today are data-intensive and require the ability to process large data sets across a cluster of nodes. Hadoop, an open-source implementation of the MapReduce (MR) architecture, has become the de facto processing platform for these applications. Even though Hadoop is considered an ideal solution for analysing and gaining insights from massive data, it has limitations when the data to be processed exhibits interest-locality (i.e. the data required for any query execution follows a grouping behaviour wherein only a part of the big data is accessed frequently). Since Hadoop data placement does not consider interest-locality, the dependent blocks required for execution may be concentrated within a few computing nodes, resulting in severe degradation of MR performance. Hence, this paper proposes CLUST, an optimal data placement strategy based on grouping semantics, so that queries can complete earlier. The paper harnesses hierarchical agglomerative clustering techniques in data placement to improve MR performance during the execution of interest-based queries. The approach has been validated by executing complex interest-based queries on the NCDC weather dataset, distributed across two scalable heterogeneous Hadoop clusters deployed on the cloud. CLUST significantly reduces execution time, improves data locality and CPU utilisation, and proves to be an efficient solution for big data processing.
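One way to read the placement idea is sketched below: group blocks that queries tend to access together using hierarchical agglomerative clustering, then spread each group across nodes so that a query's blocks are not concentrated on a few machines. The similarity measure (Jaccard over query ids), the average-linkage rule, the threshold, and the round-robin placement are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations

def block_similarity(query_log):
    """Jaccard similarity between blocks, based on which queries touch them."""
    touched = {}
    for query_id, blocks in enumerate(query_log):
        for block in blocks:
            touched.setdefault(block, set()).add(query_id)
    sim = {}
    for a, b in combinations(sorted(touched), 2):
        sim[(a, b)] = len(touched[a] & touched[b]) / len(touched[a] | touched[b])
    return sorted(touched), sim

def agglomerate(blocks, sim, threshold):
    """Average-linkage agglomerative clustering, stopped at a similarity threshold."""
    clusters = [[b] for b in blocks]

    def linkage(c1, c2):
        pairs = [sim[tuple(sorted((x, y)))] for x in c1 for y in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

def place(clusters, nodes):
    """Round-robin placement so blocks of one cluster land on different nodes."""
    placement, slot = {}, 0
    for cluster in clusters:
        for block in cluster:
            placement[block] = nodes[slot % len(nodes)]
            slot += 1
    return placement

# Toy access log: each entry is the set of blocks one query reads (made up).
query_log = [{"b1", "b2", "b3"}, {"b1", "b2", "b3"}, {"b4", "b5"}, {"b4", "b5"}, {"b6"}]
blocks, sim = block_similarity(query_log)
clusters = agglomerate(blocks, sim, threshold=0.5)
placement = place(clusters, nodes=["n1", "n2", "n3"])
```

With three nodes and groups of at most three blocks, every group ends up fully spread out, so a query over one group can read its blocks in parallel from distinct nodes.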

Citations: 6

Adaptive GloVe and FastText Model for Hindi Word Embeddings

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371179

Vijay Gaikwad, Y. Haribhakta

Abstract: Today, a lot of research is carried out on word embeddings in the NLP domain. Algorithms such as GloVe and FastText are used to develop word embeddings. However, not enough work has been done on Indian languages due to a lack of resources, and the datasets required for testing word embeddings are not available for Indian languages. In this paper, two algorithms are proposed: the Adaptive GloVe model (AGM) and the Adaptive FastText model (AFM). AGM adapts the co-occurrence matrix generation process of the original GloVe model to leverage part-of-speech tags and morphological knowledge of the language. By assigning a higher co-occurrence weight to words with the same root, AGM significantly improves the accuracy of the resulting word embeddings on syntactic datasets. AFM, in turn, improves the vocabulary building process of the original FastText model. The work involves the generation of word embeddings for a low-resource language like Hindi using AGM and AFM, and the creation of the necessary test datasets for evaluating word embeddings. AGM word embeddings showed morphological awareness, achieving a 9% increase in accuracy on the syntactic word analogy task compared to the original GloVe model. AFM outperformed FastText by 1% in accuracy on the word analogy task and by 2 Spearman rank points on the word similarity task, providing state-of-the-art performance.
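The idea of giving same-root word pairs a higher co-occurrence weight can be sketched as a small change to GloVe-style counting, in which a pair of words d positions apart contributes 1/d. The boost factor, the root lookup table, and the transliterated toy tokens below are hypothetical; the abstract does not give AGM's actual weighting scheme, and part-of-speech tags are not modelled here.

```python
from collections import defaultdict

def weighted_cooccurrence(sentences, root_of, window=2, same_root_boost=2.0):
    """Symmetric co-occurrence counts with distance decay and a same-root boost.

    root_of maps a word to its root; words missing from it are their own root.
    same_root_boost is an assumed multiplier, not a value from the paper.
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), i):
                context = tokens[j]
                weight = 1.0 / (i - j)  # GloVe-style decay with distance
                if root_of.get(word, word) == root_of.get(context, context):
                    weight *= same_root_boost
                counts[(word, context)] += weight
                counts[(context, word)] += weight
    return dict(counts)

# Toy transliterated tokens and a hand-written root table (illustrative only).
sentences = [["ladka", "ladke", "khel"]]
root_of = {"ladka": "ladk", "ladke": "ladk"}
counts = weighted_cooccurrence(sentences, root_of)
```

Here the adjacent pair sharing a root receives twice the weight of an ordinary adjacent pair, which is the mechanism by which morphologically related forms would be pulled closer together during embedding training.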

Citations: 7