Proceedings of the 7th ACM IKDD CoDS and 25th COMAD: Latest Articles

Solar Energy Forecasting Using Machine Learning
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371212
Karan Kumar, Nipun Batra
Abstract: Motivation. From 2010 to 2040, the world's total energy requirement will increase by 56% [1]. Solar energy is among the largest sources of renewable energy in the world. At the current rate, by 2050, solar energy will contribute approximately 20% of the world's total energy requirement [2]. One drawback of solar energy is its strong dependence on meteorological conditions such as temperature, humidity, and cloud cover, which makes the energy produced highly volatile and intermittent. Accurately forecasting solar energy production is an important step towards reducing reliance on non-renewable resources.
Problem Statement. Our aim is to accurately forecast the solar produce $y_{t+K}$, $K$ timestamps in the future, given historical solar produce $\{y_1, y_2, \ldots, y_t\}$ and historical and forecasted meteorological data $\{M_1, M_2, \ldots, M_t, \ldots, M_{t+K}\}$, where $M \in \mathbb{R}^d$ corresponds to $d$ meteorological features.
Related Work. Most existing solar forecasting models require physical information about the solar site, such as the azimuth and zenith angle [4]. Given these parameters and the meteorological conditions of that particular site, one can forecast solar production. It is not easy to collect all these physical parameters manually for a given site, so the feasibility of such approaches is a concern.
Approach. There are primarily three time-series forecasting methods: (1) prediction using only the historical values of the variable to be forecasted, i.e. predicting $y_{t+K}$ from $\{y_1, y_2, \ldots, y_t\}$ alone; (2) forecasting using external features, i.e. predicting $y_{t+K} = f(M^1_{t+K}, \ldots, M^d_{t+K})$; (3) a combination of both approaches, called a dynamic regression model [3]. To forecast solar production, we use the first and third approaches.
Evaluation. 1. Dataset: We use solar energy production sampled every 20 minutes from 10 different stations (four of capacity 25 KWh and six of capacity 15 KWh) inside the IIT Gandhinagar campus. Using the Dark Sky API, we collected meteorological conditions of the site such as temperature, humidity, wind speed, cloud cover, wind bearing, and dew point. 2. Evaluation Metric: We use Root Mean Squared Error (RMSE) as our evaluation metric, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}$, where $\hat{y}_t$ indicates the forecast at time $t$ and $n$ is the number of forecasts evaluated. Although most prior work uses Mean Absolute Percentage Error (MAPE), we do not use it, since MAPE is undefined when energy production is zero (at night).
Citations: 1
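The metric choice in the solar forecasting abstract above can be illustrated with a small sketch. It is not the authors' code: the function names and the sample readings are made up for illustration, and the only point is that RMSE stays well defined when night-time production is zero while MAPE does not.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error over paired observations."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    if any(y == 0 for y in actual):
        raise ZeroDivisionError("MAPE is undefined when an actual value is zero")
    ratios = [abs((y - y_hat) / y) for y, y_hat in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up readings (not from the paper): zero production at night,
# non-zero production during the day.
actual = [0.0, 0.0, 3.0, 5.0]
forecast = [0.0, 1.0, 2.0, 7.0]

print("RMSE:", rmse(actual, forecast))
try:
    mape(actual, forecast)
except ZeroDivisionError as err:
    print("MAPE:", err)
```

Restricting MAPE to daylight samples would be another option, but that changes which timestamps are scored, so it is a different evaluation rather than a fix.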
A System for Analysis, Visualization and Retrieval of Crime Documents
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371405
Rupsa Saha, Abir Naskar, Tirthankar Dasgupta, Lipika Dey
Abstract: In this paper, we demonstrate a system that supports analytics and visualization of crime information extracted from large text repositories. Crime indicators are extracted using a CNN-BiLSTM based multi-classification network. The system is equipped with a query and retrieval system that gives the user insight into the crime patterns and statistics extracted from the underlying repository. It lets the user browse functionally related articles that the system links automatically, and deep-dive into the repository to view aggregated statistics along various dimensions.
Citations: 1
Fairness in Algorithmic Decision Making
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371234
Abhijnan Chakraborty, K. Gummadi
Abstract: Algorithmic (data-driven) decision making is increasingly being used to assist or replace human decision making in domains with high societal impact, such as banking (estimating creditworthiness), recruiting (ranking applicants), judiciary (offender profiling) and journalism (recommending news stories). Consequently, multiple recent research works have attempted to identify (measure) bias or unfairness in algorithmic decisions and to propose mechanisms to control (mitigate) such biases. In this tutorial, we introduce the related literature to the CoDS-COMAD community. Moreover, going over the more prevalent works on fairness in classification or regression tasks, we explore fairness issues in decision-making scenarios where we need to account for the preferences of multiple stakeholders. Specifically, we cover our own past and ongoing work on fairness in recommendation and matching systems. We discuss the notions of fairness in these contexts and propose techniques to achieve them. Additionally, we briefly touch upon the possibility of using the user interface of platforms (choice architecture) to achieve fair outcomes in certain scenarios. We conclude the tutorial with a list of open questions and directions for future work.
Citations: 3
Harnessing Deep Cross-lingual Word Embeddings to Infer Accurate Phylogenetic Trees
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371210
Yashasvi Mantha, Diptesh Kanojia, Abhijeet Dubey, P. Bhattacharyya, Malhar A. Kulkarni
Abstract: Establishing language relatedness by inferring phylogenetic trees has been a topic of interest in diachronic linguistics. However, existing methods suffer from meaning conflation deficiency because they rely on lexical similarity-based measures. In this paper, we use fourteen linked Indian Wordnets and our novel approach to compute inter-language distances ('language distances'). Our pilot study uses deep cross-lingual word embeddings to compute inter-language distances and provide an effective distance matrix from which to infer phylogenetic trees. We also develop a baseline method using lexical similarity-based metrics for comparison, and find that our approach produces better phylogenetic trees, which group related languages closer together than the baseline approach does.
Citations: 0
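The pipeline described in the abstract above (embeddings, then a distance matrix, then a tree) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the language distance is taken here to be the mean cosine distance over concepts shared by two languages, the tree is built with plain average-linkage (UPGMA) clustering, and the languages, concepts and vectors are invented placeholders.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def language_distance(emb_a, emb_b):
    """Assumed measure: mean cosine distance over concepts in both languages."""
    shared = sorted(set(emb_a) & set(emb_b))
    return sum(cosine_distance(emb_a[c], emb_b[c]) for c in shared) / len(shared)


def upgma(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {frozenset((i, j)): dist[i][j]
         for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del d[pair]
        for k in clusters:
            # Size-weighted average of the distances to the merged clusters.
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Placeholder cross-lingual embeddings: language -> concept -> vector.
emb = {
    "lang_a": {"water": [1.0, 0.0], "sun": [0.0, 1.0]},
    "lang_b": {"water": [0.9, 0.1], "sun": [0.1, 0.9]},
    "lang_c": {"water": [0.0, 1.0], "sun": [1.0, 0.0]},
}
names = sorted(emb)
dist = [[language_distance(emb[a], emb[b]) for b in names] for a in names]
tree = upgma(names, dist)
print(tree)
```

In this toy input, lang_a and lang_b have nearly parallel vectors for every shared concept, so they merge first and lang_c joins last.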
Towards Designing Accurate FISH Probe Detection using 3D U-Nets on Microscopic Blood Cell Images
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371201
Chinmay Savadikar, S. Tahvilian, L. Baden, R. Reed, D. Leventon, P. Pagano, Bhushan Garware
Abstract: Fluorescence in-situ hybridization (FISH) is a molecular cytogenetic technique developed to detect or localize the presence or absence of specific DNA sequences or chromosomes. Lung LB is a FISH-based confirmatory diagnostic test for lung cancer that detects circulating tumor cells (CTCs) in clinical patients with indeterminate lung nodules. In this paper, we propose a novel approach to segment FISH probes using 3D U-Nets and highlight the limitations of traditional computer vision based segmentation techniques for microscopic images. We observe a significant reduction in false positive rates without losing any real verified CTC, which helps improve the efficiency of pathologists and the accuracy of Lung LB. The proposed method yields an average reduction of 62.875% in the number of falsely detected CTCs relative to the commercially available tool on 20 clinical cases (~186,901 cells), while achieving an average recall of 94.72% across the cases, an improvement over the 72.9% recall of the commercial system.
Citations: 2
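The two figures quoted in the abstract above are per-case quantities averaged over cases. A minimal sketch of that arithmetic follows; the counts are invented for illustration and are not the paper's data, and the helper names are hypothetical.

```python
def recall(true_positives, false_negatives):
    """Fraction of real CTCs that the detector found."""
    return true_positives / (true_positives + false_negatives)


def percent_reduction(baseline_count, new_count):
    """Percentage drop in a count relative to a baseline."""
    return 100.0 * (baseline_count - new_count) / baseline_count


# Invented per-case counts of falsely detected CTCs:
# (commercial tool, proposed method).
false_detections = [(40, 10), (20, 15)]
reductions = [percent_reduction(b, n) for b, n in false_detections]
average_reduction = sum(reductions) / len(reductions)

# Invented per-case (true positives, false negatives) for the proposed method.
detections = [(9, 1), (4, 1)]
recalls = [recall(tp, fn) for tp, fn in detections]
average_recall = sum(recalls) / len(recalls)

print(average_reduction, average_recall)
```

Averaging per-case percentages weights every case equally regardless of how many cells it contains, which is why the result can differ from a percentage computed over pooled counts.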
Forecasting the Future: Leveraging RNN based Feature Concatenation for Tweet Outbreak Prediction
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371190
Saswata Roy, B. Suman, Joydeep Chandra, Sourav Kumar Dandapat
Abstract: Cascade outbreaks are a common phenomenon across social networking platforms. They can have severe implications: for example, fake news or rumours can spread to a large number of people, or hateful news can propagate and incite violence. Early prediction of cascade outbreaks helps in taking proper remedial action and is therefore an important research direction. Most existing approaches predict the popularity of social networking posts using either machine learning techniques or statistical models. Simple machine learning based approaches may miss important features, while statistical models use hard-coded functions that might not suit a different scenario. With the availability of large amounts of data, deep learning based models have recently also been applied to cascade outbreak prediction. This study identifies the limitations of existing deep learning based approaches and proposes a Recurrent Neural Network based Hybrid Model with Feature Concatenation (RNN-HMFC). RNN-HMFC captures important latent features of the textual aspect and of retweet information using an LSTM and a GRU respectively, and also uses a set of handcrafted features, such as additional tweet information and user social information, to predict virality. We achieve 2.7% to 6.45% higher accuracy than state-of-the-art methods on different datasets.
Citations: 5
Estimation of PM2.5 using satellite and meteorological data
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371216
Souvik Roy, Nipun Batra, Pawan Gupta
Abstract: Motivation: Air pollution is measured by the amount of PM2.5 the air contains. These are fine particles with a diameter of less than 2.5 micrometres that can penetrate deep into the lungs and trigger severe respiratory diseases. The concentration of PM2.5 in the air can be measured using ground-based monitoring stations, but there is a considerable deficit in the number of stations required for reliable measurements, as air quality varies spatially and temporally across a given region. Given the non-trivial costs of installing and maintaining ground-based PM2.5 sensors, previous research has looked at using satellite retrievals to estimate PM2.5 from visual features.
Problem Statement: The goal is to predict PM2.5 from aerosol optical thickness (AOT), which is a measure of how much light is attenuated by aerosols (e.g. haze, smoke particles, desert dust) as it passes through the atmosphere. Previous studies have shown that a higher amount of PM2.5 reduces light transmission and increases attenuation, and thereby causes higher AOT [2]. We further examine the addition of meteorological factors as predictor variables and its effect on the correlation with PM2.5.
Citations: 0
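The problem statement above amounts to relating a satellite quantity (AOT) to a ground measurement (PM2.5). A minimal sketch of the simplest such model follows: a one-variable least-squares fit plus the Pearson correlation. The abstract does not state which model the authors use, so this is only an assumed baseline, and the readings are invented and deliberately exactly linear so the fit is easy to check by hand.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(x, y):
    """Pearson correlation coefficient between two samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Invented co-located readings (not real measurements).
aot = [0.2, 0.4, 0.6, 0.8]       # unitless optical thickness
pm25 = [30.0, 50.0, 70.0, 90.0]  # micrograms per cubic metre

intercept, slope = fit_line(aot, pm25)
r = pearson_r(aot, pm25)
print(intercept, slope, r)
```

Adding meteorological predictors, as the abstract proposes, would turn this into a multiple regression; the comparison of interest is then how the correlation between predicted and measured PM2.5 changes.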
Feature Selection for High-Dimensional Data Through Instance Vote Combining
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371177
Lily Chamakura, G. Saha
Abstract: Supervised feature selection (FS) is used to select a discriminative and non-redundant subset of features in classification problems with high-dimensional inputs. In this paper, feature selection is posed as a problem akin to set covering, where the goal is to select a subset of features such that they cover the instances. To solve this formulation, we quantify the local relevance of each feature (i.e., the votes assigned by instances), which captures the extent to which a given feature is useful for classifying the individual instances correctly. In this work, we propose to combine the instance votes across features to infer their joint local relevance. The votes are combined on the basis of geometric principles underlying classification and feature spaces. Further, we show how such instance vote combining may be employed to derive a heuristic search strategy for selecting a relevant and non-redundant subset of features. We illustrate the effectiveness of our approach by evaluating classification performance and robustness to data variations on publicly available benchmark datasets. We observed that the proposed method outperforms state-of-the-art mutual information based FS techniques and performs comparably to other heuristic approaches that solve the set-covering formulation of feature selection.
Citations: 0
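The set-covering view of feature selection in the abstract above can be sketched with the textbook greedy heuristic. This is not the authors' vote-combining procedure: votes are simplified here to a hypothetical mapping from each feature to the set of instances it classifies correctly, and the feature names and sets are invented.

```python
def greedy_feature_cover(votes, n_instances):
    """Greedy set cover: repeatedly pick the feature that covers the most
    still-uncovered instances. Returns (selected features, uncovered instances)."""
    uncovered = set(range(n_instances))
    selected = []
    while uncovered:
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(votes), key=lambda f: len(votes[f] & uncovered))
        gained = votes[best] & uncovered
        if not gained:
            break  # no feature covers any of the remaining instances
        selected.append(best)
        uncovered -= gained
    return selected, uncovered


# Invented votes: feature -> instances that vote for it.
votes = {
    "f1": {0, 1, 2},
    "f2": {2, 3},
    "f3": {0, 1},   # redundant once f1 is chosen
    "f4": {3, 4},
}
selected, uncovered = greedy_feature_cover(votes, n_instances=5)
print(selected, uncovered)
```

The redundant feature f3 is never picked because everything it covers is already covered by f1, which is the non-redundancy property the set-covering formulation is meant to give.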
CLUST
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371159
S. Vengadeswaran, S. Balasundaram
Abstract: Most applications today are data-intensive and require the ability to process large data sets across a cluster of nodes. Hadoop, an open-source implementation of the MapReduce (MR) architecture, has become the de facto processing platform for these applications. Although Hadoop is considered an ideal solution for analysing and gaining insights from massive data, it has limitations when the data to be processed exhibits interest-locality (i.e. the data required for query execution follows grouping behaviour, wherein only a part of the big data is accessed frequently). Since Hadoop data placement does not consider interest-locality, the dependent blocks required for execution may be concentrated within a few computing nodes, resulting in severe degradation of MR performance. Hence, this paper proposes CLUST, an optimal data placement strategy based on grouping semantics, so that queries can be answered sooner. It applies hierarchical agglomerative clustering techniques to data placement to improve MR performance during the execution of interest-based queries. The approach has been validated by executing complex interest-based queries on the NCDC weather dataset, distributed across two scalable heterogeneous Hadoop clusters deployed on the cloud. CLUST significantly reduces execution time, improves data locality and CPU utilisation, and proves to be an efficient solution for big data processing.
Citations: 6
Adaptive GloVe and FastText Model for Hindi Word Embeddings
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371179
Vijay Gaikwad, Y. Haribhakta
Abstract: A great deal of current NLP research concerns word embeddings, which are developed with algorithms such as GloVe and FastText. However, not enough work has been done on Indian languages because of limited resource availability, and the datasets required for testing word embeddings are not available for these languages. In this paper, two algorithms are proposed: the Adaptive GloVe model (AGM) and the Adaptive FastText model (AFM). AGM adapts the co-occurrence matrix generation process of the original GloVe model to leverage part-of-speech tags and morphological knowledge of the language. By assigning higher co-occurrence weight to words with the same root, AGM significantly improves the accuracy of the resulting word embeddings on syntactic datasets. AFM, in turn, improves the vocabulary building process of the original FastText model. The work involves generating word embeddings for a low-resource language, Hindi, using AGM and AFM, and creating the test datasets needed to evaluate word embeddings. AGM word embeddings showed morphological awareness, achieving a 9% increase in accuracy on the syntactic word analogy task compared to the original GloVe model. AFM outperformed FastText by 1% accuracy on the word analogy task and by 2 Spearman rank on the word similarity task, providing state-of-the-art performance.
Citations: 7
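The AGM idea in the abstract above, giving extra co-occurrence weight to word pairs that share a root, can be sketched as a small change to GloVe-style co-occurrence counting. Everything specific here is an assumption for illustration: the boost factor, the window size, the `roots` lookup table and the transliterated tokens are hypothetical and are not taken from the paper. The 1/distance decay is the weighting used when building GloVe co-occurrence counts.

```python
from collections import defaultdict


def cooccurrence(sentences, roots, window=2, same_root_boost=2.0):
    """Symmetric co-occurrence counts with 1/distance decay.
    Pairs whose words share a root get their weight multiplied by
    same_root_boost (an assumed, illustrative factor)."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), i):
                context = tokens[j]
                weight = 1.0 / (i - j)  # nearer context words count more
                root = roots.get(word)
                if root is not None and root == roots.get(context):
                    weight *= same_root_boost
                counts[(word, context)] += weight
                counts[(context, word)] += weight
    return counts


# Hypothetical transliterated tokens and a hand-written root table.
roots = {"ladka": "ladka", "ladke": "ladka", "khelte": "khel"}
sentences = [["ladka", "ladke", "khelte"]]
counts = cooccurrence(sentences, roots)
print(dict(counts))
```

In practice the root table would come from a morphological analyser or stemmer for the language; the resulting matrix would then be passed to the usual GloVe training objective unchanged.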