2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2... Latest Articles
{"title":"Predicting PTSD Severity in Veterans from Self-reports for Early Intervention: A Machine Learning Approach","authors":"Priyanka Annapureddy, Md Fitrat Hossain, Thomas Kissane, Wylie Frydrychowicz, Paromita Nitu, Joseph Coelho, Nadiyah Johnson, P. Madiraju, Zeno Franco, Katinka Hooyer, Niharika Jain, M. Flower, Sheikh Iqbal Ahamed","doi":"10.1109/IRI49571.2020.00036","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00036","url":null,"abstract":"Early intervention for veterans in crisis represents a crucial area of study to reduce the psychological and health burdens for this population. Traumatic experiences linked to military service are associated with drug and alcohol abuse, suicidality, anger, and disrupted work and family relationships. This project used machine learning (ML) models to integrate sociodemographic data, self-reported baseline symptoms, and weekly brief ecological momentary assessment (EMA) surveys of veterans in a community-based 12-week peer support program to predict the PTSD severity level at discharge. The ML predictions place participants into one of three risk levels: low, medium, or high PCL-5 score. The models were evaluated at different timepoints (weekly intervals) of the program to identify the earliest week at which predictions can guide early intervention and reduce veterans’ engagement in risky behaviors. The best results were achieved by a voting classifier with an average F-score of 0.69 at week 4.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"34 1","pages":"201-208"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87502409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic image for micro-expression recognition on region-based framework","authors":"T. Le, T. Tran, M. Rege","doi":"10.1109/IRI49571.2020.00019","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00019","url":null,"abstract":"Facial micro-expressions are involuntary facial expressions of low intensity and short duration that can reveal hidden emotions. Micro-expression analysis has received increasing attention and has advanced within the field of computer vision. However, studying micro-expressions remains very challenging and resource-intensive. Most recent works have attempted to improve spontaneous facial micro-expression recognition with sophisticated, hand-crafted feature extraction techniques. Deep neural networks have also been adopted for this task. In this paper, we present a compact framework in which a rank pooling concept called dynamic image is employed as a descriptor to extract informative features from certain regions of interest, and a convolutional neural network (CNN) is applied to the resulting dynamic images to recognize the micro-expressions therein. In particular, a facial motion magnification technique is applied to the input sequences to enhance the magnitude of facial movements in the data. Subsequently, rank pooling is applied to obtain dynamic images. Only a fixed number of localized facial areas are extracted from the dynamic images, based on observed dominant muscular changes. CNN models are fit to the final feature representation for the emotion classification task. The framework is simpler than those of prior work, yet the experimental results we achieved throughout the study support its effectiveness. 
The experiment is evaluated on three state-of-the-art databases CASMEII, SMIC and SAMM.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"24 1","pages":"75-81"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83569994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting Atmospheric Visibility Using Auto Regressive Recurrent Neural Network","authors":"Jahnavi Jonnalagadda, M. Hashemi","doi":"10.1109/IRI49571.2020.00037","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00037","url":null,"abstract":"Atmospheric visibility conditions affect not only road traffic but also aviation operations. Poor visibility at the destination site can reduce airport capacity, leading to ground delays, flight cancellations, flight diversions, and extra operating costs. Hence, timely forecasting of visibility is important for safe operation at both airports and on highways. Visibility is affected by meteorological variables such as precipitation, temperature, wind speed, humidity, smoke, fog, mist, and Particulate Matter (PM) concentrations in the atmosphere. This paper is an effort to forecast visibility as a univariate weather variable and to explore the effect of highly correlated meteorological variables on visibility, using an Auto Regressive Recurrent Neural Network (ARRNN). By adjusting the number of epochs and the regression horizon, i.e. the number of past time steps used in visibility prediction, we showed that ARRNN outperforms long short-term memory (LSTM) networks and a vanilla recurrent neural network (Vanilla RNN) in terms of the coefficient of determination (R²).","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"13 1","pages":"209-215"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88736007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NLP Relational Queries and Its Application","authors":"Andrei Stoica, K. Pu, Heidar Davoudi","doi":"10.1109/IRI49571.2020.00064","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00064","url":null,"abstract":"Recent advances in natural language processing have shown the effectiveness of statistical and neural network-based algorithms in the deep understanding of textual data. We demonstrate that the results of NLP analysis on text documents can enrich relational data so that structured queries can be used to derive further value from text data. In this paper, we present how we can perform analytics on a scientific research dataset based on both the relational data and NLP topic modeling. The integrated NLP features, together with the classical relational query constructs, allow one to explore the topic structure of the DBLP dataset with flexibility and precision.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"48 1","pages":"395-398"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90886585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Land Temperature Forecasting Using Long Short-Term Memory Network","authors":"Prashanti Maktala, M. Hashemi","doi":"10.1109/IRI49571.2020.00038","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00038","url":null,"abstract":"Based on NASA’s 40 years of satellite data, Earth has experienced drastic climatic changes in the form of sea-level rise, an increase in oceanic and atmospheric temperatures, depletion of the ozone layer, and a decrease in sea ice and snow cover. These observations point to the fact that the world is getting warmer, which significantly impacts humans and ecological systems. Forecasting global land temperature could help to identify the extent of the devastating consequences for natural habitats and shed light on the impact of policies designed to mitigate them. Previous studies have attempted to forecast regional temperatures using traditional machine learning models. This paper uses a standard multi-layer perceptron, a simple Recurrent Neural Network, and a Long Short-Term Memory network to forecast next month’s global land temperature. Our results show that deep learning outperforms traditional machine learning models, including decision tree, random forest, and ridge regression.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"43 1","pages":"216-223"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85098533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Medicare Fraud Detection using CatBoost","authors":"John T. Hancock, T. Khoshgoftaar","doi":"10.1109/IRI49571.2020.00022","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00022","url":null,"abstract":"In this study we investigate the performance of CatBoost in the task of identifying Medicare fraud. The Medicare claims data we use as input for CatBoost contain a number of categorical features. Some of these features, such as the procedure code and provider zip code, have thousands of possible values. One contribution we make in this study is to show how we use CatBoost to eliminate some data pre-processing steps that authors of related works take. A second contribution we make is to show improvements in CatBoost’s performance in terms of Area Under the Receiver Operating Characteristic Curve (AUC) when we include another one of the categorical features (provider state) as input to CatBoost. We show that CatBoost attains better performance than XGBoost in the task of Medicare fraud detection with respect to the AUC metric. At a 99% confidence level (with p-value 0) our experiments show that XGBoost obtains a mean AUC value of 0.7615 while CatBoost obtains a mean AUC value of 0.7851, validating the significance of CatBoost’s performance improvement over XGBoost. Moreover, when we include an additional categorical feature (healthcare provider state) in our data analysis, CatBoost yields a mean AUC value of 0.8902, which is also statistically significant at a 99% confidence level (with p-value 0). Our empirical evidence clearly indicates CatBoost is a better alternative to XGBoost for Medicare fraud detection, especially when dealing with categorical features.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"32 1","pages":"97-103"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86600576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the Program Co-Chairs - IRI 2020","authors":"","doi":"10.1109/iri49571.2020.00006","DOIUrl":"https://doi.org/10.1109/iri49571.2020.00006","url":null,"abstract":"","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78959663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Feature Modelling for Recommender Systems","authors":"Abdullah Alhejaili, S. Fatima","doi":"10.1109/IRI49571.2020.00057","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00057","url":null,"abstract":"Matrix factorization is one of the most successful model-based collaborative filtering approaches in recommender systems. Useful latent user features can lead to more accurate recommendations; however, user privacy and cross-domain access restrictions challenge the collection and analysis of such information. In this study, we propose a feature extraction method (WAFE) which leverages user-item interaction history to extract useful latent user features. We also propose a rating prediction approach that incorporates the local mean of users’ and items’ ratings. We evaluate our proposed model using two real-world benchmark datasets and compare its performance against state-of-the-art matrix factorization collaborative filtering methods. Evaluation results show that the proposed method outperforms the existing methods.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"10 1","pages":"349-356"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73132614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Natural Language-based Integration of Online Review Datasets for Identification of Sex Trafficking Businesses.","authors":"Maria Diaz, Anand Panangadan","doi":"10.1109/iri49571.2020.00044","DOIUrl":"https://doi.org/10.1109/iri49571.2020.00044","url":null,"abstract":"There is increasing interest in automatically identifying advertisements related to sex trafficking in online review sites. The main challenge is to identify the changing patterns in text reviews that are used to indicate illegal businesses. This work describes a novel means of identifying illegal business advertisements using natural language processing and machine learning. The method relies on building a training set of reviews of known illegal businesses. This training data is created by integrating a small high-precision set of known illegal businesses (Rubmaps) with a large collection of online reviews from a general-purpose review site (Yelp). Standard natural language pre-processing techniques are then applied to the text reviews, which are converted into a bag-of-words model with term frequency-inverse document frequency (TF-IDF) weighting. The resulting document-term matrix is used to train a classifier and then to identify suspicious activity from the remaining reviews. This approach therefore leverages a high-precision, low-recall dataset to identify relevant instances from the large low-precision, high-recall dataset. The approach was evaluated on a collection of 456,050 reviews from the Yelp online forum with a variety of machine learning algorithms and different numbers of text features. The method achieved an F1-score of 0.77 with a random forest classifier. The number of text features could also be reduced from 1,473 to 447 for a compact classifier with only a small drop in accuracy.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"2020 ","pages":"259-264"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/iri49571.2020.00044","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39683511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreword - IRI 2020","authors":"","doi":"10.1109/iri49571.2020.00005","DOIUrl":"https://doi.org/10.1109/iri49571.2020.00005","url":null,"abstract":"","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90084648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}