{"title":"Encouraging Sustainability Practices through Entity Recognition of Food Items on Social Media","authors":"Eugene Lee, Brandon Chenze, A. Panangadan","doi":"10.1109/IRI51335.2021.00042","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00042","url":null,"abstract":"Food waste by end-consumers is a topic that is under-studied although this is an issue that has direct and indirect consequences on greenhouse gas emissions, water use, and health. One reason for food waste in homes in the United States is the perception that food is no longer safe after the stamped date on the product. Targeted outreach to correct these misconceptions could therefore reduce food waste. As large numbers of people publish their daily activities on social media networks, these messages can be automatically analyzed in real-time to identify situations involving specific foods and then respond to these cases with relevant information about food preparation and safety. This work describes a method to recognize mentions of food items on the Twitter social network. Since food items are not a standard entity class in existing entity recognition systems, a dataset of known food items is used to train an entity recognition framework to recognize these items in context. The training data is obtained from tweets that match a known database of keywords describing different foods. The accuracy of the proposed food entity recognition approach is evaluated on a hand-labeled dataset of randomly selected tweets. The method has a precision of 0.96 and a recall of 0.52 (f-score of 0.68) which is comparable to f-score of 0.75 for named-entity-recognition of drug-related terms on Reddit text data. After a specific food is recognized in a tweet, the corresponding information about its preparation and storage can be looked up in the FoodKeeper database of the U.S. Food Safety and Inspection Service. 
This message could then be tweeted as a reply to the original tweet.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125261061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mammograms Classification with NeuroMem® Chip for Breast Cancer Detection","authors":"Amine Hamza Yaker, Lydia Bouzar-Benlabiod, Anne Menendez, G. Paillet","doi":"10.1109/IRI51335.2021.00038","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00038","url":null,"abstract":"This paper presents an embedded AI model for breast cancer detection on mammogram images. Two different methods are defined and evaluated using the NeuroMem® neural network chip device. Both methods are tested on the CBIS-DDSM which is a public dataset of labeled mammograms and deliver a classification accuracy of 85 %.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114296396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Important Features Identification for Prostate Cancer Patients Stratification Using Isolation Forest and Interactive Clustering Method","authors":"E. Mohammed, Esmaeil Shakeri, Zahra Shakeri Hossein Abad, Trafford Crump, B. Far","doi":"10.1109/IRI51335.2021.00052","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00052","url":null,"abstract":"Prostate-specific Antigen (PSA) levels are commonly used to screen prostate cancer patients. However, because of the wide range of PSA levels in men, the classification results pertain to extensive false positives and false negatives that may impact the patient treatment. This paper presents a method to cluster prostate cancer patient clinical and demographics data into homogenous groups to support prostate cancer patients' classification with high accuracy. The proposed method is based on the isolation forest and interactive (two-step) clustering algorithm. We further analyze each group for commonalities and differences. The dataset used in this paper is collected from participants enrolled in the Alberta Prostate Cancer Research Initiative (APCaRI) study, which includes (after pre-processing) 2,878 patients with 20 clinical and demographics variables. The APCaRI study enrolled the population of men undergoing prostate cancer diagnosis in Calgary and Edmonton, Canada. These patients are referred for a diagnostic biopsy based on conventional clinical guidelines (e.g., elevated PSA or abnormal digital rectal examination). The data contains three different PSA levels measured at three follow-up times and the initial screening PSA level. The analysis results show that the PSA levels are a significant factor within each group, and there is a significant overlap between PSA levels between groups, and it may not be the best factor to classify prostate cancer patients. The data's majority group has PSA levels (10.83%, 10.44%, and 10.14%) smaller than the remaining groups. 
This paper concludes that it is maybe better to design an independent classifier per group to identify prostate cancer patients from clinical and demographics data.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114515069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ChartReader: Automatic Parsing of Bar-Plots","authors":"Chinmayee Rane, S. Subramanya, Devi Sandeep Endluri, Jian Wu, C. Lee Giles","doi":"10.1109/IRI51335.2021.00050","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00050","url":null,"abstract":"Scientific figures such as bar graphs are a critical part of scientific research and a predominant method used to represent trends and relationships in data. However, manually interpreting and extracting information from graphs is often tedious. Since data consumption has exponentially evolved over the past few decades, there is a need for automated data inference from these bar graphs. ChartReader presents a fully automated end-to-end framework that extracts data from bar graphs in scientific research papers focusing on process engineering and environmental science journals. ChartReader uses a deep learning-based classifier to determine the chart type of a given chart image. We then develop novel heuristic methods for analyzing scientific figures (text detection, pixel grouping, object detection) and address prime challenges like axis detection, legend parsing, and label detection. Our framework achieves 98% and 68% accuracy in parsing x-axis and y-axis ticks, respectively. It achieves 83% accuracy in parsing legends and 42% accuracy in parsing data values in the testing corpus. 
We compare the proposed method with state-of-the-art methods and address its limitations.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122062575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APD: An Autoencoder-based Prediction Model for Depression Diagnosis","authors":"Hyeseong Park, Myung Won Raymond Jung, Uran Oh","doi":"10.1109/IRI51335.2021.00058","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00058","url":null,"abstract":"Depression is one of the most common mental health problems, which can lead to significant mental disorders and suicidal behavior. To diagnose depression levels, patients with depressive disorders are required to complete self-assessment questionnaires. However, many depressed patients are misdiagnosed in clinical practice due to patients' missing data. In this paper, we introduce, APD, a novel data-driven approach based on autoencoder to predict the missing responses accurately. Inspired by existing autoencoder-based recommender systems, our autoencoder is based on collaborative filtering, which estimates unobserved data by cooperation with other patients' responses. Experimental results show that the proposed autoencoder-based prediction system outperforms the averaging and the linear models. We demonstrate that this model can be used to predict patients' depression status with a low error of 2.85%.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130364118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PRNU-based Source Camera Identification for Multimedia Forensics","authors":"Eitan Flor, Ramazan S. Aygun, Suat Mercan, K. Akkaya","doi":"10.1109/IRI51335.2021.00029","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00029","url":null,"abstract":"With the increased development and reliance on multimedia data, the importance of attributing the device or camera of origin in the form of source camera identification (SCI) has gained traction in cybersecurity, specifically within digital multimedia forensics. Photo-Response Non-Uniformity (PRNU) is a popular and widely used method for extracting a unique and reliable sensor pattern fingerprint for SCI purposes. The usage of PRNU in distinguishing cameras across different manufacturers and models has proven to be successful; however, we demonstrate that current approaches fail to distinguish cameras amongst the same manufacturers and models. As such, in this paper, we propose a new algorithm that focuses on emphasizing the pixels that contribute to the sensor noise in the PRNU pattern to distinguish cameras of the same type. Unlike other similarity metrics used in the process of SCI, we utilize the Jaccard coefficient in order to provide a proportion value of matching pixel locations shared between the noise patterns of two devices. 
Our experimental results show that our method can successfully determine the device origin of an image from cameras of identical type across Apple iPhone 6, FujiFilm, Panasonic, and Sony cameras.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130384908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLAMP: Cross-Level Attention for Multi-Party Conversational Emotion Recognition","authors":"Fernando H. Calderon Alvarado, Mau-Yun Ma, Yen-Hao Huang, Yi-Shin Chen","doi":"10.1109/IRI51335.2021.00048","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00048","url":null,"abstract":"Emotion Recognition in Conversations (ERC) is the task of identifying the emotions of utterances from speakers in a conversation. The major challenge of ERC is how to aggregate useful contextual information from multiple utterances. Existing research in this area has mainly focused on (1) the context modeling of utterances and (2) the interaction between speakers in conversation. Another concern with ERC is that emphatically enhancing context modeling might over contextualize surrounding information and neglect the valuable influence of context-free semantic information from the utterance. In this work, we present the cross-level attention to preserve context-free semantics of utterances and prevent over-contextualizing. Additionally, multiparty attention masks are introduced to better model complex speaker interactions by separating the conversation into speakers and others. The proposed methods are integrated with the transformer-based architecture resulting in the Cross-Level Attention with Multi-Party mask model (CLAMP). The experimental results indicate that CLAMP empirically achieves competitive performance on the IEMOCAP dataset. The proposed method presents and improvement on context utilization when adding conversation utterances. 
Additionally the contribution of the proposed components is demonstrated in an ablation study.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126660865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Multimodal Fusion Network with Dynamic Multi-task Learning","authors":"Tianyi Wang, Shu‐Ching Chen","doi":"10.1109/IRI51335.2021.00034","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00034","url":null,"abstract":"Real-world data often contain multiple modalities and non-exclusive labels. Multimodal fusion is a vital step in mul-timodallearning that integrates features from various modalities in the vector space so that the classifier could utilize the fused vector to generate the final prediction score. Common multimodal fusion approaches rarely consider the cross-modality interactions which play an essential role in exploiting the inter-modality relationship and subsequently creating the joint modality embedding. In this paper, we propose a hierarchical multimodal fusion framework with dynamic multi-task learning. It focuses on modeling the joint embedding space for all cross-modality interactions and adjusting the task loss for optimal performance. The proposed model uses a novel hierarchical multimodal fusion network that learns the cross-modal interactions among all combinations of modalities and dynamically allocates the weights for each pair in a sample-aware fashion. Furthermore, a novel dynamic multi-task learning approach is applied to handle the multi-label problems by automatically adjusting the learning progress on both task level and sample level. We show that the proposed framework outperforms the baselines and some of the state-of-the-art methods. 
We also demonstrate the flexibility and modularity of the proposed hierarchical multimodal fusion and dynamic multi-task learning units, which can be applied to various types of networks.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129297291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CCFD-Net: A novel deep learning model for credit card fraud detection","authors":"Xiao Liu, Kuan Yan, L. Kara, Zhenguo Nie","doi":"10.1109/IRI51335.2021.00008","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00008","url":null,"abstract":"Credit card fraud can cause billions of dollars in financial losses to merchants and consumers each year. If fraud can be detected in time and corresponding measures can be taken, financial losses can be significantly alleviated and other derivative frauds can be prevented. Although traditional machine learning methods can achieve good precision and recall in credit card fraud detection, they cannot avoid the false positives effectively. In this paper, we propose a novel Credit Card Fraud Detection model called CCFD-Net that employs a modified residual network architecture. Based on a realworld dataset from Vesta's e-commerce transactions, we conduct comparative analysis on predictive models to evaluate and verify the effectiveness of the proposed method. The paper explores a hybrid architecture of 1D-Conv and the residual neural network (Res-net), evaluates the performance of different machine learning models based on K-fold cross-validation. The results prove the effectiveness and robustness of the model in credit card fraud detection. In practice, our proposed model can identify more fraudulent transactions than other compared models, and performs best on the evaluation metrics. 
We publicly share our full implementation with the dataset and trained models at https://github.com/zhenguonie/2021_CCFD_Net.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"252 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134600203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foraging-Theoretic Tool Composition: An Empirical Study on Vulnerability Discovery","authors":"Mona Assarandarban, Tanmay Bhowmik, A. Q. Do, S. Chekuri, Wentao Wang, Nan Niu","doi":"10.1109/IRI51335.2021.00025","DOIUrl":"https://doi.org/10.1109/IRI51335.2021.00025","url":null,"abstract":"Discovering vulnerabilities is an information-intensive task that requires a developer to locate the defects in the code that have security implications. The task is difficult due to the growing code complexity and some developer's lack of security expertise. Although tools have been created to ease the difficulty, no single one is sufficient. In practice, developers often use a combination of tools to uncover vulnerabilities. Yet, the basis on which different tools are composed is under explored. In this paper, we examine the composition base by taking advantage of the tool design patterns informed by foraging theory. We follow a design science methodology and carry out a three-step empirical study: mapping 34 foraging-theoretic patterns in a specific vulnerability discovery tool, formulating hypotheses about the value and cost of foraging when considering two composition scenarios, and performing a human-subject study to test the hypotheses. 
Our work offers insights into guiding developers' tool usage in detecting software vulnerabilities.","PeriodicalId":293393,"journal":{"name":"2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133244837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}