{"title":"A Novel Deep Learning Based Model for Classification of Rice Leaf Diseases","authors":"A. Bhattacharya","doi":"10.1109/SweDS53855.2021.9638278","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9638278","url":null,"abstract":"Rice is the primary source of food for a vast population worldwide, especially for most Asian countries. Diseases in rice leaves can have disastrous outcomes and cause massive losses in the agricultural sector. Thus, there is a need for early automatic detection of rice leaf diseases. Many deep learning-based methods have previously been proposed for this task because of their good results. In this work, a novel transfer learning-based model has been suggested for the automatic classification of 5 different classes of diseases. DenseNet 201 has been used as the base model with weights from ImageNet. Instead of assigning random weights, the weights from the pre-trained network have been set but the layers have been trained from scratch on the given dataset in order to produce results. The proposed deep learning-based model shows better performance than the other existing state-of-the-art algorithms by achieving a training accuracy of 97.04% and an accuracy of 95.44% on the validation dataset. Although the dataset is noisy and no effective preprocessing steps were applied, the model performed quite well. 
This work provides a new method for deep learning-based classification of rice diseases.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122999479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying Fake and Real Neurally Generated News","authors":"Anitha Govindaraju, J. Griffith","doi":"10.1109/SweDS53855.2021.9638268","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9638268","url":null,"abstract":"In this data era, with Natural Language Processing (NLP) techniques such as “Language Modelling” showing great progress, it is observed that the idea of “Automated Journalism” i.e., generating news articles using computer programs based on existing news headlines, or the body of a news article, is emerging. Such advancements not only lead to progress but also to certain disadvantages. Specifically, adversaries are using these techniques to create fake news articles called “Neural fake news”. Such news imitates the style and appearance of real news to generate targeted propaganda which is used to confuse people. Humans find this neural fake news to be more trustworthy than human-written disinformation [1]. The goal of this research is to classify various types of neurally generated news as real or fake based on its genuineness. In a real-world scenario, humans evaluate the genuineness of news by relying on a model of the world, i.e., evaluating whether the content in the news is the same as the content from a reliable news source (e.g., Associated Press). In this work we use a Recurrent Neural Network (RNN), specifically a Siamese Bi-directional LSTM (BiLSTM), to act as a Semantic Textual Similarity (STS) model which compares the real news with neural news to determine whether it is fake or not. In order to train and test the model, 3 datasets have been created: one containing real news extracted from a common crawl; the second comprises a neural fake news dataset generated using language modelling techniques; the third comprises a neural real news dataset generated using textual data augmentation techniques. 
It is found that the Siamese BiLSTM model can accurately find the similarity scores between real news and neural news to allow the neural news to be classified as neural real or neural fake news.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128361596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SweDS 2021 Cover Page","authors":"","doi":"10.1109/sweds53855.2021.9637714","DOIUrl":"https://doi.org/10.1109/sweds53855.2021.9637714","url":null,"abstract":"","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"46 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129766882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepGRASS: Graph, Sequence and Scaled Embeddings on large scale transactions data","authors":"Mahesh Balan Umaithanu, Vignesh Ravichandran, M. Rohith Srinivaas, Venkat Subramanian Selvaraj","doi":"10.1109/SweDS53855.2021.9638270","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9638270","url":null,"abstract":"Representation learning has redefined large scale data mining applications. The high dimensional embeddings learn complex associations that transcend the human cognitive understanding and have achieved great success in different business applications that encounter the curse of dimensionality, including fin-tech. Different algorithms learn embeddings that capture different types of associations, and it would be useful to learn embeddings that holistically learn multi-dimensional associations. In this paper, we propose DeepGRASS – an algorithm that embeds financial transactions using graph and sequence-based topologies. Our results show that these embeddings learn associations that are very comprehensive, holistic, and multi-dimensional. We deploy DeepGRASS in PayPal, and train it on a multitude of transaction data with multi-dimensional features. The algorithm is two-fold: it embeds a bipartite graph with customer and merchant nodes and parallelly learns sequential associations using historical transactions along with other transactional features. These embeddings are then scaled and combined to learn multidimensional associations. We tested this on different predictive applications and find that the learning is generic and shows benchmarking performance in different predictive contexts. Based on offline metrics, back-tests, and sensitivity analysis on offline transaction data, we find very strong evidence to suggest that these embeddings provide the highest AUC score in predictive applications, highest co-efficient of determination in explaining variance and the features explain different types of associations. 
To our knowledge, this is the first application of embeddings that learn both graph and sequence-based associations on large scale financial transaction data and paves the way for a new generation of feature engineering in fin-tech.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"18 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123508086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Similarity of Twitter Users","authors":"M. Fatemi, K. Kucher, M. Laitinen, P. Fränti","doi":"10.1109/SweDS53855.2021.9638288","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9638288","url":null,"abstract":"Earlier studies have established that the (perceived) similarity of users is highly subjective and reflects more on how people respect/admire others rather than their characteristics or behavioral similarities. We study this phenomenon among Twitter users, and while we confirm that it is indeed the case, we further explore the components of similarity by investigating it using data from three categories (interactions between egos and alters, profile-based activity history, and linguistic content in the messages). We use interactions as an estimate of admiration and observe that it has more impact and a higher correlation to the perceived similarity than other objective measures, including similarity based on user profiles and their use of hashtags.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116983682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for Social Sciences: Stance Classification of User Messages on a Migrant-Critical Discussion Forum","authors":"Victoria Yantseva, K. Kucher","doi":"10.1109/SweDS53855.2021.9637718","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9637718","url":null,"abstract":"In this paper, we present our methodology for supervised stance classification of sparse and imbalanced social media data. We test our framework on a manually labeled dataset of 5700 messages about immigration in the Swedish language posted on the Flashback forum, a controversial online discussion platform. Our proposed approach currently achieves a macro-averaged F1-score of 0.72 for test data on a two-class problem compared against 0.27 for a baseline four-class model. Since effective classification of imbalanced and sparse textual data in under-resourced languages presents certain methodological challenges, our study contributes to a discussion on the best pathways to achieve the highest model performance given the character of the data and the unavailability of large training datasets for this task. Moreover, this work exemplifies the application of ML methodology to social media data, which can be particularly relevant for social scientists working in this area and interested in leveraging the possibilities of machine learning in their research field. 
This methodology and the obtained results provide a foundation for further in-depth analyses of social media texts in the Swedish language following a data-driven approach.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126439480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SBGTool: Similarity-Based Grouping Tool for Students’ Learning Outcomes","authors":"Zeynab Mohseni, R. M. Martins, Italo Masiello","doi":"10.1109/SweDS53855.2021.9638263","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9638263","url":null,"abstract":"With the help of Visual Learning Analytics (VLA) tools, teachers can construct meaningful groups of students that can, for example, collaborate and be engaged in productive discussions. However, finding similar samples in large educational databases requires effective similarity measures that capture the teacher’s intent. In this paper we propose a web-based VLA tool called Similarity-Based Grouping (SBGTool), to assist teachers in categorizing students into different groups based on their similar learning outcomes and activities. By using SBGTool, teachers may compare individual students by considering the number of answers (correct and incorrect) in different question categories and time ranges, find the most difficult question categories considering the percentage of similarity to the correct answers, determine the degree of similarity and dissimilarity across students, and find the relationship between students’ activity and success. To demonstrate the tool’s efficacy, we used 10,000 random samples from the EdNet dataset, a large-scale hierarchical educational dataset consisting of student-system interactions from multiple platforms, at university level, collected over a period of two years. 
The results point to the conclusion that the tool is efficient, can be adapted to different learning domains, and has the potential to assist teachers in maximizing the collaborative learning potential in their classrooms.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116973273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Interdisciplinary Web-based Framework for Data-driven Placement Analysis of CCTV Cameras","authors":"Kenneth Lewenhagen, Martin Boldt, Anton Borg, Manne Gerell, Johan Dahlén","doi":"10.1109/SweDS53855.2021.9637719","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9637719","url":null,"abstract":"This paper describes work in progress in an interdisciplinary research project that focuses on the placement and analysis of public closed-circuit television (CCTV) cameras using data-driven analysis of crime data. A novel web-based prototype that acts as a framework for the camera placement analysis with regard to historical crime occurrence is presented. The web-based prototype enables various analyses involving public CCTV cameras, e.g., to determine suitable locations for both stationary CCTV cameras and temporary cameras that are moved around after a few months to address crime seasonality. The framework also enables other analyses, e.g., automatically highlighting crimes that are carried out close to at least one camera. The research also investigates to what extent it is possible to generate estimates of the amount of detail captured by a camera given the distance to the crime and the light conditions. The research project includes interdisciplinary competences from various areas such as criminology, computer and data science as well as the Swedish Police.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114469205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Security Vulnerabilities using Source Code Metrics","authors":"Sundarakrishnan Ganesh, Tobias Ohlsson, Francis Palma","doi":"10.1109/SweDS53855.2021.9638301","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9638301","url":null,"abstract":"Large open-source systems generate and operate on a plethora of sensitive enterprise data. Thus, security threats or vulnerabilities must not be present in open-source systems and must be resolved as early as possible in the development phases to avoid catastrophic consequences. One way to recognize security vulnerabilities is to predict them while developers write code to minimize costs and resources. This study examines the effectiveness of machine learning algorithms to predict potential security vulnerabilities by analyzing the source code of a system. We obtained the security vulnerabilities dataset from Apache Tomcat security reports for versions 4.x to 10.x. We also collected the source code of Apache Tomcat 4.x to 10.x to compute 43 object-oriented metrics. We assessed four traditional supervised learning algorithms, i.e., Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), and Logistic Regression (LR), to understand their efficacy in predicting security vulnerabilities. We obtained the highest accuracy of 80.6% using the KNN. Thus, the KNN classifier was demonstrated to be the most effective of all the models we built. 
The DT classifier also performed well but under-performed when it came to multi-class classification.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128742936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning-Assisted Analysis of Small Angle X-ray Scattering","authors":"Piotr Tomaszewski, Shuyuan Yu, M. Borg, Jerk Ronnols","doi":"10.1109/SweDS53855.2021.9638297","DOIUrl":"https://doi.org/10.1109/SweDS53855.2021.9638297","url":null,"abstract":"Small angle X-ray scattering (SAXS) is extensively used in materials science as a way of examining nanostructures. The analysis of experimental SAXS data involves mapping a rather simple data format to a vast amount of structural models. Despite various scientific computing tools to assist the model selection, the activity heavily relies on the SAXS analysts’ experience, which is recognized as an efficiency bottleneck by the community. To cope with this decision-making problem, we develop and evaluate the open-source, Machine Learning-based tool SCAN (SCattering Ai aNalysis) to provide recommendations on model selection. SCAN exploits multiple machine learning algorithms and uses models and a simulation tool implemented in the SasView package for generating a well defined set of datasets. Our evaluation shows that SCAN delivers an overall accuracy of 95%-97%. The XGBoost Classifier has been identified as the most accurate method with a good balance between accuracy and training time. 
With eleven predefined structural models for common nanostructures and an easy draw-drop function to expand the number and types of training models, SCAN can accelerate the SAXS data analysis workflow.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125275971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}