Amal Algefes, Nouf Aldossari, Fatma Masmoudi, Elham Kariri
{"title":"A Text-mining approach for crime tweets in Saudi Arabia: From analysis to prediction","authors":"Amal Algefes, Nouf Aldossari, Fatma Masmoudi, Elham Kariri","doi":"10.1109/CDMA54072.2022.00023","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00023","url":null,"abstract":"Social networks have proven to be a massive hub for investigating contextual and individual behavior of people. Most recently, micro-blogging sites like Twitter are indicating to researchers that their content can be aggregated and used to effectively predict, forecast, and infer outcomes of real-world events. The crime-related tweets analysis research in Saudi Arabia set off with an ultimate goal of gathering a deeper understanding of what kinds of criminal weapons people frequently talk about. In this paper, we aim at dealing with tweets mentioning different weapons, analyzing them to gather facts such as the annual variation in the percentage of tweets mentioning different weapons, and recognizing the impact of events such as the Covid-19 pandemic on crime social discussions. In the following step, we develop a number of classifiers to predict which weapon is mentioned in a tweet. In order to perform our tasks, the Python programming language is used in the majority of the cases.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123799570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning for Classifying of White Blood Cancer","authors":"Asad Ullah, Tufail Muhammad","doi":"10.1109/CDMA54072.2022.00043","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00043","url":null,"abstract":"Automated classification of cells is an essential but challenging task for computer vision with significant biomedical advantages. Numerous studies have attempted to construct a cell classifier based on artificial intelligence using label-free cellular images obtained from an optical microscope in recent years. While these studies showed promising results, different cell types' biological complexity could not be represented by such classifiers. However, it is well-known that intracellular actin filaments are significantly modified in malignant cells. This is believed to be closely linked to tumor cells' distinctive growth characteristics and their tendency to invade tissues around them and to metastasize. It is also more beneficial to identify various cell types based on their biological activities using an automated technique. This paper shows the differentiation between normal White Blood Cells and cancer, which can provide new knowledge on malignant changes and be used as an additional diagnostic marker. Since human eyes cannot observe the features, we propose applying a convolutional neural network (CNN) to classify malignant and normal WBCs. The Inception-V3 CNN model was validated on images of normal and malignant WBCs from normal and blood cancer cell lines with differing aggression levels. 
The study showed that CNN performed better in accuracy and efficiency than a human expert in the cell classification system","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115592210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Students Personality Assessment using Deep Learning from University Admission Statement of Purpose","authors":"Salma Kulsoom, Seemab Latif, T. Saba, R. Latif","doi":"10.1109/CDMA54072.2022.00042","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00042","url":null,"abstract":"Statement of Purpose (SOP) plays a vital role in the university admissions process as reviewers assess the personality of the students by reading their SOPs. In the past, the Big Five personality traits of students have been assessed to predict their future academic performance. An exciting application of machine learning is the personality assessment using personality traits and behavior. In this paper, our focus is on developing a deep learning-based personality assessment model for the detection of Big Five Personality traits from SOP and mapping them to speculate a student's academic performance at the university. Our proposed model uses Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) and Bi-Directional LSTM (Bi-LSTM) architectures to extract features and predict ratios of Big Five traits in the SOP. The proposed model has been trained and tested on an essays dataset and 400 SOPs collected from computer science undergraduate students. 
Maximum accuracy achieved for essays dataset is 88.2 % and for student's personal statement is 67.0 % with FastText Embedding.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"16 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116851947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Mahmoodian, Harshita Thadesar, Marilena Georgiades, M. Pech, C. Hoeschen
{"title":"Liver Texture Classification on CT Images of Microwave Ablation Therapy","authors":"N. Mahmoodian, Harshita Thadesar, Marilena Georgiades, M. Pech, C. Hoeschen","doi":"10.1109/CDMA54072.2022.00028","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00028","url":null,"abstract":"Microwave ablation (MWA) therapy with image guidance by computed tomography (CT) is used for liver tumor destruction. However, because of the noise and therefore low contrast, CT images are not good enough for therapy control and need additional magnetic resonance imaging after the therapy. The ablation process itself faces two significant challenges: first, insufficient tumor ablation, which leads to tumor recurrence; second, a total ablated area significantly larger than the tumor, which damages healthy tissue. To minimize the impact, it is crucial for the radiologist to perform the therapy well to prevent tumor recurrence. Therefore, it is essential to differentiate among healthy, tumor, and ablated tissue textures in the CT scan images. This research contributes to the understanding of tissue characterization for the reduction of the recurrence rate. In this regard, four machine-learning (ML) algorithms, Naive-Bayesian, Logistic-Regression, Decision-Tree, and Random-Forest, were employed for liver tissue classification. In this paper, we propose higher-order spectral analysis, particularly bispectrum analysis, for extracting features from the CT images. The classifiers were then trained on ten new features extracted from the bispectrum analysis. For that, the images were divided into small patches, which were labeled as healthy, tumor, and ablated tissue. A maximum accuracy of 90.5% was obtained. 
The approach shows that the bispectral analysis provides valuable information that can be used during the MWA therapy for tissue characterization of CT scan even in the presence of noise.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130110989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Supervised Multi-tree XGBoost Model for an Earlier COVID-19 Diagnosis Based on Clinical Symptoms","authors":"A. H. Syed, Tabrej Khan","doi":"10.1109/CDMA54072.2022.00041","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00041","url":null,"abstract":"Efficient screening of Severe Acute Respiratory Syndrome Corona Virus 2 (SARS-CoV-2) enables quick and efficient diagnosis of SARS-CoV-2 and can mitigate the burden on healthcare systems. The aim was to assist the medical team globally in triaging incoming patients, especially in countries with limited healthcare infrastructure. In this context, the features with imminent infection risk (Test Indication, Fever, and Headache) were obtained using a multi-tree XGBoost algorithm. Based on their feature importance, the top three clinically relevant earlier clinical symptoms (attributes) were employed to create a Multi-tree XGBoost-based model for an earlier prediction of SARS-CoV-2. Overall, our Multi-tree XGBoost model predicted SARS-CoV-2 infection status with a high F1-score (0.9920 ± 0.008) and AUC value (0.9974 ± 0.0026) only by assessing the primary three clinical symptoms related to COVID-19 infection. Thus our multi-tree XGBoost-based model suggests a simple and accurate method for earlier detection of SARS-CoV-2 cases and initiating proper treatment protocol for SARS-CoV-2 positive patients. 
Therefore, we can conclude that our model will allow the health organizations to potentially reduce the infection rate and mortality in masses with COVID-19 infection and fatality due to SARS-CoV-2.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127651403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Olatunji, Aisha Alansari, Heba Alkhorasani, Meelaf Alsubaii, Rasha Sakloua, Reem Alzahrani, Yasmeen Alsaleem, Reem A. Alassaf, Mehwash Farooqui, M. I. B. Ahmed
{"title":"Machine Learning Based Preemptive Diagnosis of Lung Cancer Using Clinical Data","authors":"S. Olatunji, Aisha Alansari, Heba Alkhorasani, Meelaf Alsubaii, Rasha Sakloua, Reem Alzahrani, Yasmeen Alsaleem, Reem A. Alassaf, Mehwash Farooqui, M. I. B. Ahmed","doi":"10.1109/CDMA54072.2022.00024","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00024","url":null,"abstract":"Lung cancer is a malignant disease that imposes serious complications restricting patients from performing daily tasks in the early stages and eventually causes their death. The prevalence of this disease has been highlighted by numerous statistics worldwide. The preemptive diagnosis of individuals with lung cancer can enhance chances of prevention and treatment. Therefore, the purpose of this study is to predict lung cancer preemptively utilizing simple clinical and demographical features obtained from the “data world” website. The experiment was conducted using Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Logistic Regression (LR) classifiers. To improve models' accuracy, SMOTETomek was employed along with GridSearchCV to tune hyperparameters. The Recursive Feature Elimination method was also utilized to find the best feature subset. 
Results indicated that SVM achieved the best performance with 98.33% recall, 96.72% precision, and an accuracy of 97.27% using 15 attributes.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115997576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improve the Accuracy of Students Admission at Universities Using Machine Learning Techniques","authors":"Basem Assiri, M. Bashraheel, Ala Alsuri","doi":"10.1109/CDMA54072.2022.00026","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00026","url":null,"abstract":"The advancement of technology contributes to the development of many fields of life. One of the major fields to focus on is the field of higher education. Saudi universities provide free education to students, so a large number of students apply to the universities. In response to that, universities usually maintain admission policies. Universities' admission policies and procedures focus on students' Grade Point Average in high school (GPAH), General Aptitude Test (GAT) and Achievement Test (AT). In fact, guiding students to the suitable major improves students' achievements and success. This paper studies the admission criteria for universities in Saudi Arabia. This paper investigates the hidden details that lie behind students' GPAH, GAT and AT. Those details influence the process of students' major selection at universities. Indeed, this research uses machine learning models to include more features such as the grades of high school courses to predict the suitable majors for the students. We use K-Nearest Neighbor (KNN), Decision Tree (DT) and Support Vector Machine (SVM) to classify students into suitable majors. This process enhances the enrollments of applicants in appropriate majors. 
Furthermore, the experiments show that KNN gives the highest accuracy rate as it reaches 100%, while DT's accuracy rate is 81 % and SVM's accuracy rate is 75%.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128002484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iReader: An Intelligent Reader System for the Visually Impaired","authors":"J. G, A. Azar, B. Qureshi, Nashwa Ahmad Kamal","doi":"10.1109/CDMA54072.2022.00036","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00036","url":null,"abstract":"For visually impaired persons, it is quite difficult to read printed text. Non-visual forms of reading materials, such as Braille, are available as Blind Aiding Technology among many others. In recent times, many devices and assistive equipment have been developed and technologies made available to assist visually impaired persons with reading. Most of these research works and products support reading from printed text-based manuscripts only. Due to this limitation, it may not be possible for a visually impaired person to describe and comprehend a printed image. In this paper, we develop iReader, an Intelligent Reader system that not only helps a visually impaired reader to read but also vocally describes an image available in the printed text. The Convolutional Neural Network (CNN) is employed to collect features from the printed image and its caption. The Long Short-Term Memory (LSTM) network is used to train the model for describing the image data. The resulting data is sent as a voice message using Text-to-Speech to be read out loud to the user. The efficiency of the LSTM model is examined using the ResNet50 and VGG16. 
The experimental results show that the LSTM-based training model delivers the best prediction of a picture's description with an accuracy of 83","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"25 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133650869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Algorithms for Detection of Noisy/Artifact-Corrupted Epochs of Visual Oddball Paradigm ERP Data","authors":"Rafia Akhter, F. Beyette","doi":"10.1109/CDMA54072.2022.00033","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00033","url":null,"abstract":"Electroencephalography (EEG) is a non-invasive monitoring method that tracks and records the neural activities of the brain. The time-locked capture of the EEG to the external stimuli is known as Event-Related Potential (ERP) and it can help elucidate how the brain responds to the stimuli. In general, EEG is an uneven mixture of neural and non-neural sources of activities and these non-neural (non-EEG) signals produce artifacts in the EEG that can decrease the SNR in experiments and may lead to erroneous conclusions about the effects of experimental manipulation. Thus, it is very important to remove artifacts from the recorded EEG prior to analysis. The most common artifacts impacting ERPs are eye-blink, eye-movement, and body-movement. These artifact-corrupted data can be removed by visual inspection or by computer-automated signal processing methods. While these methods are suitable for post-processing of collected ERP applications, they are not well suited for real-time processing of continuous ERP data. This project seeks to address the challenges associated with real-time identification of artifacts by introducing a machine learning model that can screen ERP, detect and reject artifact-corrupted data epochs prior to signal analysis. In addition to enabling real-time pre-processing of streaming ERP data, the DBSCAN machine-learning methods explored here can provide up to 90% accuracy in the identification of artifact-mixed ERP epochs. 
As a result, the findings of this study will help to improve the signal quality of ERP trials and will enable ERP to be used as a biomarker in real-world applications where streaming EEG data collection and analysis are required.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115475503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asma Z. Yamani, Klemens Katterbauer, A. Alshehri, A. Marsala, Rabah A. Al-Zaidy
{"title":"Denoising Electromagnatic Surveys Using LSTMs","authors":"Asma Z. Yamani, Klemens Katterbauer, A. Alshehri, A. Marsala, Rabah A. Al-Zaidy","doi":"10.1109/CDMA54072.2022.00018","DOIUrl":"https://doi.org/10.1109/CDMA54072.2022.00018","url":null,"abstract":"Resistivity readings obtained from electromagnetic crosswell surveys provide insight for reservoir water saturation prediction. Although high resistivity values should map to low water saturation and vice versa, in many cases the readings may not be consistent with this correlation. This is due to factors that add noise to the resistivity reading, such as the borehole effect and the salinity of the injected water. Here, we attempt to treat the resistivity reading to negatively correlate with water saturation, enhancing the accuracy and interpretability of water saturation prediction models. We utilize the resistivity readings from locations further from sources of noise to correct the inconsistencies in the resistivity readings using a Long Short-Term Memory (LSTM) Neural Network approach. Our results demonstrate that by addressing noisy inconsistencies in the data, the performance of the water saturation model increases in terms of R2 from 0.62 to 0.70. 
Moreover, upon deploying a model interpretation method, namely SHAP TreeExplainer, we show that the resistivity-based features in the water saturation prediction model possess higher importance values than before the enhancement, in comparison with porosity features.","PeriodicalId":313042,"journal":{"name":"2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)","volume":"4 15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130563832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}