2022 International Conference on Data Science and Its Applications (ICoDSA): Latest Publications

Separating Hate Speech from Abusive Language on Indonesian Twitter
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862850
Muhammad Amien Ibrahim, Noviyanti Tri Maretta Sagala, S. Arifin, R. Nariswari, N. Murnaka, P. W. Prasetyo
Abstract: Social media is an effective tool for connecting with people and distributing information. However, many people use social media to spread hate speech and abusive language. In contrast to hate speech, abusive language is frequently used in jokes with no intention of offending individuals or groups, even though it may contain profanities. As a result, the distinction between hate speech and abusive language is often blurred. In many cases, individuals who spread hate speech may be prosecuted, as it has legal implications. Previous research has focused on binary classification of hate speech and normal tweets. This study aims to classify hate speech, abusive language, and normal messages on Indonesian Twitter. Several machine learning models, such as logistic regression and BERT models, are used for the text classification task. Model performance is assessed using the F1-score. The results show that BERT models outperform the other models in terms of F1-score, with the BERT-indobenchmark model, which was pretrained on social media text data, achieving the highest F1-score of 85.59. This also demonstrates that pretraining the BERT model on social media data improves the classification model significantly. Developing a classification model that can distinguish between hate speech and abusive language would help prevent the spread of hate speech that has legal implications.
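The abstract above reports F1-scores for a three-class task without saying how the per-class scores are averaged. As a minimal sketch, assuming macro averaging (an assumption, not something the abstract states) and using hypothetical label names, an F1-score over three classes could be computed like this:

```python
# Hypothetical label names; the paper's actual label encoding is not given in the abstract.
LABELS = ["hate_speech", "abusive", "normal"]


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (macro averaging, assumed here)."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)
```

Macro averaging weights each class equally, which matters when one class is much rarer than the others. A weighted or micro average would give different numbers on the same predictions.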
Citations: 2
Caries Level Classification using K-Nearest Neighbor, Support Vector Machine, and Decision Tree using Zernike Moment Invariant Features
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862879
Y. Jusman, Muhammad Ahdan Fawwaz Nurkholid, Muhammad Fajrul Faiz, Sartika Puspita, Lady Olivia Evellyne, Kahfi Muhammad
Abstract: Dental caries is the most common disease and is reported to be one of the oldest. There are four ways to prevent dental caries: maintaining oral hygiene, eating healthy food, getting adequate fluoride, and applying fissure sealants. Regular dental check-ups can also reduce the risk of developing this disease. Dentists often fail to detect the disease because early enamel lesions that have not yet developed into cavitation are difficult to detect. New techniques have therefore been developed to help detect it. The method in this study uses 10-fold cross-validation, with 90% of the data (1256 images) used for training and 10% (132 images) for testing. Zernike moments are used for feature extraction. The average training accuracies are 94.55%, 84.24%, and 88.46%, and the average training times are 0.74, 1.63, and 0.77 seconds for K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Decision Tree (DT), respectively. The study reports strong classification performance, with AUC values above 0.95 for each model.
Citations: 1
Deep Learning CNN Implementation on Packed Malware for Cloud Cross Domain Solution Filters
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862936
Leonardo Aguilera, Doug Jacobson
Abstract: This research focuses on detecting packed Windows Portable Executable (PE) malware with Deep Learning (DL) using the Convolutional Neural Network (CNN) algorithm. Our primary goal is to improve the usage of DL techniques in cybersecurity to strengthen the defenses against cyberattacks on U.S. Department of Defense (DoD) systems. According to our hypothesis, existing Cross Domain Solutions (CDSs) can be upgraded to include built-in DL-CNN algorithms for identifying well-crafted packed malware. To put this into perspective, implementing DL-CNN in the CDS filter software will significantly enhance the detection of packed malware. CDSs are strategically positioned between unclassified and classified systems, and with DL-CNN capabilities, the CDS virus detection filter will learn to detect malware on its own, regardless of whether the malware is well-crafted, packed, or encrypted. Using our trained model, we were able to distinguish packed malicious Windows PE executables from packed benign Windows PE executables with an average training accuracy of 94 percent and a validation accuracy of 93 percent. Although the DL-CNN algorithm’s results could be enhanced through further development and refinement using KerasTuner, this research provides a solid foundation. Our experiments were conducted on our lab computer system and in the Amazon SageMaker Studio Lab and Google Colab cloud environments.
Citations: 0
Filter-Based Feature Selection Method for Predicting Students’ Academic Performance
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862883
Dafid, Ermatita
Abstract: Almost all higher education institutions face the same problem of improving their quality as measured by students' academic performance. The need for early information about poor student academic performance has pushed higher education institutions to look for the best solution a prediction model can offer. Data mining offers various algorithms for prediction, which makes constructing an accurate prediction model a challenging task for higher education. Two factors drive the accuracy of a prediction model: the classifier and feature selection. Each classifier gives its best result when it is matched with appropriately categorized data in a dataset. A few studies have reported excellent results in predicting students' academic performance, but they focus only on the classification technique rather than on the right feature selection. Conversely, a few studies have reported excellent results in increasing prediction accuracy, but they focus only on feature selection techniques rather than on applying the right classifier to the right data. As a result, prediction models have not yet achieved the best accuracy. Unlike existing frameworks, which build a model and select features while ignoring the categorized data in a dataset, this research proposes suitable filter-based feature selection methods and classifiers based on the categorized data. The result will help researchers find the best combination of filter-based feature selection methods and classifiers. Tests of various classification algorithms and feature selection methods show that using an appropriate classifier for specific categorized data, together with proper feature selection, increases the prediction model's accuracy.
Citations: 1
ICoDSA 2022 Committee
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/icodsa55874.2022.9862823
Citations: 0
Social Commerce from Seller and Region Perspective: A Data Mining for Indonesian E-commerce
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862835
Gunawan
Abstract: As a subset of e-commerce, social commerce is growing fast in many countries. Studies on social commerce have primarily focused on consumer behavior, especially purchase intention. This study takes a different perspective by focusing on e-commerce sellers in aggregate across regions within a country. The objects of study are the provinces of Indonesia, a country with the most prominent e-commerce and social commerce among Southeast Asian countries. The general objective is to characterize social commerce firms across regions in Indonesia. The specific objectives are (1) to group provinces based on e-commerce and social commerce-related variables and (2) to describe the groups of provinces based on business and e-commerce profiles. This quantitative study of secondary data adopts a data mining approach to analyze official data from BPS-Statistics Indonesia. The Cross-Industry Standard Process for Data Mining framework was adopted as the methodology and the KNIME Analytics Platform as the computational software. The result classifies provinces into two groups: high and low social commerce. Provinces with high social commerce are characterized by younger entrepreneurs, more entrepreneurs with university backgrounds, newer e-commerce establishments, more fashion and beauty products, more resellers, and more revenue from social media channels. Local governments might use the findings to understand their province's position in the clusters and make policies to increase the number of social commerce entrepreneurs.
Citations: 0
Boosting Algorithm for Classifying Heart Disease Diagnose
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862861
Patrik Gunti Pratama, Dedy Rahman Wijaya, Heru Nugroho, Rathimala Kannan
Abstract: The heart is the organ responsible for pumping blood and distributing oxygen throughout the body. Hospitals and doctors currently still check heart disease diagnoses manually, which is expensive and time-consuming. In this study, the Gradient Tree Boosting (GTB) algorithm is used to classify patients as having heart disease or not. The purpose of the method is to make early information on heart health easier to obtain. The dataset, from the UCI Machine Learning Repository, has 13 features for detecting heart disease and a total of 304 records. The study uses the GTB model with the best four parameters and applies feature selection before classification. With a recall score of 0.98, the proposed method succeeded in correctly classifying patients diagnosed with heart disease.
Citations: 0
Named Entity Recognition for Drone Forensic Using BERT and DistilBERT
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862916
Swardiantara Silalahi, T. Ahmad, H. Studiawan
Abstract: The growing use and popularity of UAVs in many fields opens new opportunities and challenges. Many business sectors benefit from employing UAV devices. Drone use ranges from business purposes to crime. Hence, further mechanisms are needed to deal with drone crime and attacks, both administratively and technically. From a technical view, a security protocol is needed to keep the drone safe from various logical or physical attacks. When a drone is involved in an incident, a forensic protocol is needed to analyze and investigate the incident, understand the attack behavior, and mitigate the incident risk. Among existing drone forensic research efforts, few attempt to use specific drone artifacts for forensic analysis. Therefore, this paper investigates the potential of Named Entity Recognition (NER) as an initial step in extracting information from drone flight log data. We use Transformer-based techniques to perform NER and assist the forensic investigation. Pre-trained BERT and DistilBERT models are fine-tuned on the annotated data and achieve F1 scores of 98.63% and 95.9%, respectively.
Citations: 10
Sellybot: Conversational Recommender System Based on Functional Requirements
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862908
Nurani Solechah, Z. Baizal, N. Ikhsan
Abstract: New types of high-tech products are released very quickly. Smartphones, for example, come in many brands and models with different specifications. This makes people hesitant to buy because they have limited knowledge of which technical specifications suit their needs. It is therefore necessary to develop a recommender system based on product functional requirements. In our prior work, a Conversational Recommender System (CRS) was developed to recommend smartphones based on high-level requirements (product functional requirements) by combining Navigation by Asking (NBA) and Navigation by Proposing (NBP). Thus, users who are unfamiliar with the technical features of the product can express their needs more easily. However, that system uses a dialog form, so users are still limited in how flexibly they can express their needs. In this study, we extend that work by building Sellybot, a CRS that interacts with users in natural language. We built Sellybot using the RASA framework. It is evaluated on accuracy and user satisfaction. The results show that the system has an accuracy of 84.8%, and in the questionnaire 80.3% of users chose Sellybot, reporting that it felt more flexible to use and gave a better experience.
Citations: 0
Evaluation of The Quality of The School Website using WEBUSE and IPA
2022 International Conference on Data Science and Its Applications (ICoDSA) Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862534
Eric Reynara Karoza, S. Widowati, Arfive Gandhi
Abstract: A website is a collection of interconnected web pages that share a single domain and can be accessed by the public. Individuals, clubs, enterprises, and organizations can build and maintain websites for a variety of purposes. Websites offer an almost limitless number of options that may be used anywhere and at any time, including in education. SMA Harapan 1 Medan is one of the schools that use the internet as a source of information. However, according to the website administrator, the website (sma1.harapan.ac.id) continues to have issues with features, appearance, and insufficient information. The WEBUSE (Website Usability Evaluation Tools) method can be used to address these problems. WEBUSE is a questionnaire-based approach for assessing the usability of a website. It was chosen because it categorizes website problems and can assess usability across all types of websites and domains. The evaluation's findings are examined using Importance Performance Analysis (IPA), which determines the level of conformance and how satisfied users are with the website, and identifies which parts need to be fixed or maintained. The results provide insight for formulating improvement strategies and will be considered when creating the improvement plan.
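The abstract above mentions the level of conformance and deciding which parts to fix or maintain, but does not give the calculation. The following is a minimal sketch of a common textbook form of Importance Performance Analysis, not the authors' implementation. It assumes each attribute has a mean importance score and a mean performance score from the questionnaire, takes conformance as performance divided by importance, and places the quadrant boundaries at the grand means. The attribute names and scores in the usage note are hypothetical.

```python
from statistics import mean


def conformance(performance, importance):
    """Conformance level in percent: how far performance meets stated importance."""
    return 100.0 * performance / importance


def ipa_quadrant(importance, performance, mean_importance, mean_performance):
    """Classic four-quadrant IPA label, with the grand means as boundaries (assumed)."""
    high_importance = importance >= mean_importance
    high_performance = performance >= mean_performance
    if high_importance and not high_performance:
        return "concentrate here"
    if high_importance and high_performance:
        return "keep up the good work"
    if not high_importance and not high_performance:
        return "low priority"
    return "possible overkill"


def analyze(items):
    """items maps an attribute name to an (importance, performance) pair of mean scores."""
    mean_importance = mean(imp for imp, _ in items.values())
    mean_performance = mean(perf for _, perf in items.values())
    return {
        name: (
            conformance(perf, imp),
            ipa_quadrant(imp, perf, mean_importance, mean_performance),
        )
        for name, (imp, perf) in items.items()
    }
```

For example, `analyze({"content": (4.5, 3.0), "navigation": (4.0, 4.2)})` returns a conformance percentage and a quadrant label for each attribute. Attributes labelled "concentrate here" are the ones an improvement plan would usually address first.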
Citations: 0