{"title":"A new discrete XLindley distribution: theory, actuarial measures, inference, and applications","authors":"Ahmed Sedky Eldeeb, Muhammad Ahsan-ul-Haq, Ayesha Babar","doi":"10.1007/s41060-023-00395-8","DOIUrl":"https://doi.org/10.1007/s41060-023-00395-8","url":null,"abstract":"","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"2 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84524380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning with big data to solve real-world problems","authors":"M. Rahmaty","doi":"10.59615/jda.2.1.9","DOIUrl":"https://doi.org/10.59615/jda.2.1.9","url":null,"abstract":"Machine learning algorithms use big data to learn future trends and predict them for businesses. Machine learning can be very efficient for deciphering data in industries where understanding consumer patterns can lead to big improvements. Adopting machine learning can be a giant leap for a business, and it cannot simply be added as a top layer: it requires redefining workflow, architecture, data collection and storage, analytics, and other modules. The magnitude of the system overhaul should be assessed and clearly communicated to the appropriate stakeholders. The main focus of machine learning is to develop computer programs that can access data and use it to learn. The learning process starts with observations or data, with the aim of finding patterns in the data and making better decisions. The main goal of data analysis using machine learning is to allow the computer to learn automatically, without human intervention or help, and to adjust its actions accordingly. Given the many real-world applications of data analysis, this article reviews the basic applications of machine learning, as one of the tools of artificial intelligence, with an emphasis on big data analysis. The purpose of this article is to understand the dimensions, components, applications, and challenges of using machine learning in the real world.","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"1 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87588189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concepts and applications of data mining and analysis of social networks","authors":"Azam Hajiaghajani","doi":"10.59615/jda.2.1.1","DOIUrl":"https://doi.org/10.59615/jda.2.1.1","url":null,"abstract":"Social media has become an important source of information during the last few decades. It has been effective in various fields such as business, entertainment, science, crisis management, and politics. For this reason, social media analysis has become very important for researchers and large companies. The widespread use of social media leads to a complex problem called \"accumulation of data\". Many data science specialists seek to analyze this data in order to identify the behavioral characteristics of users, analyze interests and needs, and improve marketing processes. Different social media platforms support all kinds of media, including text, video, audio, and location information. Therefore, data analysis in social networks is very important. In this research, the concepts and applications of data analysis in social networks are investigated.","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"20 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82679483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical power, accuracy, reproducibility and robustness of a graph clusterability test","authors":"P. Miasnikof, Alexander Y. Shestopaloff, A. Raigorodskii","doi":"10.1007/s41060-023-00389-6","DOIUrl":"https://doi.org/10.1007/s41060-023-00389-6","url":null,"abstract":"","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"13 1","pages":"379-390"},"PeriodicalIF":2.4,"publicationDate":"2023-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87684302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data analytics framework for reliable bus arrival time prediction using artificial neural networks","authors":"E. Hassannayebi, Ali Farjad, A. Azadnia, Mehrdad Javidi, R. Chunduri","doi":"10.1007/s41060-023-00391-y","DOIUrl":"https://doi.org/10.1007/s41060-023-00391-y","url":null,"abstract":"","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"23 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81239072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating narrative visualization: a survey of practitioners.","authors":"Nina Errey, Jie Liang, Tuck Wah Leong, Didar Zowghi","doi":"10.1007/s41060-023-00394-9","DOIUrl":"10.1007/s41060-023-00394-9","url":null,"abstract":"Narrative visualization is characterized by the integration of data visualization and storytelling techniques. These characteristics provide challenges in its evaluation. Little is known about how these evaluation challenges are addressed by narrative visualization practitioners. We surveyed experienced narrative visualization practitioners to investigate their methods of evaluation. To gain deeper insight we conducted a series of semi-structured interviews with practitioners. We found that there is usually an informal approach to narrative visualization evaluation, where practitioners rely on prior experience and their peers for evaluation. Our study also revealed novel approaches to evaluation. We introduce a practice-led heuristic framework to aid practitioners to evaluate narrative visualization systematically. Our practice-led heuristic framework couples first-hand practitioner experience with recent research literature. This work sheds light on how to address narrative visualization evaluation to better inform both academic research and practice.","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":" ","pages":"1-16"},"PeriodicalIF":2.4,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10064970/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10093964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identity2Vec: learning mesoscopic structural identity representations via Poisson probability metric","authors":"I. V. Oluigbo, H. Seba, Mohammed Haddad","doi":"10.1007/s41060-023-00390-z","DOIUrl":"https://doi.org/10.1007/s41060-023-00390-z","url":null,"abstract":"","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"97 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77463197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data science approach to risk assessment for automobile insurance policies","authors":"Patrick Hosein","doi":"10.1007/s41060-023-00392-x","DOIUrl":"https://doi.org/10.1007/s41060-023-00392-x","url":null,"abstract":"In order to determine a suitable automobile insurance policy premium, one needs to take into account three factors: the risk associated with the drivers and cars on the policy, the operational costs associated with management of the policy and the desired profit margin. The premium should then be some function of these three values. We focus on risk assessment using a data science approach. Instead of using the traditional frequency and severity metrics, we predict the total claims that will be made by a new customer using historical data of current and past policies. Given multiple features of the policy (age and gender of drivers, value of car, previous accidents, etc.), one can potentially try to provide personalized insurance policies based specifically on these features as follows. We can compute the average claims made per year of all past and current policies with identical features and then take an average over these claim rates. Unfortunately, there may not be sufficient samples to obtain a robust average. We can instead try to include policies that are “similar” to obtain sufficient samples for a robust average. We therefore face a trade-off between personalization (only using closely similar policies) and robustness (extending the domain far enough to capture sufficient samples). This is known as the bias–variance trade-off. We model this problem and determine the optimal trade-off between the two (i.e., the balance that provides the highest prediction accuracy) and apply it to the claim rate prediction problem. We demonstrate our approach using real data.","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136196365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy preserving cold-start recommendation for out-of-matrix users via content baskets","authors":"Michael Sun, Andrew Wang","doi":"10.1007/s41060-023-00388-7","DOIUrl":"https://doi.org/10.1007/s41060-023-00388-7","url":null,"abstract":"","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"14 1","pages":"1-17"},"PeriodicalIF":2.4,"publicationDate":"2023-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82039816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fake news detection: deep semantic representation with enhanced feature engineering.","authors":"Mohammadreza Samadi, Saeedeh Momtazi","doi":"10.1007/s41060-023-00387-8","DOIUrl":"10.1007/s41060-023-00387-8","url":null,"abstract":"Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Unlike recent studies that focused only on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features alone. To substantiate the effectiveness of feature engineering alongside semantic features, we propose a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":" ","pages":"1-12"},"PeriodicalIF":3.4,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998010/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10075360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}