{"title":"Root-cause analysis of process-data quality problems","authors":"R. Andrews, Fahame F. Emamjome, A. T. Ter Hofstede, H. Reijers","doi":"10.1080/2573234X.2021.1947751","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1947751","url":null,"abstract":"ABSTRACT Process mining provides analytical tools and methods which can distil insights about process behaviour from big process-related data. Yet challenges relating to the impact of poor quality data on event logs, the input to process mining analyses, remain. Despite researchers raising concerns about event log data quality, event log preparation is, in practice, generally handled mechanistically, focusing on fixing symptoms rather than on uncovering the root causes of event log data quality issues. To address this, we introduce the Odigos (Greek for “guide”) framework. Based on semiotics and Peircean abductive reasoning, the Odigos framework facilitates an informed way of dealing with data quality issues in event logs. Odigos supports both prognostic (foreshadowing potential quality issues) and diagnostic (identifying root causes of discovered quality issues) approaches. We examine in depth how the framework supports a detailed root-cause analysis of a well-known collection of event log imperfection patterns.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"16 1","pages":"51 - 75"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73523849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methodology to combine theoretical knowledge with a data-driven probabilistic graphical model","authors":"Kazim Topuz, Brett D. Jones, Sumeyra Sahbaz, Murad A. Moqbel","doi":"10.1080/2573234X.2021.1937351","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1937351","url":null,"abstract":"ABSTRACT This study presents an analytic inference methodology using probabilistic modeling that provides faster decision-making and a better understanding of complex relations. Two educational psychology models (i.e., the MUSIC Model of Motivation and the domain identification model) were coupled with a data-driven Probabilistic Graphical Model to provide a top-down and bottom-up combination for reasoning. Using survey data from middle school students, Bayesian Network models captured the probabilistic interactions between students’ perceptions of their science class, their identification with science, and their science career goals. Complex/non-linear relationships among these variables revealed that students’ perceptions of their science class (i.e., eMpowerment, Usefulness, Success, Interest, and Caring) were significant predictors of their science-related career goals. These findings provide validity evidence for using the MUSIC and domain identification models and provide educators and school administrators with a web-based simulator to estimate the effect of students’ science class perceptions on their science identification and career goals.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"35 1","pages":"125 - 139"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73368264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exposing model bias in machine learning revisiting the boy who cried wolf in the context of phishing detection","authors":"D. Chaojie, Anuj Gaurav","doi":"10.1080/2573234X.2021.1934128","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1934128","url":null,"abstract":"ABSTRACT Grown out of the quest for artificial intelligence (AI), machine learning (ML) is today’s most active field across disciplines with a sharp increase in applications ranging from criminology to fraud detection and to biometrics. ML and statistics both emphasise model estimation/training and thus share the inescapable Type 1 and 2 errors. Extending the concept of statistical errors into the domain of ML, we devise a ground-breaking pH scale-like ratio and intend it as a litmus test indicator of ML model bias completely masked by the popular performance criterion of accuracy. Using a publicly available phishing dataset, we conduct experiments on a series of classification models and consequently unravel the significant cost implications of models with varying levels of bias. Based on these results, we recommend practitioners exercise human judgement and match their own risk tolerance profile with the bias ratio associated with each ML model in order to guard against potential unintended adverse effects.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"18 1","pages":"171 - 178"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89531939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing privacy rights and surveillance analytics: a decision process guide","authors":"D. Power, C. Heavin, Yvonne O’Connor","doi":"10.1080/2573234X.2021.1920856","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1920856","url":null,"abstract":"ABSTRACT The right to privacy has been discussed by scholars in multiple disciplines, yet privacy issues are increasing due to technological advances and lower costs for organisations to adopt smart surveillance. Given the potential for misuse, it seems prudent for stakeholders to critically evaluate Surveillance Analytics (SA) innovations. To assist in balancing the issues arising from SA adoption and the implications for privacy, we review key terms and ethical frameworks. Further, we prescribe a two-by-two Surveillance, Privacy, and Ethical Decision (SPED) Process Guide. SPED recommends the use of one or more of three ethical frameworks, Consequence, Duty, and Virtue. The vertical axis in the SPED matrix is the sophistication of an organisation’s SA and the horizontal axis is an assessment of the current privacy level and the rights afforded to the target(s) of surveillance. The proposed decision process guide can assist senior managers and technologists in making decisions about adopting SA.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"9 1","pages":"155 - 170"},"PeriodicalIF":0.0,"publicationDate":"2021-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82064872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey of image labelling for computer vision applications","authors":"Christoph Sager, Christian Janiesch, Patrick Zschech","doi":"10.1080/2573234X.2021.1908861","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1908861","url":null,"abstract":"ABSTRACT Supervised machine learning methods for image analysis require large amounts of labelled training data to solve computer vision problems. The recent rise of deep learning algorithms for recognising image content has led to the emergence of many ad-hoc labelling tools. With this survey, we capture and systematise the commonalities as well as the distinctions between existing image labelling software. We perform a structured literature review to compile the underlying concepts and features of image labelling software such as annotation expressiveness and degree of automation. We structure the manual labelling task by its organisation of work, user interface design options, and user support techniques to derive a systematisation schema for this survey. Applying it to available software and the body of literature enabled us to uncover several application archetypes and key domains such as image retrieval or instance identification in healthcare or television.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"19 1","pages":"91 - 110"},"PeriodicalIF":0.0,"publicationDate":"2021-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82668843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Short-term prediction of opioid prescribing patterns for orthopaedic surgical procedures: a machine learning framework","authors":"E. Mortaz, Ali Dağ, L. Hutzler, C. Gharibo, Lisa Anzisi, J. Bosco","doi":"10.1080/2573234X.2021.1873078","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1873078","url":null,"abstract":"ABSTRACT Overprescribing of opioids after surgical procedures can increase the risk of addiction in patients, and underprescribing can lead to poor quality of care. In this study, we propose a machine learning-based predictive framework to identify the varying effects of factors that are related to the opioid prescription amount after orthopaedic surgery. To predict the prescription classes, we train multiple classifiers combined with random and SMOTE over-sampling and weight-balancing techniques to cope with the imbalanced state of the dataset. Our results show that the gradient boosting machines (XGB) with SMOTE achieve the highest classification accuracy. Our proposed analytical framework can be employed to assist and therefore, enable the surgeons to determine the timely changing effects of these variables.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"4 1","pages":"1 - 13"},"PeriodicalIF":0.0,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75112932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Scouting Reports Text To Predict NCAA → NBA Performance","authors":"Philip Z. Maymin","doi":"10.1080/2573234X.2021.1873077","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1873077","url":null,"abstract":"ABSTRACT Draft decisions by National Basketball Association (NBA) teams are notoriously poor. Analytics can help but are often dismissed for being too overfit, complex, risky, and incomplete. To address these concerns, we train separate leave-one-out random forests machine learning models for each collegiate NBA prospect from 2006 through 2019 with a conservative utility function on a novel comprehensive dataset including the raw text of scouting reports, combine measurements, on-court stats, mock draft placements, and more. Despite being unable to draft high school or international players, the resulting model outperforms the actual decisions of all but one NBA team, with an average gain of $100 million. Target shuffling shows that the model does not overfit and feature shuffling shows that handedness and ESPN mock draft rating, but not other mock drafts, are most important. NBA teams may be missing value by not following a disciplined, model-driven, prescriptive analytics approach to decision making.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"12 1","pages":"40 - 54"},"PeriodicalIF":0.0,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78907326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying insincere questions on Question Answering (QA) websites: meta-textual features and word embedding","authors":"M. Al-Ramahi, I. Alsmadi","doi":"10.1080/2573234X.2021.1895681","DOIUrl":"https://doi.org/10.1080/2573234X.2021.1895681","url":null,"abstract":"ABSTRACT The power of information and information exchange defines the current Internet and Online Social Networks (OSNs). With such power and influence, individuals and entities expose those networks to different types of false information. This paper proposes several classification models based on Quora insincere questions, a dataset released by Kaggle. We evaluated several models including word embeddings based on meta and word-level features. Best results were achieved using the BERT transformer with an overall accuracy of more than 95% on several individual classifiers. Overall, results indicated that the meta-textual features are important predictors for whether a question is sincere or not. In one implication, we noticed that users are putting more cognitive efforts into writing more readable sincere questions compared to insincere questions. Moreover, a dictionary is assembled from several explicit dictionaries and significant words selected from Quora questions. The dictionary showed a good performance in predicting insincere questions.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"30 1","pages":"55 - 66"},"PeriodicalIF":0.0,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83845999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time Series Analysis","authors":"W. Paczkowski","doi":"10.1007/978-3-030-87023-2_7","DOIUrl":"https://doi.org/10.1007/978-3-030-87023-2_7","url":null,"abstract":"","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"71 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85898412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification with Supervised Learning Methods","authors":"W. Paczkowski","doi":"10.1007/978-3-030-87023-2_11","DOIUrl":"https://doi.org/10.1007/978-3-030-87023-2_11","url":null,"abstract":"","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91367525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}