{"title":"A survey study of success factors in data science projects","authors":"Iñigo Martinez, Elisabeth Viles, Igor G. Olaizola","doi":"arxiv-2201.06310","DOIUrl":"https://doi.org/arxiv-2201.06310","url":null,"abstract":"In recent years, the data science community has pursued excellence and made significant research efforts to develop advanced analytics, focusing on solving technical problems at the expense of organizational and socio-technical challenges. According to previous surveys on the state of data science project management, there is a significant gap between technical and organizational processes. In this article we present new empirical data from a survey to 237 data science professionals on the use of project management methodologies for data science. We provide additional profiling of the survey respondents' roles and their priorities when executing data science projects. Based on this survey study, the main findings are: (1) Agile data science lifecycle is the most widely used framework, but only 25% of the survey participants state to follow a data science project methodology. (2) The most important success factors are precisely describing stakeholders' needs, communicating the results to end-users, and team collaboration and coordination. (3) Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls, version control, the deployment pipeline to production, and data security and privacy.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science in Perspective","authors":"Rogerio Rossi","doi":"arxiv-2201.05852","DOIUrl":"https://doi.org/arxiv-2201.05852","url":null,"abstract":"Data and Science has stood out in the generation of results, whether in the projects of the scientific domain or business domain. CERN Project, Scientific Institutes, companies like Walmart, Google, Apple, among others, need data to present their results and make predictions in the competitive data world. Data and Science are words that together culminated in a globally recognized term called Data Science. Data Science is in its initial phase, possibly being part of formal sciences and also being presented as part of applied sciences, capable of generating value and supporting decision making. Data Science considers science and, consequently, the scientific method to promote decision making through data intelligence. In many cases, the application of the method (or part of it) is considered in Data Science projects in scientific domain (social sciences, bioinformatics, geospatial projects) or business domain (finance, logistic, retail), among others. In this sense, this article addresses the perspectives of Data Science as a multidisciplinary area, considering science and the scientific method, and its formal structure which integrate Statistics, Computer Science, and Business Science, also taking into account Artificial Intelligence, emphasizing Machine Learning, among others. The article also deals with the perspective of applied Data Science, since Data Science is used for generating value through scientific and business projects. Data Science persona is also discussed in the article, concerning the education of Data Science professionals and its corresponding profiles, since its projection changes the field of data in the world.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"114 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data science to investigate temperature profiles of large networks of food refrigeration systems","authors":"Corneliu Arsene","doi":"arxiv-2201.02046","DOIUrl":"https://doi.org/arxiv-2201.02046","url":null,"abstract":"The electrical generation and transmission infrastructures of many countries are under increased pressure. This partially reflects the move towards low carbon economies and the increased reliance on renewable power generation systems. There has been a reduction in the use of traditional fossil fuel generation systems, which provide a stable base load, and this has been replaced with more unpredictable renewable generation. As a consequence, the available load on the grid is becoming more unstable. To cope with this variability, the UK National Grid has placed emphasis on the investigation of various technical mechanisms (e.g. implementation of smart grids, energy storage technologies, auxiliary power sources), which may be able to prevent critical situations, when the grid may become sometimes unstable. The successful implementation of these mechanisms may require large numbers of electrical consumers (e.g. HVAC systems, food refrigeration systems) for example to make additional investments in energy storage technologies (food refrigeration systems) or to integrate their electrical demand from industrial processes into the National Grid (HVAC systems). However, in the situation of food refrigeration systems, during these critical situations, even if the thermal inertia within refrigeration systems may maintain effective performance of the device for a short period of time (e.g. under 1 minute) when the electrical input load into the system is reduced, this still carries the paramount risk of food safety even for very short periods of time (e.g. under 1 minute). Therefore before considering any future actions (e.g. 
investing in energy storage technologies) to prevent the critical situations when grid becomes unstable, it is also needed to understand during the normal use how the temperature profiles evolve along the time inside these massive networks of food refrigeration systems.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harmonic numbers as the summation of integrals","authors":"N. Karjanto","doi":"arxiv-2112.00257","DOIUrl":"https://doi.org/arxiv-2112.00257","url":null,"abstract":"Harmonic numbers arise from the truncation of the harmonic series. The $n^{\\text{th}}$ harmonic number is the sum of the reciprocals of each positive integer up to $n$. In addition to briefly introducing the properties of harmonic numbers, we cover harmonic numbers as the summation of integrals that involve the product of exponential and hyperbolic secant functions. The proof is relatively simple since it only comprises the Principle of Mathematical Induction and integration by parts.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Taxonomy of Anomalies in Log Data","authors":"Thorsten Wittkopp, Philipp Wiesner, Dominik Scheinert, Odej Kao","doi":"arxiv-2111.13462","DOIUrl":"https://doi.org/arxiv-2111.13462","url":null,"abstract":"Log data anomaly detection is a core component in the area of artificial intelligence for IT operations. However, the large amount of existing methods makes it hard to choose the right approach for a specific system. A better understanding of different kinds of anomalies, and which algorithms are suitable for detecting them, would support researchers and IT operators. Although a common taxonomy for anomalies already exists, it has not yet been applied specifically to log data, pointing out the characteristics and peculiarities in this domain. In this paper, we present a taxonomy for different kinds of log data anomalies and introduce a method for analyzing such anomalies in labeled datasets. We applied our taxonomy to the three common benchmark datasets Thunderbird, Spirit, and BGL, and trained five state-of-the-art unsupervised anomaly detection algorithms to evaluate their performance in detecting different kinds of anomalies. Our results show, that the most common anomaly type is also the easiest to predict. Moreover, deep learning-based approaches outperform data mining-based approaches in all anomaly types, but especially when it comes to detecting contextual anomalies.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review on Analysis and Visualization Methods for Biclustering","authors":"Melih Sozdinler","doi":"arxiv-2111.12154","DOIUrl":"https://doi.org/arxiv-2111.12154","url":null,"abstract":"Recently, biclustering is one of the hot topics in bioinformatics and takes the attention of authors from several different disciplines. Hence, many different methodologies from a variety of disciplines are proposed as a solution to the biclustering problem. As a consequence of this issue, a variety of solutions makes it harder to evaluate the proposed methods. With this review paper, we are aimed to discuss both analysis and visualization of biclustering as a guide for the comparisons between brand new and existing biclustering algorithms. Additionally, we concentrate on the tools that provide visualizations with accompanied analysis techniques. Through the paper, we give several references that are also a short review of the state of the art for the ones who will pursue research on biclustering. The Paper outline is as follows; we first give the visualization and analysis methods, then we evaluate each proposed tool with the visualization contribution and analysis options, finally, we discuss future directions for biclustering and we propose standards for future work.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Datasets for Online Controlled Experiments","authors":"C. H. Bryan Liu, Ângelo Cardoso, Paul Couturier, Emma J. McCoy","doi":"arxiv-2111.10198","DOIUrl":"https://doi.org/arxiv-2111.10198","url":null,"abstract":"Online Controlled Experiments (OCE) are the gold standard to measure impact and guide decisions for digital products and services. Despite many methodological advances in this area, the scarcity of public datasets and the lack of a systematic review and categorization hinder its development. We present the first survey and taxonomy for OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach to enable quickly deploying improvements or rolling back degrading changes. We release the first such dataset, containing daily checkpoints of decision metrics from multiple, real experiments run on a global e-commerce platform. The dataset design is guided by a broader discussion on data requirements for common statistical tests used in digital experimentation. We demonstrate how to use the dataset in the adaptive stopping scenario using sequential and Bayesian hypothesis tests and learn the relevant parameters for each approach.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"State of the Art of Augmented Reality (AR) Capabilities for Civil Infrastructure Applications","authors":"Jiaqi Xu, Derek Doyle, Fernando Moreu","doi":"arxiv-2110.08698","DOIUrl":"https://doi.org/arxiv-2110.08698","url":null,"abstract":"Augmented Reality (AR) is a technology superimposing interactional virtual objects onto a real environment. Since the beginning of the millennium, AR technologies have shown rapid growth, with significant research publications in engineering and science. However, the civil infrastructure community has minimally implemented AR technologies to date. One of the challenges that civil engineers face when understanding and using AR is the lack of a classification of AR in the context of capabilities for civil infrastructure applications. Practitioners in civil infrastructure, like most engineering fields, prioritize understanding the level of maturity of a new technology before considering its adoption and field implementation. This paper compares the capabilities of sixteen AR Head-Mounted Devices (HMDs) available in the market since 2017, ranking them in terms of performance for civil infrastructure implementations. Finally, the authors recommend a development framework for practical AR interfaces with civil infrastructure and operations.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Theory of Bullshit Visualization","authors":"Michael Correll","doi":"arxiv-2109.12975","DOIUrl":"https://doi.org/arxiv-2109.12975","url":null,"abstract":"In this unhinged rant, I lay out my suspicion that a lot of visualizations are bullshit: charts that do not have even the common decency to intentionally lie but are totally unconcerned about the state of the world or any practical utility. I suspect that bullshit charts take up a large fraction of the time and attention of actual visualization producers and consumers, and yet are seemingly absent from academic research into visualization design.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards the Classification of Error-Related Potentials using Riemannian Geometry","authors":"Yichen Tang, Jerry J. Zhang, Paul M. Corballis, Luke E. Hallum","doi":"arxiv-2109.13085","DOIUrl":"https://doi.org/arxiv-2109.13085","url":null,"abstract":"The error-related potential (ErrP) is an event-related potential (ERP) evoked by an experimental participant's recognition of an error during task performance. ErrPs, originally described by cognitive psychologists, have been adopted for use in brain-computer interfaces (BCIs) for the detection and correction of errors, and the online refinement of decoding algorithms. Riemannian geometry-based feature extraction and classification is a new approach to BCI which shows good performance in a range of experimental paradigms, but has yet to be applied to the classification of ErrPs. Here, we describe an experiment that elicited ErrPs in seven normal participants performing a visual discrimination task. Audio feedback was provided on each trial. We used multi-channel electroencephalogram (EEG) recordings to classify ErrPs (success/failure), comparing a Riemannian geometry-based method to a traditional approach that computes time-point features. Overall, the Riemannian approach outperformed the traditional approach (78.2% versus 75.9% accuracy, p < 0.05); this difference was statistically significant (p < 0.05) in three of seven participants. These results indicate that the Riemannian approach better captured the features from feedback-elicited ErrPs, and may have application in BCI for error detection and correction.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}