{"title":"Guest Editor’s Introduction: COVID-19 and Data Science","authors":"Aihua Li","doi":"10.1007/s40745-022-00443-3","DOIUrl":"10.1007/s40745-022-00443-3","url":null,"abstract":"","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50491699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach","authors":"Santosh Kumar Bharti, Rajeev Kumar Gupta, Samir Patel, Manan Shah","doi":"10.1007/s40745-022-00434-4","DOIUrl":"10.1007/s40745-022-00434-4","url":null,"abstract":"<div><p>In the domain of natural language processing, part-of-speech (POS) tagging is the most important task. It plays a vital role in applications such as sentiment analysis, text summarization, opinion mining, <i>etc</i>. POS tagging is the process of assigning POS information (noun, pronoun, verb, <i>etc.</i>) to a given word, considered in the context of its relationship with the surrounding words. Hindi is a very popular language in countries such as India, Nepal, the United States, Mauritius, <i>etc</i>. The majority of Indians are accustomed to reading and writing Hindi, and they also use it on social media platforms such as <i>Twitter, Facebook, WhatsApp</i>, <i>etc.</i> POS tagging is the most important phase in analyzing such Hindi text from social media. Text scripted in Hindi is ambiguous in nature and morphologically rich, which makes identifying POS information challenging. In this article, a heuristic-based approach is proposed for identifying POS information. The proposed method deploys a context-based bigram model that creates a bigram sequence based on the relationship with the adjacent words. Subsequently, it selects the most likely POS information for a word based on both the forward and reverse bigram sequences. The experimental results of the proposed heuristic approach are compared with existing state-of-the-art techniques such as <i>hidden Markov models, decision trees, conditional random fields, support vector machines, neural networks, and recurrent neural networks</i>. Finally, it is observed that the proposed heuristic approach for POS tagging in Hindi outperforms the existing techniques and attains an accuracy of 94.3%.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47952117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction Control Charts: A New and Flexible Artificial Intelligence-Based Statistical Process Control Approach","authors":"Laion L. Boaventura, Rosemeire L. Fiaccone, Paulo H. Ferreira","doi":"10.1007/s40745-022-00441-5","DOIUrl":"10.1007/s40745-022-00441-5","url":null,"abstract":"<div><p>Statistical techniques allow assertive and controlled studies of projects, processes and products, aiding management decision-making. Statistical Process Control (SPC) is one of the most important and powerful statistical tools for measuring, monitoring and improving the quality of processes and products. Adopting Artificial Intelligence (AI) has recently gained increasing attention in the SPC literature. This paper presents a combined use of SPC and AI techniques, resulting in a novel and efficient process monitoring tool. The proposed prediction control chart, which we call the pred-chart, may be regarded as a more robust and flexible alternative to traditional SPC tools, given that it adopts the median behavior of the process. Besides its ability to recognize patterns and diagnose anomalies in the data regardless of the sample scenario, this innovative approach can also perform its monitoring functions on a large scale, predicting market scenarios and processes on massive amounts of data. The performance of the pred-chart is evaluated by the average run length (ARL), computed through Monte Carlo simulation studies. Two real data sets (small and medium) are also used to illustrate the applicability and usefulness of the proposed control chart for the prediction of continuous outcomes.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44406273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Merging of Scopus and Web of Science Data for Simplified and Effective Bibliometric Analysis","authors":"HimaJyothi Kasaraneni, Salini Rosaline","doi":"10.1007/s40745-022-00438-0","DOIUrl":"10.1007/s40745-022-00438-0","url":null,"abstract":"<div><p>The desideratum of organizing and synthesizing the rising corpus of publications has prompted an escalation in bibliometric studies. Bibliometric analysis is an essential statistical tool that ascertains critical information for identifying research prospects for researchers. Besides, it acts as evidence to support scientific findings. Researchers primarily use either the Scopus or Web of Science (WoS) database for conducting bibliometric analysis. Using these databases individually does not achieve the desired outcome, which requires merging the two. Several manual processes for merging Scopus and WoS data are described in the literature. However, all these manual procedures are time-consuming and may lead to inaccurate merging of the databases, as they often involve human errors due to the difficulty of data scrutiny. Hence, to avoid the manual process, this paper proposes an automatic process for merging Scopus and WoS data. To demonstrate the importance of the proposed process, a small (40 records) and a large (2344 records) dataset case are considered, on which both the manual and automatic processes are implemented. From the simulation results, it is observed that the proposed process consumed 0.4497659 s on the small dataset and 1.715981 s on the large dataset for the merging process. Thus, the proposed automatic merging process is an effective and time-saving approach that significantly reduces human effort and the risk of error. The outcome of this process is a merged dataset that includes the unique records of both the Scopus and WoS databases.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43304293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effect of coronavirus infection on QT and QTc intervals of hospitalized patients in Qazvin, Iran","authors":"Azadeh Najjar, Abbas Allami, Samira Dodangeh, Mohammad Mahdi Daei","doi":"10.1007/s40745-022-00425-5","DOIUrl":"10.1007/s40745-022-00425-5","url":null,"abstract":"<div><p>Electrocardiographic (ECG) changes have been investigated in coronavirus disease (COVID-19), indicating that COVID-19 infection exacerbates arrhythmias and triggers conduction abnormalities. However, the specific types of ECG abnormalities in COVID-19 and their impact on mortality have not been fully elucidated. The present retrospective, tertiary-care hospital-based cross-sectional study was conducted by reviewing the medical records of all patients diagnosed with COVID-19 infection who were admitted to Booali Sina Hospital in Qazvin, Iran from March to July 2020. Demographic information, length of hospital stay, treatment outcome, and electrocardiographic information (heart rate, QTc interval, arrhythmias, and blocks) were extracted from the patients' medical records. In total, 231 patients were enrolled in the study. Atrial fibrillation was a common arrhythmia, and left anterior fascicular block was a common cardiac conduction defect other than sinus arrhythmia. The deceased patients were significantly older than the recovered ones (71 ± 14 vs. 57 ± 16 years, p < 0.001). Longer hospital stay (p = 0.036), non-sinus rhythm (p < 0.001), bundle and node blocks (p = 0.002), ST-T wave changes (p = 0.003), and tachycardia (p = 0.024) were significantly more prevalent in the deceased group. In baseline ECGs, no significant difference was observed in the absolute QT interval; however, prolonged QTc was about twice as common among the deceased as among the recovered patients (using the Bazett, Sagie, and Fridericia formulas). Serial ECGs are recommended for all hospitalized patients with COVID-19, given the increased in-hospital mortality in patients with prolonged QTc interval, non-sinus rhythms, ST-T changes, tachycardia, and bundle and node blocks.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44940511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Resolution Solar Image Generation Using Generative Adversarial Networks","authors":"Ankan Dash, Junyi Ye, Guiling Wang, Huiran Jin","doi":"10.1007/s40745-022-00436-2","DOIUrl":"10.1007/s40745-022-00436-2","url":null,"abstract":"<div><p>We applied a deep learning algorithm known as Generative Adversarial Networks (GANs) to perform solar image-to-image translation, namely from Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) line-of-sight magnetogram images to SDO/Atmospheric Imaging Assembly (AIA) 0304-Å images. Ultraviolet (UV)/Extreme Ultraviolet observations such as the SDO/AIA 0304-Å images only became available to scientists in the late 1990s, even though magnetic field observations such as SDO/HMI have been available since the 1970s. Therefore, by leveraging deep learning algorithms such as GANs, we can give scientists access to complete datasets for analysis. For generating high-resolution solar images, we use the Pix2PixHD and Pix2Pix algorithms. The Pix2PixHD algorithm was specifically designed for high-resolution image generation tasks, and the Pix2Pix algorithm is by far the most widely used image-to-image translation algorithm. For training and testing we used data for the years 2012, 2013, and 2014. After model training, we evaluated the model on the test data. The results show that our deep learning models are capable of generating high-resolution (1024 × 1024 pixels) SDO/AIA 0304 images from SDO/HMI line-of-sight magnetograms. Specifically, the pixel-to-pixel Pearson correlation coefficient between the images generated by Pix2PixHD and the original images is as high as 0.99; the figure is 0.962 when Pix2Pix is used. The results obtained with our Pix2PixHD model surpass those of previous works aimed at generating SDO/AIA 0304 images. Thus, these models can be used to generate AIA 0304 images when AIA 0304 data are unavailable, supporting the understanding of space weather and giving researchers the capability to predict solar events such as solar flares and coronal mass ejections. To the best of our knowledge, our work is the first attempt to leverage the Pix2PixHD algorithm for SDO/HMI-to-SDO/AIA 0304 image-to-image translation.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82842673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Scopus Data Sets and Its Analysis for Decision Making","authors":"Prem Kumar Singh","doi":"10.1007/s40745-022-00435-3","DOIUrl":"10.1007/s40745-022-00435-3","url":null,"abstract":"<div><p>Recently, several authors have paid attention to Scopus data analysis for the intellectual measurement of institutes or authors. It is well known that Scopus indexes more than 34,346 peer-reviewed journals across different subjects, along with about 300,000 (3 lakh) conferences. It is difficult to measure the performance or expertise of any institute or author in a given domain for admission, job, ranking, or other decision-making processes. The reason is that various manipulations of document and citation counts have emerged via strategic authors or institutes, which can be measured through average author publications, funding counts, collaborations, and retracted papers. This happens due to rogue editors or the for-profit business strategies of educationalists. However, these types of misconduct greatly impact genuine researchers and force a brain drain. To resolve this issue, the current paper provides a way to measure the intellectual achievement of an institute or author based on several metrics. The proposed method is illustrated using Scopus data sets and their metrics for critical understanding.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43428665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting Directional Movement of Stock Prices using Deep Learning","authors":"Deeksha Chandola, Akshit Mehta, Shikha Singh, Vinay Anand Tikkiwal, Himanshu Agrawal","doi":"10.1007/s40745-022-00432-6","DOIUrl":"10.1007/s40745-022-00432-6","url":null,"abstract":"<div><p>The stock market's volatile and complex nature makes it difficult to predict market conditions. Deep learning is capable of simulating and analyzing complex patterns in unstructured data. Deep learning models have applications in image recognition, speech recognition, natural language processing (NLP), and many more. Their application in stock market prediction is gaining attention because of their capacity to handle large datasets and to map data to accurate predictions. However, most methods ignore the impact of mass media on a company's stock and on investors' behaviour. This work proposes a hybrid deep learning model combining the Word2Vec and long short-term memory (LSTM) algorithms. The main objective is to design an intelligent tool to forecast the directional movement of stock market prices based on financial time series and news headlines as inputs. The binary predicted output obtained using the proposed model would aid investors in making better decisions. The effectiveness of the proposed model is assessed in terms of the accuracy of predicting the directional movement of the stock prices of five companies from different sectors of operation.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43566318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generalized Gompertz Distribution with Hazard Power Parameter and Its Bivariate Extension: Properties and Applications","authors":"Hiba Zeyada Muhammed","doi":"10.1007/s40745-022-00420-w","DOIUrl":"10.1007/s40745-022-00420-w","url":null,"abstract":"<div><p>Recently, a new class of distributions, named the bivariate hazard power parameter family of distributions, was introduced. In this paper, a generalized Gompertz distribution is introduced as a member of this family in both the univariate and bivariate cases. Different properties, such as moments and the moment generating function, are discussed. It is observed that the joint probability density function and the joint survival function can be expressed in explicit forms. Maximum likelihood estimation is considered for the model's unknown parameters, and asymptotic confidence intervals for the unknown parameters are evaluated. Some simulations have been performed to assess the performance of the MLEs. Three real data sets are applied to this model for illustrative purposes.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45210829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Industrial Inspection System using Deep Learning","authors":"Monowar Wadud Hridoy, Mohammad Mizanur Rahman, Saadman Sakib","doi":"10.1007/s40745-022-00437-1","DOIUrl":"10.1007/s40745-022-00437-1","url":null,"abstract":"<div><p>Industrial inspection systems are an essential part of Industry 4.0. An automated inspection system can significantly improve product quality and reduce human labor, making workers' lives easier. However, a deep learning-based camera inspection system requires a large amount of data to classify defective products accurately. In this paper, a framework is proposed for a deep learning-based industrial inspection system. Additionally, a new dataset of hex-nut products is proposed, containing 4000 images: 2000 defective and 2000 non-defective. Moreover, different CNN architectures, i.e., a custom CNN, Inception ResNet v2, Xception, ResNet 101 v2, and ResNet 152 v2, are evaluated on the new hex-nut dataset using transfer learning. The CNN architectures are fine-tuned by keeping only the last 14 layers trainable, which yielded the optimal architecture, i.e., Xception (last 14 layers trainable, excluding the fully connected layer). The proposed framework separates defective from non-defective products with 100% accuracy on the hex-nut dataset. Furthermore, the proposed optimal Xception architecture was evaluated on a publicly available casting-material dataset, achieving 99.72% accuracy and outperforming existing methods.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47230517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}