Tarakashar Das, Sabrina Mobassirin, Syed Md. Minhaz Hossain, Aka Das, Anik Sen, Khaleque Md. Aashiq Kamal, Kaushik Deb
{"title":"Patient Questionnaires Based Parkinson’s Disease Classification Using Artificial Neural Network","authors":"Tarakashar Das, Sabrina Mobassirin, Syed Md. Minhaz Hossain, Aka Das, Anik Sen, Khaleque Md. Aashiq Kamal, Kaushik Deb","doi":"10.1007/s40745-023-00482-4","DOIUrl":"10.1007/s40745-023-00482-4","url":null,"abstract":"<div><p>Parkinson’s disease (PD) is one of the most prevalent and harmful neurodegenerative conditions. Even today, PD diagnosis and monitoring remain pricey and inconvenient processes. With the unprecedented progress of artificial intelligence algorithms, there is an opportunity to develop a cost-effective system for diagnosing PD at an earlier stage. No permanent remedy has been established yet; however, an earlier diagnosis helps patients lead a better life. The three categories of symptoms most associated with Parkinson’s disease are probably tremors, rigidity, and body bradykinesia. Therefore, we investigate the 53 unique features of the Parkinson’s Progression Markers Initiative dataset to determine the significant symptoms, including the three major categories. As feature selection is integral to developing a generalized model, we investigate classification both with and without feature selection. Four feature selection methods are incorporated: low variance filter, Wilcoxon rank-sum test, principal component analysis, and Chi-square test. Furthermore, we utilize machine learning, ensemble learning, and artificial neural networks (ANN) for classification. Experimental evidence shows that not all symptoms are equally important, but no symptom can be completely eliminated. Our proposed ANN model attains the best mean accuracy of 99.51%, 98.17% mean specificity, 0.9830 mean Kappa Score, 0.99 mean AUC, and 99.70% mean F1-score with all the features. The efficiency of our suggested technique on diverse data modalities is demonstrated by comparison with recent publications. Finally, we established a trade-off between classification time and accuracy.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00482-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46611876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ishfaq S. Ahmad, Rameesa Jan, Poonam Nirwan, Peer Bilal Ahmad
{"title":"A New Class of Distribution Over Bounded Support and Its Associated Regression Model","authors":"Ishfaq S. Ahmad, Rameesa Jan, Poonam Nirwan, Peer Bilal Ahmad","doi":"10.1007/s40745-023-00483-3","DOIUrl":"10.1007/s40745-023-00483-3","url":null,"abstract":"<div><p>In this paper, a new two-parameter distribution over the bounded support (0,1) is introduced and studied in detail. Some of its interesting statistical properties, such as concavity, hazard rate function, mean residual life, moments and quantile function, are discussed. The method of moments and maximum likelihood estimation are used to estimate the unknown parameters of the proposed model. Besides, the finite sample performance of the estimation methods is evaluated through a Monte Carlo simulation study. Application of the proposed distribution to real data sets shows a better fit than many known two-parameter distributions on the unit interval. Moreover, a new regression model is introduced as an alternative to various unit interval regression models.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44336464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inception-UDet: An Improved U-Net Architecture for Brain Tumor Segmentation","authors":"Ilyasse Aboussaleh, Jamal Riffi, Adnane Mohamed Mahraz, Hamid Tairi","doi":"10.1007/s40745-023-00480-6","DOIUrl":"10.1007/s40745-023-00480-6","url":null,"abstract":"<div><p>Brain tumor segmentation is an important field and a sensitive task in tumor diagnosis. Research in this area has helped specialists detect the tumor’s location in order to deal with it in its early stages. Numerous methods based on deep learning have been proposed, including the symmetric U-Net architectures, which have shown great results in medical imaging, particularly in brain tumor segmentation. In this paper, we propose an improved U-Net architecture called Inception U-Det, inspired by U-Det. This work employs the inception block instead of the convolution block used in the bi-directional feature pyramid neural network (Bi-FPN) during the skip connection phase of U-Det. Furthermore, a comparison study has been performed between our proposed approach and three well-known architectures in medical image segmentation: U-Net, DC-Unet, and U-Det. Several segmentation metrics have been computed for these methods on the publicly available BraTS datasets. Our results are promising in terms of accuracy, dice similarity coefficient (DSC), and intersection–union ratio (IOU). Moreover, the proposed method achieved a DSC of 87.9%, 85.5%, and 83.9% on BraTS2020, BraTS2018, and BraTS2017, respectively, calculated from the best fold of the fourfold cross-validation employed in the present approach.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45364813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations","authors":"Reetika Sarkar, Sithija Manage, Xiaoli Gao","doi":"10.1007/s40745-023-00481-5","DOIUrl":"10.1007/s40745-023-00481-5","url":null,"abstract":"<div><p>High-dimensional genomic data often exhibit strong correlations, which result in instability and inconsistency in the estimates obtained using common regularization approaches such as the Lasso and MCP. In this paper, we perform a comparative study of regularization approaches for variable selection under different correlation structures and propose a two-stage procedure named rPGBS to address the issue of stable variable selection in various strong correlation settings. This approach involves repeatedly running a two-stage hierarchical approach consisting of a random pseudo-group clustering and bi-level variable selection. Extensive simulation studies and high-dimensional genomic data analysis on real datasets have demonstrated the advantage of the proposed rPGBS method over some of the most used regularization methods. In particular, rPGBS results in more stable selection of variables across a variety of correlation settings, as compared to some recent methods addressing variable selection with strong correlations: Precision Lasso (Wang et al. in Bioinformatics 35:1181–1187, 2019) and Whitening Lasso (Zhu et al. in Bioinformatics 37:2238–2244, 2021). Moreover, rPGBS has been shown to be computationally efficient across various settings.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135049935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kousik Maiti, Suchandan Kayal, Aditi Kar Gangopadhyay
{"title":"On Progressively Censored Generalized X-Exponential Distribution: (Non) Bayesian Estimation with an Application to Bladder Cancer Data","authors":"Kousik Maiti, Suchandan Kayal, Aditi Kar Gangopadhyay","doi":"10.1007/s40745-023-00477-1","DOIUrl":"10.1007/s40745-023-00477-1","url":null,"abstract":"<div><p>This article addresses estimation of the parameters and reliability characteristics of a generalized <i>X</i>-Exponential distribution based on the progressive type-II censored sample. The maximum likelihood estimates (MLEs) are obtained. The uniqueness and existence of the MLEs are studied. The Bayes estimates are obtained under squared error and entropy loss functions. For computation of the Bayes estimates, Markov Chain Monte Carlo method is used. Bootstrap-<i>t</i> and bootstrap-<i>p</i> methods are used to compute the interval estimates. Further, a simulation study is performed to compare the performance of the proposed estimates. Finally, a real-life dataset is considered and analysed for illustrative purposes.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45717684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WeiKang Liu, Yanchun Zhang, Hong Yang, Qinxue Meng
{"title":"A Survey on Differential Privacy for Medical Data Analysis","authors":"WeiKang Liu, Yanchun Zhang, Hong Yang, Qinxue Meng","doi":"10.1007/s40745-023-00475-3","DOIUrl":"10.1007/s40745-023-00475-3","url":null,"abstract":"<div><p>Machine learning methods promote the sustainable development of wise information technology of medicine (WITMED), and a variety of medical data brings high value and convenience to medical analysis. However, the applications of medical data have also been confronted with the risk of privacy leakage that is hard to avoid, especially when conducting correlation analysis or data sharing among multiple institutions. Data security and privacy preservation have recently played an essential role in the field of secure and private medical data analysis, where many differential privacy strategies are applied to medical data publishing and mining. In this paper, we survey research work on the applications of differential privacy for medical data analysis, discussing the necessity of medical privacy-preserving, the advantages of differential privacy, and their applications to typical medical data, such as genomic data and wearable device data. Furthermore, we discuss the challenges and potential future research directions for differential privacy in medical applications.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47520588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Naïve Bayes Classifier Model for Detecting Spam Mails","authors":"Shrawan Kumar, Kavita Gupta, Manya Gupta","doi":"10.1007/s40745-023-00479-z","DOIUrl":"10.1007/s40745-023-00479-z","url":null,"abstract":"<div><p>In this paper, the machine learning algorithm Naive Bayes Classifier is applied to the Kaggle spam mails dataset to classify the emails in our inbox as spam or ham. The dataset is made up of two main attributes: type and text. The target variable \"Type\" has two factors: ham and spam. The text variable contains the text messages that will be classified as spam or ham. The results are obtained by employing two different Laplace values. It is up to the decision maker to select error tolerance in ham and spam messages derived from two different Laplace values. Computing software R is used for data analysis.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43486989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clemens Tegetmeier, Arne Johannssen, Nataliya Chukhrova
{"title":"Artificial Intelligence Algorithms for Collaborative Book Recommender Systems","authors":"Clemens Tegetmeier, Arne Johannssen, Nataliya Chukhrova","doi":"10.1007/s40745-023-00474-4","DOIUrl":"10.1007/s40745-023-00474-4","url":null,"abstract":"<div><p>Book recommender systems provide personalized recommendations of books to users based on their previous searches or purchases. As online trading of books has become increasingly important in recent years, artificial intelligence (AI) algorithms are needed to recommend suitable books to users and encourage them to make purchasing decisions in the short and the long run. In this paper, we consider AI algorithms for so called collaborative book recommender systems, especially the matrix factorization algorithm using the stochastic gradient descent method and the book-based <i>k</i>-nearest-neighbor algorithm. We perform a comprehensive case study based on the Book-Crossing benchmark data set, and implement various variants of both AI algorithms to predict unknown book ratings and to recommend books to individual users based on the highest predicted ratings. This study aims to evaluate the quality of the implemented methods in recommending books by using selected evaluation metrics for AI algorithms.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00474-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45942766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Poisson Moment Exponential Distribution with Associated Regression and INAR(1) Process","authors":"R. Maya, Jie Huang, M. R. Irshad, Fukang Zhu","doi":"10.1007/s40745-023-00476-2","DOIUrl":"10.1007/s40745-023-00476-2","url":null,"abstract":"<div><p>Numerous studies have emphasised the significance of count data modeling and its applications to phenomena that occur in the real world. From this perspective, this article examines the traits and applications of the Poisson-moment exponential (PME) distribution in the contexts of time series analysis and regression analysis for real-world phenomena. The PME distribution is a novel one-parameter discrete distribution that can be used as a powerful alternative to existing distributions for modeling over-dispersed count datasets. The advantages of the PME distribution, including the simplicity of the probability mass function and the explicit expressions of the functions of all the statistical properties, drove us to develop the inferential aspects and learn more about its practical applications. The unknown parameter is estimated using both maximum likelihood and moment estimation methods. Also, we present a parametric regression model based on the PME distribution for count datasets. To strengthen the utility of the suggested distribution, we propose a new first-order integer-valued autoregressive (INAR(1)) process with PME innovations based on binomial thinning for modeling integer-valued time series with over-dispersion. Application to four real datasets confirms the empirical significance of the proposed model.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43264212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Compound Distribution and Its Applications in Over-dispersed Count Data","authors":"Peer Bilal Ahmad, Mohammad Kafeel Wani","doi":"10.1007/s40745-023-00478-0","DOIUrl":"10.1007/s40745-023-00478-0","url":null,"abstract":"<div><p>Over-dispersed models are typically employed whenever the variance exceeds the mean, which makes them an important aspect of statistical modeling. In this work, the parameter of the Poisson distribution is assumed to follow a new lifespan distribution called the Chris-Jerry distribution. The resulting compound distribution is an over-dispersed model known as the Poisson-Chris-Jerry distribution. By deriving a general expression for the <i>r</i>th factorial moment, we obtain the moments about the origin and the central moments. In addition, moment-related measures, generating functions, the over-dispersion property, reliability characteristics, a recurrence relation for probabilities, and other statistical properties are described. To estimate the parameter of the suggested model, maximum likelihood estimation and the method of moments are addressed. The usefulness of the maximum likelihood estimates is also assessed through a simulation study. We employ four real-life data sets, examine goodness-of-fit tests, and consider additional criteria such as Akaike’s information criterion and the Bayesian information criterion. The outcomes are compared with several potential models.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46822534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}