{"title":"A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios","authors":"Samuel Ackerman, Ella Rabinovich, Eitan Farchi, Ateret Anaby-Tavor","doi":"arxiv-2408.01963","DOIUrl":"https://doi.org/arxiv-2408.01963","url":null,"abstract":"We evaluate the robustness of several large language models on multiple datasets. Robustness here refers to the relative insensitivity of the model's answers to meaning-preserving variants of their input. Benchmark datasets are constructed by introducing naturally-occurring, non-malicious perturbations, or by generating semantically equivalent paraphrases of input questions or statements. We further propose a novel metric for assessing a model's robustness, and demonstrate its benefits in the non-adversarial scenario by empirical evaluation of several models on the created datasets.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPINEX-Optimization: Similarity-based Predictions with Explainable Neighbors Exploration for Single, Multiple, and Many Objectives Optimization","authors":"MZ Naser, Ahmed Z Naser","doi":"arxiv-2408.02155","DOIUrl":"https://doi.org/arxiv-2408.02155","url":null,"abstract":"This article introduces an expansion within the SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration) suite, now extended to single, multiple, and many objective optimization problems. The newly developed SPINEX-Optimization algorithm incorporates a nuanced approach to optimization in low and high dimensions by accounting for similarity across various solutions. We conducted extensive benchmarking tests comparing SPINEX-Optimization against ten single and eight multi/many optimization algorithms over 55 mathematical benchmarking functions and realistic scenarios. Then, we evaluated the performance of the proposed algorithm in terms of scalability and computational efficiency across low and high dimensions, number of objectives, and population sizes. The results indicate that SPINEX-Optimization consistently outperforms most algorithms and excels in managing complex scenarios, especially in high dimensions. The algorithm's capabilities in explainability, Pareto efficiency, and moderate complexity are highlighted through in-depth experiments and visualization methods.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A statistical procedure to assist dysgraphia detection through dynamic modelling of handwriting","authors":"Yunjiao Lu, Jean-Charles Quinton, Caroline Jolly, Vincent Brault","doi":"arxiv-2408.02099","DOIUrl":"https://doi.org/arxiv-2408.02099","url":null,"abstract":"Dysgraphia is a neurodevelopmental condition in which children encounter difficulties in handwriting. Dysgraphia is not a disorder per se, but is secondary to neurodevelopmental disorders, mainly dyslexia, Developmental Coordination Disorder (DCD, also known as dyspraxia) or Attention Deficit Hyperactivity Disorder (ADHD). Since the mastering of handwriting is central for the further acquisition of other skills such as orthography or syntax, an early diagnosis and handling of dysgraphia is essential for the academic success of children. In this paper, we investigated a large handwriting database composed of 36 individual symbols (26 isolated letters of the Latin alphabet written in cursive and the 10 digits) written by 545 children from 6.5 to 16 years old, among which 66 displayed dysgraphia (around 12%). To better understand the dynamics of handwriting, mathematical models of nonpathological handwriting have been proposed, assuming oscillatory and fluid generation of strokes (Parsimonious Oscillatory Model of Handwriting [André, 2014]). The purpose of this work is to study how such models behave when applied to children's dysgraphic handwriting, and whether a lack of fit may help in the diagnosis, using a two-layer classification procedure with different compositions of classification algorithms.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Journey-Based Transit Equity Analysis: A Case Study in the Greater Boston Area","authors":"Daniela Shuman, Xiaotong Guo, Nicholas S. Caros","doi":"arxiv-2408.01888","DOIUrl":"https://doi.org/arxiv-2408.01888","url":null,"abstract":"In this paper, a new methodology, journey-based equity analysis, is presented for measuring the equity of transit convenience between income groups. Two data sources are combined in the proposed transit equity analysis: on-board ridership surveys and passenger origin-destination data. The spatial unit of our proposed transit equity analysis is census blocks, which are relatively stable over time and allow an exploration of the data that is granular enough to make conclusions about the service convenience various communities are facing. A case study in the Greater Boston area using real data from the Massachusetts Bay Transportation Authority (MBTA) bus network demonstrates a significant difference in transit service convenience, measured by number of transfers per unit distance, transfer wait time and travel time per unit distance, between low-income riders and high-income riders. Implications of the analysis results for transit agencies are also discussed.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstructing and Forecasting Marine Dynamic Variable Fields across Space and Time Globally and Gaplessly","authors":"Zhixi Xiong, Yukang Jiang, Wenfang Lu, Xueqin Wang, Ting Tian","doi":"arxiv-2408.01509","DOIUrl":"https://doi.org/arxiv-2408.01509","url":null,"abstract":"Spatiotemporal projections in marine science are essential for understanding ocean systems and their impact on Earth's climate. However, existing AI-based and statistics-based inversion methods face challenges in leveraging ocean data, generating continuous outputs, and incorporating physical constraints. We propose the Marine Dynamic Reconstruction and Forecast Neural Networks (MDRF-Net), which integrates marine physical mechanisms and observed data to reconstruct and forecast continuous ocean temperature-salinity and dynamic fields. MDRF-Net leverages statistical theories and techniques, incorporating parallel neural networks sharing an initial layer, a two-step training strategy, and an ensemble methodology, which facilitates the exploration of challenging marine areas like the Arctic zone. We have theoretically justified the efficacy and rationality of our ensemble method by providing an upper bound on its generalization error. The effectiveness of MDRF-Net is validated through a comprehensive simulation study, which highlights its capability to reliably estimate unknown parameters. Comparisons with other inversion methods and reanalysis data are also conducted, and the global test error is 0.455°C for temperature and 0.0714 psu for salinity. Overall, MDRF-Net effectively learns the ocean dynamics system using physical mechanisms and statistical insights, contributing to a deeper understanding of marine systems and their impact on the environment and human use of the ocean.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Momentum Capture and Prediction System Based on Wimbledon Open2023 Tournament Data","authors":"Chang Liu, Tongyuan Yang, Yan Zhao","doi":"arxiv-2408.01544","DOIUrl":"https://doi.org/arxiv-2408.01544","url":null,"abstract":"There is a hidden energy in tennis, which cannot be seen or touched. It is the force that controls the flow of the game and is present in all types of matches. This mysterious force is Momentum. This study introduces an evaluation model that synergizes the Entropy Weight Method (EWM) and Gray Relation Analysis (GRA) to quantify momentum's impact on match outcomes. Empirical validation was conducted through Mann-Whitney U and Kolmogorov-Smirnov tests, which yielded p values of 0.0043 and 0.00128, respectively. These results underscore the non-random association between momentum shifts and match outcomes, highlighting the critical role of momentum in tennis. In addition, our investigation focuses on the creation of a predictive model that combines the advanced machine learning algorithm XGBoost with the SHAP framework. This model enables precise predictions of match swings with exceptional accuracy (0.999013 for multiple matches and 0.992738 for finals). The model's ability to identify the influence of specific factors on match dynamics, such as bilateral distance run during points, demonstrates its prowess. The model's generalizability was thoroughly evaluated using datasets from the four Grand Slam tournaments. The results demonstrate its remarkable adaptability to different match scenarios, despite minor variations in predictive accuracy. It offers strategic insights that can help players effectively respond to opponents' shifts in momentum, enhancing their competitive edge.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"136 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian reliability acceptance sampling plans under adaptive simple step stress partial accelerated life test","authors":"Rathin Das, Biswabrata Pradhan","doi":"arxiv-2408.00734","DOIUrl":"https://doi.org/arxiv-2408.00734","url":null,"abstract":"In the traditional simple step-stress partial accelerated life test (SSSPALT), the items are put on normal operating conditions up to a certain time and after that the stress is increased to get the failure time information early. However, when the stress increases, an additional cost is incorporated that increases the cost of the life test. In this context, an adaptive SSSPALT is considered where the stress is increased after a certain time if the number of failures up to that point is less than a pre-specified number of failures. We consider determination of Bayesian reliability acceptance sampling plans (BSP) through adaptive SSSPALT conducted under Type I censoring. The BSP under adaptive SSSPALT is called BSPAA. The Bayes decision function and Bayes risk are obtained for the general loss function. Optimal BSPAAs are obtained for the quadratic loss function by minimizing Bayes risk. An algorithm is provided for computation of optimum BSPAA. Comparisons between the proposed BSPAA and the conventional BSP through non-accelerated life test (CBSP) and conventional BSP through SSSPALT (CBSPA) are carried out.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141882621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When Audits and Recounts Distract from Election Integrity: The 2020 U.S. Presidential Election in Georgia","authors":"Philip B. Stark","doi":"arxiv-2408.00055","DOIUrl":"https://doi.org/arxiv-2408.00055","url":null,"abstract":"Georgia was central to efforts to overturn the 2020 Presidential election, including a call from then-president Trump to Georgia Secretary of State Raffensperger asking Raffensperger to 'find' 11,780 votes. Raffensperger has maintained that a '100% full-count risk-limiting audit' and a machine recount agreed with the initial machine-count results, which proved that the reported election results were accurate and that 'no votes were flipped.' There is no indication of widespread fraud, but there is reason to distrust the election outcome: the two machine counts and the manual 'audit' tallies disagree substantially, even about the number of ballots cast. Some ballots in Fulton County were included in the original count at least twice; some were included in the machine recount at least thrice. Audit results for some tally batches were omitted from the reported audit totals. The two machine counts and the audit were not probative of who won because of poor processes and controls: a lack of secure physical chain of custody, ballot accounting, pollbook reconciliation, and accounting for other election materials such as memory cards. Moreover, most voters voted with demonstrably untrustworthy ballot-marking devices, so even a perfect handcount or audit would not necessarily reveal who really won. True risk-limiting audits (RLAs) and rigorous recounts can limit the risk that an incorrect electoral outcome will be certified rather than being corrected. But no procedure can limit that risk without a trustworthy record of the vote. And even a properly conducted RLA of some contests in an election does not show that any other contests in that election were decided correctly. The 2020 U.S. Presidential election in Georgia illustrates unrecoverable errors that can render recounts and audits 'security theater' that distract from the more serious problems rather than justifying trust.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"75 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141882627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Urban Transport Choices: Incorporating Sociocultural Aspects","authors":"Kathleen Salazar-Serna, Lorena Cadavid, Carlos J. Franco","doi":"arxiv-2407.21307","DOIUrl":"https://doi.org/arxiv-2407.21307","url":null,"abstract":"This paper introduces an agent-based simulation model aimed at understanding urban commuters' mode choices and evaluating the impacts of transport policies to promote sustainable mobility. Crafted for developing countries, where utilitarian travel heavily relies on motorcycles, the model integrates sociocultural factors that influence transport behavior. Multinomial models and inferential statistics applied to survey data from Cali, Colombia, inform the model, revealing significant influences of sociodemographic factors and travel attributes on mode choice. Findings highlight the importance of cost, time, safety, comfort, and personal security, with disparities across socioeconomic groups. Policy simulations demonstrate positive responses to interventions like free public transportation, increased bus frequency, and enhanced security, yet with modest shifts in mode choice. Multifaceted policy approaches are deemed more effective, addressing diverse user preferences. Outputs can be extended to cities with similar sociocultural characteristics and transport dynamics. The methodology applied in this work can be replicated for other territories.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"204 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Remarks on the Poisson additive process","authors":"Haoming Wang","doi":"arxiv-2407.21651","DOIUrl":"https://doi.org/arxiv-2407.21651","url":null,"abstract":"The Poisson additive process is a binary conditionally additive process such that the first is the Poisson process provided the second is given. We prove the existence and uniqueness of predictable increasing mean intensity for the Poisson additive process. Besides, we establish a likelihood ratio formula for the Poisson additive process. It directly implies there doesn't exist an anticipative Poisson additive process which is absolutely continuous with respect to the standard Poisson process, which confirms a conjecture proposed by P. Brémaud in his PhD thesis in 1972. When applied to the Hawkes process, it concludes that the self-exciting function is constant. Similar results are also obtained for the Wiener additive process and Markov additive process.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}