{"title":"Similarity-Based Analysis of Atmospheric Organic Compounds for Machine Learning Applications","authors":"Hilda Sandström, Patrick Rinke","doi":"arxiv-2406.18171","DOIUrl":"https://doi.org/arxiv-2406.18171","url":null,"abstract":"The formation of aerosol particles in the atmosphere impacts air quality and\u0000climate change, but many of the organic molecules involved remain unknown.\u0000Machine learning could aid in identifying these compounds through accelerated\u0000analysis of molecular properties and detection characteristics. However, such\u0000progress is hindered by the current lack of curated datasets for atmospheric\u0000molecules and their associated properties. To tackle this challenge, we propose\u0000a similarity analysis that connects atmospheric compounds to existing large\u0000molecular datasets used for machine learning development. We find a small\u0000overlap between atmospheric and non-atmospheric molecules using standard\u0000molecular representations in machine learning applications. The identified\u0000out-of-domain character of atmospheric compounds is related to their distinct\u0000functional groups and atomic composition. Our investigation underscores the\u0000need for collaborative efforts to gather and share more molecular-level\u0000atmospheric chemistry data. 
The presented similarity based analysis can be used\u0000for future dataset curation for machine learning development in the atmospheric\u0000sciences.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brownian friction dynamics: fluctuations in sliding distance","authors":"Ruibin Xu, Feng Zhou, B. N. J. Persson","doi":"arxiv-2406.16139","DOIUrl":"https://doi.org/arxiv-2406.16139","url":null,"abstract":"We have studied the fluctuation (noise) in the position of sliding blocks\u0000under constant driving forces on different substrate surfaces. The experimental\u0000data are complemented by simulations using a simple spring-block model where\u0000the asperity contact regions are modeled by miniblocks connected to the big\u0000block by viscoelastic springs. The miniblocks experience forces that fluctuate\u0000randomly with the lateral position, simulating the interaction between\u0000asperities on the block and the substrate. The theoretical model provides\u0000displacement power spectra that agree well with the experimental results.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian approximation of dynamic cavity equations for linearly-coupled stochastic dynamics","authors":"Mattia Tarabolo, Luca Dall'Asta","doi":"arxiv-2406.14200","DOIUrl":"https://doi.org/arxiv-2406.14200","url":null,"abstract":"Stochastic dynamics on sparse graphs and disordered systems often lead to\u0000complex behaviors characterized by heterogeneity in time and spatial scales,\u0000slow relaxation, localization, and aging phenomena. The mathematical tools and\u0000approximation techniques required to analyze these complex systems are still\u0000under development, posing significant technical challenges and resulting in a\u0000reliance on numerical simulations. We introduce a novel computational framework\u0000for investigating the dynamics of sparse disordered systems with continuous\u0000degrees of freedom. Starting with a graphical model representation of the\u0000dynamic partition function for a system of linearly-coupled stochastic\u0000differential equations, we use dynamic cavity equations on locally tree-like\u0000factor graphs to approximate the stochastic measure. Here, cavity marginals are\u0000identified with local functionals of single-site trajectories. Our primary\u0000approximation involves a second-order truncation of a small-coupling expansion,\u0000leading to a Gaussian form for the cavity marginals. For linear dynamics with\u0000additive noise, this method yields a closed set of causal integro-differential\u0000equations for cavity versions of one-time and two-time averages. These\u0000equations provide an exact dynamical description within the local tree-like\u0000approximation, retrieving classical results for the spectral density of sparse\u0000random matrices. Global constraints, non-linear forces, and state-dependent\u0000noise terms can be addressed using a self-consistent perturbative closure\u0000technique. 
The resulting equations resemble those of dynamical mean-field\u0000theory in the mode-coupling approximation used for fully-connected models.\u0000However, due to their cavity formulation, the present method can also be\u0000applied to ensembles of sparse random graphs and employed as a message-passing\u0000algorithm on specific graph instances.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Models for Accurately Predicting Properties of CsPbCl3 Perovskite Quantum Dots","authors":"Mehmet Sıddık Çadırcı, Musa Çadırcı","doi":"arxiv-2406.15515","DOIUrl":"https://doi.org/arxiv-2406.15515","url":null,"abstract":"Perovskite Quantum Dots (PQDs) have a promising future for several\u0000applications due to their unique properties. This study investigates the\u0000effectiveness of Machine Learning (ML) in predicting the size, absorbance (1S\u0000abs) and photoluminescence (PL) properties of $\\mathrm{CsPbCl}_3$ PQDs using\u0000synthesizing features as the input dataset. The study employed ML models of\u0000Support Vector Regression (SVR), Nearest Neighbour Distance (NND), Random\u0000Forest (RF), Gradient Boosting Machine (GBM), Decision Tree (DT) and Deep\u0000Learning (DL). Although all models produced highly accurate results, SVR and\u0000NND demonstrated the most accurate property prediction by achieving excellent\u0000performance on the test and training datasets, with high $\\mathrm{R}^2$ and low\u0000Root Mean Squared Error (RMSE) and low Mean Absolute Error (MAE) metric values.\u0000Given that ML is becoming more superior, its ability to understand the QDs\u0000field could prove invaluable to shape the future of nanomaterials designing.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Encoder-Decoder Neural Networks in Interpretation of X-ray Spectra","authors":"Jalmari Passilahti, Anton Vladyka, Johannes Niskanen","doi":"arxiv-2406.14044","DOIUrl":"https://doi.org/arxiv-2406.14044","url":null,"abstract":"Encoder-decoder neural networks (EDNN) condense information most relevant to\u0000the output of the feedforward network to activation values at a bottleneck\u0000layer. We study the use of this architecture in emulation and interpretation of\u0000simulated X-ray spectroscopic data with the aim to identify key structural\u0000characteristics for the spectra, previously studied using emulator-based\u0000component analysis (ECA). We find an EDNN to outperform ECA in covered target\u0000variable variance, but also discover complications in interpreting the latent\u0000variables in physical terms. As a compromise of the benefits of these two\u0000approaches, we develop a network where the linear projection of ECA is used,\u0000thus maintaining the beneficial characteristics of vector expansion from the\u0000latent variables for their interpretation. These results underline the\u0000necessity of information recovery after its condensation and identification of\u0000decisive structural degrees for the output spectra for a justified\u0000interpretation.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On countering adversarial perturbations in graphs using error correcting codes","authors":"Saif Eddin Jabari","doi":"arxiv-2406.14245","DOIUrl":"https://doi.org/arxiv-2406.14245","url":null,"abstract":"We consider the problem of a graph subjected to adversarial perturbations,\u0000such as those arising from cyber-attacks, where edges are covertly added or\u0000removed. The adversarial perturbations occur during the transmission of the\u0000graph between a sender and a receiver. To counteract potential perturbations,\u0000we explore a repetition coding scheme with sender-assigned binary noise and\u0000majority voting on the receiver's end to rectify the graph's structure. Our\u0000approach operates without prior knowledge of the attack's characteristics. We\u0000provide an analytical derivation of a bound on the number of repetitions needed\u0000to satisfy probabilistic constraints on the quality of the reconstructed graph.\u0000We show that the method can accurately decode graphs that were subjected to\u0000non-random edge removal, namely, those connected to vertices with the highest\u0000eigenvector centrality, in addition to random addition and removal of edges by\u0000the attacker.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the analysis of two-time correlation functions: equilibrium vs non-equilibrium systems","authors":"Anastasia Ragulskaya, Vladimir Starostin, Fajun Zhang, Christian Gutt, Frank Schreiber","doi":"arxiv-2406.12520","DOIUrl":"https://doi.org/arxiv-2406.12520","url":null,"abstract":"X-ray photon correlation spectroscopy (XPCS) is a powerful tool for the\u0000investigation of dynamics covering a broad range of time and length scales. The\u0000two-time correlation function (TTC) is commonly used to track non-equilibrium\u0000dynamical evolution in XPCS measurements, followed by the extraction of\u0000one-time correlations. While the theoretical foundation for the quantitative\u0000analysis of TTCs is primarily established for equilibrium systems, where key\u0000parameters such as diffusion remain constant, non-equilibrium systems pose a\u0000unique challenge. In such systems, different projections (\"cuts\") of the TTC\u0000may lead to divergent results if the underlying fundamental parameters\u0000themselves are subject to temporal variations. This article explores widely\u0000used approaches for TTC calculations and common methods for extracting relevant\u0000information from correlation functions on case studies, particularly in the\u0000light of comparing dynamics in equilibrium and non-equilibrium systems.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainties in ROC (Receiver Operating Characteristic) Curves Derived from Counting Data","authors":"M. P. Fewell","doi":"arxiv-2406.11396","DOIUrl":"https://doi.org/arxiv-2406.11396","url":null,"abstract":"The ROC (receiver operating characteristic) curve is a widely used device for\u0000assessing decision-making systems. It seems surprising, in view of its history\u0000dating back to World War Two, that the assignment of uncertainties to a ROC\u0000curve is apparently not settled. This note returns to the question, focusing on\u0000the application of ROC curves to the analysis of data from counting experiments\u0000and taking a practical operational approach to the concept of uncertainty.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Exoplanetary Features with a Residual Model for Uniform and Gaussian Distributions","authors":"Andrew Sweet","doi":"arxiv-2406.10771","DOIUrl":"https://doi.org/arxiv-2406.10771","url":null,"abstract":"The advancement of technology has led to rampant growth in data collection\u0000across almost every field, including astrophysics, with researchers turning to\u0000machine learning to process and analyze this data. One prominent example of\u0000this data in astrophysics is the atmospheric retrievals of exoplanets. In order\u0000to help bridge the gap between machine learning and astrophysics domain\u0000experts, the 2023 Ariel Data Challenge was hosted to predict posterior\u0000distributions of 7 exoplanetary features. The procedure outlined in this paper\u0000leveraged a combination of two deep learning models to address this challenge:\u0000a Multivariate Gaussian model that generates the mean and covariance matrix of\u0000a multivariate Gaussian distribution, and a Uniform Quantile model that\u0000predicts quantiles for use as the upper and lower bounds of a uniform\u0000distribution. Training of the Multivariate Gaussian model was found to be\u0000unstable, while training of the Uniform Quantile model was stable. 
An ensemble\u0000of uniform distributions was found to have competitive results during testing\u0000(posterior score of 696.43), and when combined with a multivariate Gaussian\u0000distribution achieved a final rank of third in the 2023 Ariel Data Challenge\u0000(final score of 681.57).","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Insights into Dark Matter Direct Detection Experiments: Decision Trees versus Deep Learning","authors":"Daniel E. Lopez-Fogliani, Andres D. Perez, Roberto Ruiz de Austri","doi":"arxiv-2406.10372","DOIUrl":"https://doi.org/arxiv-2406.10372","url":null,"abstract":"The detection of Dark Matter (DM) remains a significant challenge in particle\u0000physics. This study exploits advanced machine learning models to improve\u0000detection capabilities of liquid xenon time projection chamber experiments,\u0000utilizing state-of-the-art transformers alongside traditional methods like\u0000Multilayer Perceptrons and Convolutional Neural Networks. We evaluate various\u0000data representations and find that simplified feature representations,\u0000particularly corrected S1 and S2 signals, retain critical information for\u0000classification. Our results show that while transformers offer promising\u0000performance, simpler models like XGBoost can achieve comparable results with\u0000optimal data representations. We also derive exclusion limits in the\u0000cross-section versus DM mass parameter space, showing minimal differences\u0000between XGBoost and the best performing deep learning models. 
The comparative\u0000analysis of different machine learning approaches provides a valuable reference\u0000for future experiments by guiding the choice of models and data representations\u0000to maximize detection capabilities.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}