{"title":"BioBricks.ai: A Versioned Data Registry for Life Sciences Data Assets","authors":"Yifan Gao, Zakariyya Mughal, Jose A. Jaramillo-Villegas, Marie Corradi, Alexandre Borrel, Ben Lieberman, Suliman Sharif, John Shaffer, Karamarie Fecho, Ajay Chatrath, Alexandra Maertens, Marc A. T. Teunis, Nicole Kleinstreuer, Thomas Hartung, Thomas Luechtefeld","doi":"arxiv-2408.17320","DOIUrl":"https://doi.org/arxiv-2408.17320","url":null,"abstract":"Researchers in biomedical research, public health, and the life sciences often spend weeks or months discovering, accessing, curating, and integrating data from disparate sources, significantly delaying the onset of actual analysis and innovation. Instead of countless developers creating redundant and inconsistent data pipelines, BioBricks.ai offers a centralized data repository and a suite of developer-friendly tools to simplify access to scientific data. Currently, BioBricks.ai delivers over ninety biological and chemical datasets. It provides a package manager-like system for installing and managing dependencies on data sources. Each 'brick' is a Data Version Control git repository that supports an updateable pipeline for extraction, transformation, and loading data into the BioBricks.ai backend at https://biobricks.ai. Use cases include accelerating data science workflows and facilitating the creation of novel data assets by integrating multiple datasets into unified, harmonized resources. In conclusion, BioBricks.ai offers an opportunity to accelerate access and use of public data through a single open platform.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A note on promotion time cure models with a new biological consideration","authors":"Zhi Zhao, Fatih Kızılaslan","doi":"arxiv-2408.17188","DOIUrl":"https://doi.org/arxiv-2408.17188","url":null,"abstract":"We introduce a generalized promotion time cure model motivated by a new biological consideration. The new approach is flexible to model heterogeneous survival data, in particular for addressing intra-sample heterogeneity.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty Quantification of Antibody Measurements: Physical Principles and Implications for Standardization","authors":"Paul N. Patrone, Lili Wang, Sheng Lin-Gibson, Anthony J. Kearsley","doi":"arxiv-2409.00191","DOIUrl":"https://doi.org/arxiv-2409.00191","url":null,"abstract":"Harmonizing serology measurements is critical for identifying reference materials that permit standardization and comparison of results across different diagnostic platforms. However, the theoretical foundations of such tasks have yet to be fully explored in the context of antibody thermodynamics and uncertainty quantification (UQ). This has restricted the usefulness of standards currently deployed and limited the scope of materials considered as viable reference material. To address these problems, we develop rigorous theories of antibody normalization and harmonization, as well as formulate a probabilistic framework for defining correlates of protection. We begin by proposing a mathematical definition of harmonization equipped with structure needed to quantify uncertainty associated with the choice of standard, assay, etc. We then show how a thermodynamic description of serology measurements (i) relates this structure to the Gibbs free-energy of antibody binding, and thereby (ii) induces a regression analysis that directly harmonizes measurements. We supplement this with a novel, optimization-based normalization (not harmonization!) method that checks for consistency between reference and sample dilution curves. Last, we relate these analyses to uncertainty propagation techniques to estimate correlates of protection. A key result of these analyses is that under physically reasonable conditions, the choice of reference material does not increase uncertainty associated with harmonization or correlates of protection. We provide examples and validate main ideas in the context of an interlab study that lays the foundation for using monoclonal antibodies as a reference for SARS-CoV-2 serology measurements.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9","authors":"Xia Jiang, Yijun Zhou, Alan Wells, Adam Brufsky","doi":"arxiv-2408.16256","DOIUrl":"https://doi.org/arxiv-2408.16256","url":null,"abstract":"Breast cancer is one of the two cancers responsible for the most deaths in women, with about 42,000 deaths each year in the US. That there are over 300,000 breast cancers newly diagnosed each year suggests that only a fraction of the cancers result in mortality. Thus, most of the women undergo seemingly curative treatment for localized cancers, but a significant fraction later succumb to metastatic disease for which current treatments are only temporizing for the vast majority. The current prognostic metrics are of little actionable value for 4 of the 5 women seemingly cured after local treatment, and many women are exposed to morbid and even mortal adjuvant therapies unnecessarily, with these adjuvant therapies reducing metastatic recurrence by only a third. Thus, there is a need for better prognostics to target aggressive treatment at those who are likely to relapse and spare those who were actually cured. While there is a plethora of molecular and tumor-marker assays in use and under development to detect recurrence early, these are time-consuming, expensive and still often unvalidated as to actionable prognostic utility. A different approach would use large data techniques to determine clinical and histopathological parameters that would provide accurate prognostics using existing data. Herein, we report on machine learning, together with grid search and Bayesian Networks to develop algorithms that present an AUC of up to 0.9 in ROC analyses, using only extant data. Such algorithms could be rapidly translated to clinical management as they do not require testing beyond routine tumor evaluations.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid and accurate mosquito abundance forecasting with Aedes-AI neural networks","authors":"Adrienne C. Kinney, Roberto Barrera, Joceline Lega","doi":"arxiv-2408.16152","DOIUrl":"https://doi.org/arxiv-2408.16152","url":null,"abstract":"We present a method to convert weather data into probabilistic forecasts of Aedes aegypti abundance. The approach, which relies on the Aedes-AI suite of neural networks, produces weekly point predictions with corresponding uncertainty estimates. Once calibrated on past trap and weather data, the model is designed to use weather forecasts to estimate future trap catches. We demonstrate that when reliable input data are used, the resulting predictions have high skill. This technique may therefore be used to supplement vector surveillance efforts or identify periods of elevated risk for vector-borne disease outbreaks.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Q-MRS: A Deep Learning Framework for Quantitative Magnetic Resonance Spectra Analysis","authors":"Christopher J. Wu, Lawrence S. Kegeles, Jia Guo","doi":"arxiv-2408.15999","DOIUrl":"https://doi.org/arxiv-2408.15999","url":null,"abstract":"Magnetic resonance spectroscopy (MRS) is an established technique for studying tissue metabolism, particularly in central nervous system disorders. While powerful and versatile, MRS is often limited by challenges associated with data quality, processing, and quantification. Existing MRS quantification methods face difficulties in balancing model complexity and reproducibility during spectral modeling, often falling into the trap of either oversimplification or over-parameterization. To address these limitations, this study introduces a deep learning (DL) framework that employs transfer learning, in which the model is pre-trained on simulated datasets before it undergoes fine-tuning on in vivo data. The proposed framework showed promising performance when applied to the Philips dataset from the BIG GABA repository and represents an exciting advancement in MRS data analysis.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Binary Species Range Maps","authors":"Filip Dorm, Christian Lange, Scott Loarie, Oisin Mac Aodha","doi":"arxiv-2408.15956","DOIUrl":"https://doi.org/arxiv-2408.15956","url":null,"abstract":"Accurately predicting the geographic ranges of species is crucial for assisting conservation efforts. Traditionally, range maps were manually created by experts. However, species distribution models (SDMs) and, more recently, deep learning-based variants offer a potential automated alternative. Deep learning-based SDMs generate a continuous probability representing the predicted presence of a species at a given location, which must be binarized by setting per-species thresholds to obtain binary range maps. However, selecting appropriate per-species thresholds to binarize these predictions is non-trivial as different species can require distinct thresholds. In this work, we evaluate different approaches for automatically identifying the best thresholds for binarizing range maps using presence-only data. This includes approaches that require the generation of additional pseudo-absence data, along with ones that only require presence data. We also propose an extension of an existing presence-only technique that is more robust to outliers. We perform a detailed evaluation of different thresholding techniques on the tasks of binary range estimation and large-scale fine-grained visual classification, and we demonstrate improved performance over existing pseudo-absence free approaches using our method.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network","authors":"Yijun Zhou, Om Arora-Jain, Xia Jiang","doi":"arxiv-2408.15498","DOIUrl":"https://doi.org/arxiv-2408.15498","url":null,"abstract":"While machine learning has advanced in medicine, its widespread use in clinical applications, especially in predicting breast cancer metastasis, is still limited. We have been dedicated to constructing a DFNN model to predict breast cancer metastasis n years in advance. However, the challenge lies in efficiently identifying optimal hyperparameter values through grid search, given the constraints of time and resources. Issues such as the infinite possibilities for continuous hyperparameters like L1 and L2, as well as the time-consuming and costly process, further complicate the task. To address these challenges, we developed the Single Hyperparameter Grid Search (SHGS) strategy, serving as a preselection method before grid search. Our experiments with SHGS applied to DFNN models for breast cancer metastasis prediction focus on analyzing eight target hyperparameters: epochs, batch size, dropout, L1, L2, learning rate, decay, and momentum. We created three figures, each depicting the experiment results obtained from three LSM-I-10-Plus-year datasets. These figures illustrate the relationship between model performance and the target hyperparameter values. For each hyperparameter, we analyzed whether changes in this hyperparameter would affect model performance, examined if there were specific patterns, and explored how to choose values for the particular hyperparameter. Our experimental findings reveal that the optimal value of a hyperparameter is not only dependent on the dataset but is also significantly influenced by the settings of other hyperparameters. Additionally, our experiments suggested some reduced range of values for a target hyperparameter, which may be helpful for low-budget grid search. This approach serves as a prior experience and foundation for subsequent use of grid search to enhance model performance.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A reaction network model of microscale liquid-liquid phase separation reveals effects of spatial dimension","authors":"Jinyoung Kim, Sean D. Lawley, Jinsu Kim","doi":"arxiv-2408.15303","DOIUrl":"https://doi.org/arxiv-2408.15303","url":null,"abstract":"Proteins can form droplets via liquid-liquid phase separation (LLPS) in cells. Recent experiments demonstrate that LLPS is qualitatively different on two-dimensional (2d) surfaces compared to three-dimensional (3d) solutions. In this paper, we use mathematical modeling to investigate the causes of the discrepancies between LLPS in 2d versus 3d. We model the number of proteins and droplets inducing LLPS by continuous-time Markov chains and use chemical reaction network theory to analyze the model. To reflect the influence of space dimension, droplet formation and dissociation rates are determined using the first hitting times of diffusing proteins. We first show that our stochastic model reproduces the appropriate phase diagram and is consistent with the relevant thermodynamic constraints. After further analyzing the model, we find that it predicts that the space dimension induces qualitatively different features of LLPS which are consistent with recent experiments. While it has been claimed that the differences between 2d and 3d LLPS stem mainly from different diffusion coefficients, our analysis is independent of the diffusion coefficients of the proteins since we use the stationary model behavior. Therefore, our results give new hypotheses about how space dimension affects LLPS.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel method to separate circadian from non-circadian masking effects in order to enhance daily circadian timing and amplitude estimation from core body temperature","authors":"Phuc D Nguyen, Claire Dunbar, Hannah Scott, Bastien Lechat, Jack Manners, Gorica Micic, Nicole Lovato, Amy C Reynolds, Leon Lack, Robert Adams, Danny Eckert, Andrew Vakulin, Peter G Catcheside","doi":"arxiv-2408.15295","DOIUrl":"https://doi.org/arxiv-2408.15295","url":null,"abstract":"Circadian disruption contributes to adverse effects on sleep, performance, and health. One accepted method to track continuous daily changes in circadian timing is to measure core body temperature (CBT), and establish daily, circadian-related CBT minimum time (Tmin). This method typically applies cosine-model fits to measured CBT data, which may not adequately account for substantial wake metabolic activity and sleep effects on CBT that confound and mask circadian effects, and thus estimates of the circadian-related Tmin. This study introduced a novel physiology-grounded analytic approach to separate circadian from non-circadian effects on CBT, which we compared against traditional cosine-based methods. The dataset comprised 33 healthy participants attending a 39-hour in-laboratory study with an initial overnight sleep followed by an extended wake period. CBT data were collected at 30-second intervals via ingestible capsules. Our design captured CBT during both the baseline sleep period and the extended wake period (without sleep) and allowed us to model the influence of circadian and non-circadian effects of sleep, wake, and activity on CBT using physiology-guided generalized additive models. Model fits and estimated Tmin inferred from extended wake without sleep were compared with traditional cosine-based model fits. Compared to the traditional cosine model, the new model exhibited superior fits to CBT (Pearson R 0.90 [95% CI: 0.83-0.96] versus 0.81 [0.55-0.93]). The difference between estimated vs measured circadian Tmin, derived from the day without sleep, was better fit with our method (0.2 [-0.5, 0.3] hours) versus previous methods (1.4 [1.1 to 1.7] hours). This new method provides superior demasking of non-circadian influences compared to traditional cosine methods, including the removal of a sleep-related bias towards an earlier estimate of circadian Tmin.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}