{"title":"Efficient Approximate Methods for Design of Experiments for Copolymer Engineering","authors":"Swagatam Mukhopadhyay","doi":"arxiv-2408.02166","DOIUrl":"https://doi.org/arxiv-2408.02166","url":null,"abstract":"We develop a set of algorithms to solve a broad class of Design of Experiment (DoE) problems efficiently. Specifically, we consider problems in which one must choose a subset of polymers to test in experiments such that the learning of the polymeric design rules is optimal. This subset must be selected from a larger set of polymers permissible under arbitrary experimental design constraints. We demonstrate the performance of our algorithms by solving several pragmatic nucleic acid therapeutics engineering scenarios, where limitations in synthesis of chemically diverse nucleic acids or feasibility of measurements in experimental setups appear as constraints. Our approach focuses on identifying optimal experimental designs from a given set of experiments, which is in contrast to traditional, generative DoE methods like BIBD. Finally, we discuss how these algorithms are broadly applicable to well-established optimal DoE criteria like D-optimality.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance","authors":"Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page","doi":"arxiv-2408.01869","DOIUrl":"https://doi.org/arxiv-2408.01869","url":null,"abstract":"In the era of Large Language Models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction and summarization. This paper focuses on the problem of Pharmacovigilance (PhV), where the significance and challenges lie in identifying Adverse Drug Events (ADEs) from diverse text sources, such as medical literature, clinical notes, and drug labels. Unfortunately, this task is hindered by factors including variations in the terminologies of drugs and outcomes, and ADE descriptions often being buried in large amounts of narrative text. We present MALADE, the first effective collaborative multi-agent system powered by LLM with Retrieval Augmented Generation for ADE extraction from drug label data. This technique involves augmenting a query to an LLM with relevant information extracted from text resources, and instructing the LLM to compose a response consistent with the augmented data. MALADE is a general LLM-agnostic architecture, and its unique capabilities are: (1) leveraging a variety of external sources, such as medical literature, drug labels, and FDA tools (e.g., OpenFDA drug information API), (2) extracting drug-outcome association in a structured format along with the strength of the association, and (3) providing explanations for established associations. Instantiated with GPT-4 Turbo or GPT-4o, and FDA drug label data, MALADE demonstrates its efficacy with an Area Under ROC Curve of 0.90 against the OMOP Ground Truth table of ADEs. Our implementation leverages the Langroid multi-agent LLM framework and can be found at https://github.com/jihyechoi77/malade.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging","authors":"Saleh Sakib Ahmed, Nahian Shabab, Md. Abul Hassan Samee, M. Sohel Rahman","doi":"arxiv-2408.00984","DOIUrl":"https://doi.org/arxiv-2408.00984","url":null,"abstract":"DNA methylation is a crucial epigenetic marker used in various clocks to predict epigenetic age. However, many existing clocks fail to account for crucial information about CpG sites and their interrelationships, such as co-methylation patterns. We present a novel approach to represent methylation data as a graph, using methylation values and relevant information about CpG sites as nodes, and relationships like co-methylation, same gene, and same chromosome as edges. We then use a Graph Neural Network (GNN) to predict age. Thus our model, GraphAge, leverages both structural and positional information for prediction as well as better interpretation. Although we had to train in a constrained compute setting, GraphAge still showed competitive performance with a Mean Absolute Error (MAE) of 3.207 and a Mean Squared Error (MSE) of 25.277, slightly outperforming the current state of the art. Perhaps more importantly, we utilized GNN explainer for interpretation purposes and were able to unearth interesting insights (e.g., key CpG sites, pathways, and their relationships through Methylation Regulated Networks in the context of aging), which were not possible to 'decode' without leveraging the unique capability of GraphAge to 'encode' various structural relationships. GraphAge has the potential to consume and utilize all relevant information (if available) about an individual that relates to the complex process of aging. So, in that sense, it is one of a kind and can be seen as the first benchmark for a multimodal model that can incorporate all this information in order to close the gap in our understanding of the true nature of aging.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UMMAN: Unsupervised Multi-graph Merge Adversarial Network for Disease Prediction Based on Intestinal Flora","authors":"Dingkun Liu, Hongjie Zhou, Yilu Qu, Huimei Zhang, Yongdong Xu","doi":"arxiv-2407.21714","DOIUrl":"https://doi.org/arxiv-2407.21714","url":null,"abstract":"The abundance of intestinal flora is closely related to human diseases, but diseases are not caused by a single gut microbe. Instead, they result from the complex interplay of numerous microbial entities. This intricate and implicit connection among gut microbes poses a significant challenge for disease prediction using abundance information from OTU data. Recently, several methods have shown potential in predicting corresponding diseases. However, these methods fail to learn the inner association among gut microbes from different hosts, leading to unsatisfactory performance. In this paper, we present a novel architecture, Unsupervised Multi-graph Merge Adversarial Network (UMMAN). UMMAN can obtain the embeddings of nodes in the Multi-Graph in an unsupervised scenario, so that it helps learn the multiplex association. Our method is the first to combine Graph Neural Network with the task of intestinal flora disease prediction. We employ complex relation-types to construct the Original-Graph and disrupt the relationships among nodes to generate corresponding Shuffled-Graph. We introduce the Node Feature Global Integration (NFGI) module to represent the global features of the graph. Furthermore, we design a joint loss comprising adversarial loss and hybrid attention loss to ensure that the real graph embedding aligns closely with the Original-Graph and diverges from the Shuffled-Graph. Comprehensive experiments on five classical OTU gut microbiome datasets demonstrate the effectiveness and stability of our method. (We will release our code soon.)","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141869854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooperative SIR dynamics as a model for spontaneous blood clot initiation","authors":"Philip Greulich","doi":"arxiv-2408.00039","DOIUrl":"https://doi.org/arxiv-2408.00039","url":null,"abstract":"Blood clotting is an important physiological process to suppress bleeding upon injury, but when it occurs inadvertently, it can cause thrombosis, which can lead to life-threatening conditions. Hence, understanding the microscopic mechanistic factors for inadvertent, spontaneous blood clotting, in the absence of a vessel breach, can help in predicting and averting such conditions. Here, we present a minimal model -- reminiscent of the SIR model -- for the initiating stage of spontaneous blood clotting, the collective activation of blood platelets. This model predicts that in the presence of very small initial activation signals, macroscopic activation of the platelet population requires a sufficient degree of heterogeneity of platelet sensitivity. To propagate the activation signal and achieve collective activation of the bulk platelet population, it requires the presence of, possibly only a few, hyper-sensitive platelets, but also a sufficient proportion of platelets with intermediate, yet higher-than-average sensitivity. A comparison with experimental results demonstrates a qualitative agreement for high platelet signalling activity.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141887005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distribution Learning for Molecular Regression","authors":"Nima Shoghi, Pooya Shoghi, Anuroop Sriram, Abhishek Das","doi":"arxiv-2407.20475","DOIUrl":"https://doi.org/arxiv-2407.20475","url":null,"abstract":"Using \"soft\" targets to improve model performance has been shown to be effective in classification settings, but the usage of soft targets for regression is a much less studied topic in machine learning. The existing literature on the usage of soft targets for regression fails to properly assess the method's limitations, and empirical evaluation is quite limited. In this work, we assess the strengths and drawbacks of existing methods when applied to molecular property regression tasks. Our assessment outlines key biases present in existing methods and proposes methods to address them, evaluated through careful ablation studies. We leverage these insights to propose Distributional Mixture of Experts (DMoE): A model-independent, and data-independent method for regression which trains a model to predict probability distributions of its targets. Our proposed loss function combines the cross entropy between predicted and target distributions and the L1 distance between their expected values to produce a loss function that is robust to the outlined biases. We evaluate the performance of DMoE on different molecular property prediction datasets -- Open Catalyst (OC20), MD17, and QM9 -- across different backbone model architectures -- SchNet, GemNet, and Graphormer. Our results demonstrate that the proposed method is a promising alternative to classical regression for molecular property prediction tasks, showing improvements over baselines on all datasets and architectures.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141869857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Plant and insect proteins support optimal bone growth and development; Evidences from a pre-clinical model","authors":"Gal Becker, Jerome Nicolas Janssen, Rotem Kalev-Altman, Dana Meilich, Astar Shitrit, Svetlana Penn, Ram Reifen, Efrat Monsonego Ornan","doi":"arxiv-2407.21087","DOIUrl":"https://doi.org/arxiv-2407.21087","url":null,"abstract":"By 2050, the global population will exceed 9 billion, demanding a 70% increase in food production. Animal proteins alone may not suffice, and their production contributes to global warming. Alternative proteins such as legumes, algae, and insects are being explored, but their health impacts are largely unknown. For this, three-week-old rats were fed diets containing 20% protein from various sources for six weeks. A casein-based control diet was compared to soy isolate, spirulina powder, chickpea isolate, chickpea flour, and fly larvae powder. Except for spirulina, alternative protein groups showed comparable growth patterns to the casein group. Morphological and mechanical tests of femur bones matched growth patterns. Caecal 16S analysis highlighted the impact on gut microbiota diversity. Chickpea flour showed significantly lower $\\alpha$-diversity compared with casein and chickpea isolate groups, while chickpea flour had the greatest distinction in $\\beta$-diversity. Alternative protein sources supported optimal growth, but quality and health implications require further exploration.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141869984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patterns in soil organic carbon dynamics: integrating microbial activity, chemotaxis and data-driven approaches","authors":"Angela Monti, Fasma Diele, Deborah Lacitignola, Carmela Marangi","doi":"arxiv-2407.20625","DOIUrl":"https://doi.org/arxiv-2407.20625","url":null,"abstract":"Models of soil organic carbon (SOC) frequently overlook the effects of spatial dimensions and microbiological activities. In this paper, we focus on two reaction-diffusion chemotaxis models for SOC dynamics, both supporting chemotaxis-driven instability and exhibiting a variety of spatial patterns such as stripes, spots and hexagons when the microbial chemotactic sensitivity is above a critical threshold. We use symplectic techniques to numerically approximate chemotaxis-driven spatial patterns and explore the effectiveness of the piecewise dynamic mode decomposition (pDMD) to reconstruct them. Our findings show that pDMD is effective at precisely recreating chemotaxis-driven spatial patterns, therefore broadening the range of application of the method to classes of solutions different than Turing patterns. By validating its efficacy across a wider range of models, this research lays the groundwork for applying pDMD to experimental spatiotemporal data, advancing predictions crucial for soil microbial ecology and agricultural sustainability.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141869856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Residual based Method for Molecular Property Prediction","authors":"Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar","doi":"arxiv-2408.03342","DOIUrl":"https://doi.org/arxiv-2408.03342","url":null,"abstract":"Property prediction of materials has been of high interest in recent years in the field of materials science. Various physics-based and machine learning models that can give good results have already been developed. However, they are not accurate enough and are inadequate for critical applications. The traditional machine learning models try to predict properties based on the features extracted from the molecules, which are not easily available most of the time. In this paper, a recently developed novel Deep Learning method, the Graph Neural Network (GNN), has been applied, allowing us to predict properties directly from only the graph-based structures of the molecules. SMILES (Simplified Molecular Input Line Entry System) representation of the molecules has been used in the present study as input data format, which has been further converted into a graph database, which constitutes the training data. This article highlights the detailed description of the novel GRU-based methodology to map the inputs that have been used. Emphasis is placed on highlighting both the regressive property and the classification-based property of the GNN backbone. A detailed description of the Variational Autoencoder (VAE) and the end-to-end learning method has been given to highlight the multi-class multi-label property prediction of the backbone. The results have been compared with standard benchmark datasets as well as some newly developed datasets. All performance metrics used have been clearly defined, along with the reasons for choosing them. Keywords: GNN, VAE, SMILES, multi-label multi-class classification, GRU","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting T-Cell Receptor Specificity","authors":"Tengyao Tu, Wei Zeng, Kun Zhao, Zhenyu Zhang","doi":"arxiv-2407.19349","DOIUrl":"https://doi.org/arxiv-2407.19349","url":null,"abstract":"Researching the specificity of TCR contributes to the development of immunotherapy and provides new opportunities and strategies for personalized cancer immunotherapy. Therefore, we established a TCR generative specificity detection framework consisting of an antigen selector and a TCR classifier based on the Random Forest algorithm, aiming to efficiently screen out TCRs and target antigens and achieve TCR specificity prediction. Furthermore, we used the k-fold validation method to compare the performance of our model with ordinary deep learning methods. The result proves that adding a classifier to the model based on the random forest algorithm is very effective, and our model generally outperforms ordinary deep learning methods. Moreover, we put forward feasible optimization suggestions for the shortcomings and challenges of our model found during model implementation.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141869858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}