Guided optimization of ToxPi model weights using a Semi-Automated approach

Impact factor: 3.1; JCR quartile: Q2 (Toxicology)

Computational Toxicology, published 2023-12-09. DOI: 10.1016/j.comtox.2023.100294

Jonathon F. Fleming, John S. House, Jessie R. Chappel, Alison A. Motsinger-Reif, David M. Reif

Full text: https://www.sciencedirect.com/science/article/pii/S246811132300035X

Citations: 0

Abstract

The Toxicological Prioritization Index (ToxPi) is a visual analysis and decision support tool for dimension reduction and visualization of high-throughput, multidimensional feature data. ToxPi was originally developed to assess the relative toxicity of multiple chemicals or stressors by synthesizing complex toxicological data into a single comprehensive view of potential health effects. It is still used to profile chemicals and has since been applied to other types of “sample” entities, including geospatial entities (e.g., county-level COVID-19 risk and sites of historical PFAS exposure), and to other profiling applications. For any set of features (data collected on a set of sample entities), ToxPi integrates the data into a set of weighted slices that yield both a visual profile and a score for comparison. This scoring system depends heavily on user-provided feature weights, yet users often do not know how to set these weights. Common methods for predicting feature weights are generally unusable here because their statistical assumptions do not fit such data and there is no global distributional expectation. However, users often have an intuitive sense of the expected results for a small subset of samples. For example, in chemical toxicity, prior knowledge can often place subsets of chemicals into categories of low, moderate, or high toxicity (reference chemicals). Ordinal regression can use these response levels to predict weights that apply to the entire feature set, analogous to using positive and negative controls to contextualize an empirical distribution. We propose a semi-supervised method that uses ordinal regression to predict the set of feature weights that best fits the known response (“reference”) data and then fine-tunes those weights with a customized genetic algorithm.
We conduct a simulation study to identify when this method improves on ordinal regression alone, enabling accurate feature weight prediction and sample ranking in scenarios with minimal response data. To ground-truth the guided weight optimization, we apply the method to published data to build a ToxPi model and compare it against expert-knowledge-driven weight assignments.
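The abstract describes two stages: an ordinal-regression fit on the reference samples to obtain initial weights, followed by genetic-algorithm fine-tuning. The sketch below illustrates only the second stage and the ToxPi scoring it depends on, and it rests on several assumptions that are not taken from the paper: slice scores are already scaled to [0, 1]; the overall score is the weight-normalized sum of slice scores; fitness is the pairwise rank concordance between scores and reference categories; and the GA operators (blend crossover, Gaussian mutation, elitism) and all function names (`toxpi_scores`, `concordance`, `normalize`, `refine_weights`) are hypothetical choices, not the authors' customized algorithm. The starting weights simply stand in for an ordinal-regression fit.

```python
import random
from typing import List, Sequence

# Minimal sketch, NOT the authors' implementation. Assumptions: slice scores
# are pre-scaled to [0, 1], and `start` weights stand in for the output of an
# ordinal-regression fit on the reference samples (not shown here).


def toxpi_scores(slices: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Overall score per sample: weighted sum of slice scores divided by total weight."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, row)) / total for row in slices]


def concordance(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of differently labeled pairs that the scores order correctly (ties count 0.5)."""
    good, pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:
                continue
            pairs += 1
            # Orient the pair so that `hi` is the sample with the higher reference label.
            hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
            if scores[hi] > scores[lo]:
                good += 1.0
            elif scores[hi] == scores[lo]:
                good += 0.5
    return good / pairs if pairs else 0.0


def normalize(weights: Sequence[float]) -> List[float]:
    """Clip negative weights to zero and rescale to sum to 1 (uniform if all are zero)."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]


def refine_weights(start: Sequence[float], slices: Sequence[Sequence[float]],
                   labels: Sequence[int], pop_size: int = 30, generations: int = 50,
                   sigma: float = 0.1, seed: int = 0) -> List[float]:
    """Fine-tune `start` with a simple elitist genetic algorithm (hypothetical operators)."""
    rng = random.Random(seed)

    def fitness(w: Sequence[float]) -> float:
        return concordance(toxpi_scores(slices, w), labels)

    # Seed the population with the starting weights plus mutated copies of them.
    population = [normalize(start)]
    while len(population) < pop_size:
        population.append(normalize([w + rng.gauss(0, sigma) for w in start]))

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: max(2, pop_size // 5)]  # top fifth survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            mix = rng.random()
            # Blend crossover of two elite parents, then Gaussian mutation.
            child = [mix * x + (1 - mix) * y + rng.gauss(0, sigma) for x, y in zip(a, b)]
            children.append(normalize(child))
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical reference set: 3 slices per sample; labels 0=low, 1=moderate, 2=high.
    slices = [[0.1, 0.9, 0.5], [0.2, 0.1, 0.4], [0.5, 0.8, 0.6],
              [0.6, 0.2, 0.3], [0.9, 0.7, 0.2], [0.8, 0.1, 0.9]]
    labels = [0, 0, 1, 1, 2, 2]
    start = [1.0, 1.0, 1.0]  # stand-in for ordinal-regression weights
    best = refine_weights(start, slices, labels)
    print("start fitness:", concordance(toxpi_scores(slices, start), labels))
    print("refined weights:", [round(w, 3) for w in best])
    print("refined fitness:", concordance(toxpi_scores(slices, best), labels))
```

Because the best individuals are always carried over, the refined weights can never score worse than the starting weights on this reference-set fitness; whether they also improve rankings beyond the reference samples is the kind of question the abstract's simulation study addresses.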

Source journal

Computational Toxicology (subject category: Computer Science: Computer Science Applications)

CiteScore: 5.50

Self-citation rate: 0.00%

Review time: 56 days

Journal description: Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation of new and existing chemical substances and support their safety assessment. Its scope includes:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE and multi-scale models
- Big data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia and NGOs