Automatic peak annotation and area estimation of glycan map peaks directly from chromatograms

IF 2.3 · Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2023-10-24 DOI:10.1002/cem.3521

Domen Hudnik, Naja Bohanec, Igor Drobnak, Peter Ernst, Alexander Hanke, Matej Horvat, Franz Innerbichler, Miha Mikelj, Tilen Praper, Vasja Progar, Nika Valenčič, Matjaž Omladič


Citations: 0


Automatic peak annotation and area estimation of glycan map peaks directly from chromatograms

Recent advances in analytics, such as automated sample preparation and high-throughput methods, have made the evaluation of analytical results the current bottleneck in biosimilar bioprocess development. At present, automated chromatogram integration and annotation is efficient only for simple chromatograms. In the increasingly competitive field of biosimilars this is a serious drawback, because the chromatographic methods that provide some of the most valuable physicochemical quality attributes of the product also require careful chromatogram integration and annotation. This work focuses on the glycan mapping analytical method as used in the development of monoclonal antibody biosimilars, evaluating more than 2000 chromatograms spanning the life cycle of multiple biosimilar development projects. It proposes a modified workflow that uses automatic machine learning algorithms to determine the proportion of specific relevant glycan species in a sample directly from the chromatogram. Data preparation and analysis are performed using a pipeline approach. A pipeline is a modular data-processing design in which the signal “travels” through a series of active modules. Each module performs a specific function or transformation on the signal and passes the transformed signal to the next module. The pipeline is designed so that modules can be improved and exchanged independently. The module functions currently implemented are chromatogram resampling by spline interpolation, baseline removal by asymmetric least squares, peak alignment by parametric time warping, and quantification of the relative proportion of a glycan species by partial least squares regression. The hyper-parameters of the pipeline are then optimized using the Nelder–Mead method.
The approach stands out for its ability to accommodate a broad range of samples, covering multiple proteins at different stages of biosimilar development, analyzed with different adaptations of the glycan map analytical method. The pipeline offers an intuitive, flexible, and creatively simple method design capable of providing reliable results for a wide range of glycan species essential for biosimilar development. It enables transparent, faster, and less subjective evaluation of analytical raw data (from sample to result). Furthermore, our automated approach maintained an accuracy comparable with manual integration, demonstrating its readiness for implementation in a conservative and highly regulated environment. The presented methodology reduces the cost and time of biosimilar development and should be applicable to any chromatogram-based analytical method.
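
The modular pipeline idea described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names, parameter values, and synthetic chromatogram below are assumptions made for illustration. Each module takes a `(t, y)` pair and returns a transformed `(t, y)` pair, so modules can be swapped independently. Only two stages are shown. The resampling stage uses linear interpolation as a simple stand-in for the spline interpolation named in the abstract, and the baseline stage follows the usual iteratively reweighted formulation of asymmetric least squares with a dense solve, which is only practical for short signals. Alignment, regression, and hyper-parameter optimization are omitted.

```python
from functools import partial

import numpy as np


def resample(t, y, n_points):
    """Resample onto a uniform time grid (linear stand-in for the spline step)."""
    t_new = np.linspace(t[0], t[-1], n_points)
    return t_new, np.interp(t_new, t, y)


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a baseline by asymmetric least squares (dense, small signals only)."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)      # second-difference operator, (n-2, n)
    penalty = lam * (D.T @ D)                # smoothness penalty
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (peaks) get weight p, points below get 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z


def remove_baseline(t, y, lam=1e5, p=0.01):
    """Pipeline module: subtract the estimated baseline from the signal."""
    return t, y - als_baseline(y, lam=lam, p=p)


def run_pipeline(t, y, modules):
    """Pass the signal through each module in series."""
    for module in modules:
        t, y = module(t, y)
    return t, y


# Synthetic chromatogram (hypothetical data): one Gaussian peak on a linear
# drift, sampled on an uneven time grid.
t_raw = 10.0 * np.linspace(0.0, 1.0, 300) ** 1.2
y_raw = np.exp(-0.5 * ((t_raw - 5.0) / 0.2) ** 2) + 0.3 + 0.05 * t_raw

modules = [partial(resample, n_points=400), remove_baseline]
t_out, y_out = run_pipeline(t_raw, y_raw, modules)
```

Because every module shares the same `(t, y)` interface, replacing the linear resampler with a spline-based one, or adding an alignment stage, would only change the `modules` list.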


Source journal

Journal of Chemometrics (Chemistry: Analytical Chemistry)

CiteScore: 5.20
Self-citation rate: 8.30%
Articles published: (not listed)
Review time: 2 months

About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.