生物信息学数据输入的改进技术

Technologies (Basel) Pub Date : 2023-11-01 DOI:10.3390/technologies11060154

Lesia Mochurad, Pavlo Horun

{"title":"生物信息学数据输入的改进技术","authors":"Lesia Mochurad, Pavlo Horun","doi":"10.3390/technologies11060154","DOIUrl":null,"url":null,"abstract":"Using existing software technologies for imputing missing genetic data (GD), such as Beagle, HPImpute, Impute, MACH, AlphaPlantImpute, MissForest, and LinkImputeR, has its advantages and disadvantages. The wide range of input parameters and their nonlinear dependence on the target results require a lot of time and effort to find optimal values in each specific case. Thus, optimizing resources for GD imputation and improving its quality is an important current issue for the quality analysis of digitized deoxyribonucleic acid (DNA) samples. This work provides a critical analysis of existing methods and approaches for obtaining high-quality imputed GD. We observed that most of them do not investigate the problem of time and resource costs, which play a significant role in a mass approach. It is also worth noting that the considered articles are often characterized by high development complexity and, at times, unclear (or missing) descriptions of the input parameters for the methods, algorithms, or models under consideration. As a result, two algorithms were developed in this work. The first one aims to optimize the imputation time, allowing for real-time solutions, while the second one aims to improve imputation accuracy by selecting the best results at each iteration. The success of the first algorithm in improving imputation speed ranges from 47% (for small files) to 87% of the time (for medium and larger files), depending on the available resources. For the second algorithm, the accuracy has been improved by about 0.1%. This, in turn, encourages continued research on the latest version of Beagle software, particularly in the selection of optimal input parameters and possibly other models with similar or higher imputation accuracy.","PeriodicalId":472933,"journal":{"name":"Technologies (Basel)","volume":"23 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Improvement Technologies for Data Imputation in Bioinformatics\",\"authors\":\"Lesia Mochurad, Pavlo Horun\",\"doi\":\"10.3390/technologies11060154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Using existing software technologies for imputing missing genetic data (GD), such as Beagle, HPImpute, Impute, MACH, AlphaPlantImpute, MissForest, and LinkImputeR, has its advantages and disadvantages. The wide range of input parameters and their nonlinear dependence on the target results require a lot of time and effort to find optimal values in each specific case. Thus, optimizing resources for GD imputation and improving its quality is an important current issue for the quality analysis of digitized deoxyribonucleic acid (DNA) samples. This work provides a critical analysis of existing methods and approaches for obtaining high-quality imputed GD. We observed that most of them do not investigate the problem of time and resource costs, which play a significant role in a mass approach. It is also worth noting that the considered articles are often characterized by high development complexity and, at times, unclear (or missing) descriptions of the input parameters for the methods, algorithms, or models under consideration. As a result, two algorithms were developed in this work. The first one aims to optimize the imputation time, allowing for real-time solutions, while the second one aims to improve imputation accuracy by selecting the best results at each iteration. The success of the first algorithm in improving imputation speed ranges from 47% (for small files) to 87% of the time (for medium and larger files), depending on the available resources. For the second algorithm, the accuracy has been improved by about 0.1%. This, in turn, encourages continued research on the latest version of Beagle software, particularly in the selection of optimal input parameters and possibly other models with similar or higher imputation accuracy.\",\"PeriodicalId\":472933,\"journal\":{\"name\":\"Technologies (Basel)\",\"volume\":\"23 4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Technologies (Basel)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/technologies11060154\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Technologies (Basel)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/technologies11060154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

利用Beagle、HPImpute、Impute、MACH、AlphaPlantImpute、MissForest和LinkImputeR等现有软件技术进行缺失基因数据(GD)的输入，各有优缺点。输入参数的广泛范围及其对目标结果的非线性依赖需要大量的时间和精力来找到每个特定情况下的最优值。因此，优化资源、提高质量是当前数字化脱氧核糖核酸(DNA)样品质量分析的重要课题。这项工作为获得高质量的估算GD提供了现有方法和途径的关键分析。我们观察到，他们中的大多数没有调查时间和资源成本问题，这在大规模方法中起着重要作用。值得注意的是，所考虑的文章通常具有开发复杂性高的特点，并且有时对所考虑的方法、算法或模型的输入参数描述不清楚(或缺失)。因此，在这项工作中开发了两种算法。第一个目标是优化插入时间，允许实时解决方案，而第二个目标是通过在每次迭代中选择最佳结果来提高插入精度。根据可用资源的不同，第一种算法在提高插入速度方面的成功率从47%(小文件)到87%(中型和大型文件)不等。对于第二种算法，精度提高了约0.1%。这反过来又鼓励了对最新版本Beagle软件的持续研究，特别是在选择最佳输入参数和可能具有类似或更高输入精度的其他模型方面。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improvement Technologies for Data Imputation in Bioinformatics

Using existing software technologies for imputing missing genetic data (GD), such as Beagle, HPImpute, Impute, MACH, AlphaPlantImpute, MissForest, and LinkImputeR, has its advantages and disadvantages. The wide range of input parameters and their nonlinear dependence on the target results require a lot of time and effort to find optimal values in each specific case. Thus, optimizing resources for GD imputation and improving its quality is an important current issue for the quality analysis of digitized deoxyribonucleic acid (DNA) samples. This work provides a critical analysis of existing methods and approaches for obtaining high-quality imputed GD. We observed that most of them do not investigate the problem of time and resource costs, which play a significant role in a mass approach. It is also worth noting that the considered articles are often characterized by high development complexity and, at times, unclear (or missing) descriptions of the input parameters for the methods, algorithms, or models under consideration. As a result, two algorithms were developed in this work. The first one aims to optimize the imputation time, allowing for real-time solutions, while the second one aims to improve imputation accuracy by selecting the best results at each iteration. The success of the first algorithm in improving imputation speed ranges from 47% (for small files) to 87% of the time (for medium and larger files), depending on the available resources. For the second algorithm, the accuracy has been improved by about 0.1%. This, in turn, encourages continued research on the latest version of Beagle software, particularly in the selection of optimal input parameters and possibly other models with similar or higher imputation accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Technologies (Basel)

自引率

0.00%

发文量