MyESL: A Software for Evolutionary Sparse Learning in Molecular Phylogenetics and Genomics.

Impact factor 5.3 · Zone 1 (Biology) · JCR Q1 (Biochemistry & Molecular Biology)
Maxwell Sanderford, Sudip Sharma, Glen Stecher, Michael Suleski, Jun Liu, Jieping Ye, Sudhir Kumar
DOI: 10.1093/molbev/msaf224 (https://doi.org/10.1093/molbev/msaf224)
Journal: Molecular Biology and Evolution
Publication date: 2025-10-01
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12498521/pdf/
Citation count: 0

Abstract

Evolutionary sparse learning (ESL) uses supervised machine learning to build evolutionary models in which genomic sites and loci are parameters. It uses the Least Absolute Shrinkage and Selection Operator (LASSO) with bi-level sparsity to connect a specific phylogenetic hypothesis with sequence variation across genomic loci. The MyESL software addresses the need for open-source tools to perform ESL analyses, offering features to preprocess input phylogenomic alignments, post-process output models to generate molecular evolutionary metrics, and make LASSO regression adaptable and efficient for phylogenetic trees and alignments. The core of MyESL, which constructs logistic regression models with bi-level sparsity, is written in C++. Its input data preprocessing and result post-processing tools are developed in Python. Compared to other tools, MyESL is more computationally efficient and provides evolution-friendly inputs and outputs. These features have already enabled the use of MyESL in two phylogenomic applications: one to identify outlier sequences and fragile clades in inferred phylogenies, and another to build genetic models of convergent traits. In addition to use in a Python environment, MyESL is available as a standalone executable compatible with multiple platforms, which can be integrated directly into scripts and third-party software. The source code, executable, and documentation for MyESL are openly accessible at https://github.com/kumarlabgit/MyESL.
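To make the idea of logistic regression with bi-level sparsity concrete, here is a minimal NumPy sketch. It is not MyESL's code or API. The function names, the toy alignment, the `gene_of_site` mapping, and the unweighted penalty `lam_site * ||w||_1 + lam_gene * sum_g ||w_g||_2` are all assumptions chosen for illustration. Aligned sequences are one-hot encoded per (site, residue) pair, taxa are labeled +1 (inside the clade of interest) or -1 (outside), and a proximal gradient loop shrinks individual site weights and whole gene groups toward zero.

```python
import numpy as np

def one_hot_alignment(seqs, gene_of_site):
    """One binary column per (variable site, residue); returns features and gene ids."""
    cols, groups = [], []
    for site in range(len(seqs[0])):
        residues = sorted({s[site] for s in seqs})
        if len(residues) < 2:
            continue  # invariant sites cannot separate the two classes
        for r in residues:
            cols.append([1.0 if s[site] == r else 0.0 for s in seqs])
            groups.append(gene_of_site[site])
    return np.array(cols).T, np.array(groups)

def prox_bilevel(w, groups, step, lam_site, lam_gene):
    """Proximal step: soft-threshold each weight, then shrink each gene group."""
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam_site, 0.0)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(w[idx])
        if norm > 0.0:
            w[idx] *= max(0.0, 1.0 - step * lam_gene / norm)
    return w

def fit_bilevel_logistic(X, y, groups, lam_site, lam_gene, n_iter=500):
    """Proximal gradient descent on mean logistic loss; y holds +1/-1 labels."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Conservative step size from a Lipschitz bound on the loss gradient.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        z = X @ w + b
        g = -y / (1.0 + np.exp(y * z))  # derivative of the loss w.r.t. z
        w = prox_bilevel(w - step * (X.T @ g) / n, groups, step, lam_site, lam_gene)
        b -= step * g.mean()  # the intercept is not penalized
    return w, b

# Hypothetical toy alignment: sites 0-2 belong to gene 0, sites 3-5 to gene 1.
seqs = ["ACTGGA", "ACTGGC", "ACTGGA", "ACTGGC",   # taxa inside the clade
        "GCTGGA", "GCTGGC", "GCTGGA", "GCTGGC"]   # taxa outside the clade
y = np.array([1.0] * 4 + [-1.0] * 4)
X, groups = one_hot_alignment(seqs, gene_of_site=[0, 0, 0, 1, 1, 1])
w, b = fit_bilevel_logistic(X, y, groups, lam_site=0.01, lam_gene=0.01)
```

In this toy case only site 0 tracks clade membership, so the fitted weights for gene 1 are driven exactly to zero while gene 0 keeps non-zero weights. That is the sense in which both sites and genes act as selectable model parameters.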

Source journal

Molecular Biology and Evolution (category: Biology, Evolutionary Biology)
CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month

Journal overview: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. The journal is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.