Vladimir Sukhov, Aigul Nugmanova, Yury Vorontsov, Parul Mehrotra, Maksim Kleverov, Kodi Ravichandran, Maxim Artyomov, Alexey Sergushichev
Nucleic Acids Research. Published 2025-05-05. DOI: 10.1093/nar/gkaf372 (https://doi.org/10.1093/nar/gkaf372)
CORESH: a gene signature-based search engine for public gene expression datasets
Abstract
Public data repositories like Gene Expression Omnibus (GEO) contain an extensive amount of data from hundreds of thousands of experiments, making them a valuable resource for researchers. A common scenario for utilizing this resource is to show transcriptional similarity of one’s own data to a public dataset as evidence of potentially similar biology. However, when searching for such datasets, researchers are usually limited to keyword-based search, which requires having a specific hypothesis and relies on the presence of high-quality metadata in public datasets. Here, we introduce CORESH, a web server designed to systematically find GEO datasets that match a user-provided gene signature—such as a list of top upregulated genes in response to a treatment—in a data-driven manner. CORESH operates on a compendium of >40 000 human and 40 000 mouse datasets and outputs a ranked list of datasets where the input genes exhibit similar expression patterns. The discovered datasets can then be used to identify experimental conditions associated with the activation of the query signature, offering insights into underlying biological mechanisms and guiding experimental validation. CORESH is freely accessible at https://alserglab.wustl.edu/coresh/, requires no login, and is regularly updated with the latest GEO data.
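The abstract does not spell out how datasets are scored, so the following is only a minimal sketch of the general idea of ranking datasets by how coherently a query signature behaves in each of them. It assumes a stand-in score (mean pairwise Pearson correlation of the query genes across a dataset's samples), which is not claimed to be the method CORESH uses. The function names, dataset identifiers, and expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def coherence_score(dataset, signature):
    """Stand-in score: mean pairwise correlation of the signature genes found in one dataset."""
    present = [g for g in signature if g in dataset]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0  # fewer than two query genes measured in this dataset
    return sum(pearson(dataset[a], dataset[b]) for a, b in pairs) / len(pairs)


def rank_datasets(compendium, signature):
    """Return (dataset_id, score) pairs sorted from best to worst match."""
    scores = {name: coherence_score(ds, signature) for name, ds in compendium.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy compendium: dataset id -> {gene symbol -> expression per sample}.
# The values are invented for illustration and do not come from GEO.
compendium = {
    "toy_A": {"Il1b": [1, 2, 8, 9], "Tnf": [2, 3, 9, 10], "Nos2": [0, 1, 7, 9]},
    "toy_B": {"Il1b": [5, 1, 4, 2], "Tnf": [1, 5, 2, 4], "Nos2": [3, 3, 1, 5]},
}
ranking = rank_datasets(compendium, ["Il1b", "Tnf", "Nos2"])
for dataset_id, score in ranking:
    print(f"{dataset_id}\t{score:.3f}")
```

In this toy input the three query genes rise and fall together in `toy_A` but not in `toy_B`, so `toy_A` is ranked first. A real compendium of tens of thousands of datasets would need a precomputed, more scalable representation than this pairwise loop.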
About the journal:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.