Seojin Kim, Jaehyun Nam, Sihyun Yu, Younghoon Shin, Jinwoo Shin
arXiv - QuanBio - Molecular Networks, published 2024-05-05, identifier: arxiv-2405.02845 (https://doi.org/arxiv-2405.02845)
Data-Efficient Molecular Generation with Hierarchical Textual Inversion
A molecular generation framework that remains effective with only a limited
number of molecules is often important for practical deployment, e.g., in drug
discovery, since acquiring task-related molecular data requires expensive and
time-consuming experiments. To tackle this issue, we introduce
Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel
data-efficient molecular generation method. HI-Mol is inspired by the
importance of hierarchical information, e.g., both coarse- and fine-grained
features, in understanding the molecule distribution. We propose
multi-level embeddings that reflect such hierarchical features, adapting
the recent textual inversion technique from the visual domain, where it
achieves data-efficient image generation. Compared to the conventional textual
inversion method in the image domain using a single-level token embedding, our
multi-level token embeddings allow the model to effectively learn the
underlying low-shot molecule distribution. We then generate molecules based on
the interpolation of the multi-level token embeddings. Extensive experiments
demonstrate that HI-Mol is notably data-efficient. For
instance, on QM9, HI-Mol outperforms the prior state-of-the-art method while
using 50x less training data. We also show the effectiveness of molecules generated by
HI-Mol in low-shot molecular property prediction.
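
The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the interpolation scheme, the number of levels, or how the mixing coefficient is chosen, so everything below is an assumption made for illustration: the level names `coarse` and `fine`, plain linear interpolation, a uniformly sampled coefficient, and the hypothetical helpers `interpolate_levels` and `sample_mixed_embedding`. It is not the authors' implementation.

```python
import random


def interpolate_levels(emb_a, emb_b, lam):
    """Linearly interpolate two multi-level embeddings, level by level.

    emb_a, emb_b: dicts mapping a level name to a list of floats.
    lam: mixing weight in [0, 1]; lam = 1.0 returns emb_a's values.
    (Hypothetical helper; linear mixing is an assumption.)
    """
    if emb_a.keys() != emb_b.keys():
        raise ValueError("both embeddings must define the same levels")
    mixed = {}
    for level in emb_a:
        vec_a, vec_b = emb_a[level], emb_b[level]
        if len(vec_a) != len(vec_b):
            raise ValueError(f"dimension mismatch at level {level!r}")
        mixed[level] = [lam * x + (1.0 - lam) * y for x, y in zip(vec_a, vec_b)]
    return mixed


def sample_mixed_embedding(bank, rng):
    """Pick two distinct learned embeddings and mix them.

    bank: list of multi-level embeddings (one per training molecule here,
    which is an assumption). The coefficient is drawn uniformly from
    [0, 1), which is also an assumption.
    """
    emb_a, emb_b = rng.sample(bank, 2)
    lam = rng.random()
    return interpolate_levels(emb_a, emb_b, lam)


# Toy 2-dimensional embeddings standing in for learned token embeddings.
bank = [
    {"coarse": [0.0, 2.0], "fine": [1.0, 1.0]},
    {"coarse": [2.0, 0.0], "fine": [3.0, 1.0]},
    {"coarse": [1.0, 1.0], "fine": [0.0, 4.0]},
]
mixed = sample_mixed_embedding(bank, random.Random(0))
```

In a full pipeline, the mixed embeddings would condition a pretrained text-to-molecule generator; that part is omitted here because the abstract gives no details about it.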