Hierarchical Deep Document Model
Yi Yang; John P. Lalor; Ahmed Abbasi; Daniel Dajun Zeng
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 1, pp. 351-364, published 2024-10-29 (journal article)
DOI: 10.1109/TKDE.2024.3487523
URL: https://ieeexplore.ieee.org/document/10737364/
Journal impact factor: 8.9; JCR: Q1 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Topic modeling is a commonly used text analysis tool for discovering latent topics in a text corpus. However, while topics in a text corpus often exhibit a hierarchical structure (e.g., cellphone is a sub-topic of electronics), most topic modeling methods assume a flat topic structure that ignores the hierarchical dependency among topics, or utilize a predefined topic hierarchy. In this work, we present a novel Hierarchical Deep Document Model (HDDM) to learn topic hierarchies using a variational autoencoder framework. We propose a novel objective function, sum of log likelihood, instead of the widely used evidence lower bound, to facilitate the learning of hierarchical latent topic structure. The proposed objective function can directly model and optimize the hierarchical topic-word distributions at all topic levels. We conduct experiments on four real-world text datasets to evaluate the topic modeling capability of the proposed HDDM method compared to state-of-the-art hierarchical topic modeling benchmarks. Experimental results show that HDDM achieves considerable improvement over benchmarks and is capable of learning meaningful topics and topic hierarchies. To further demonstrate the practical utility of HDDM, we apply it to a real-world medical notes dataset for clinical prediction. Experimental results show that HDDM can better summarize topics in medical notes, resulting in more accurate clinical predictions.
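To make the contrast between the two objectives concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact form of the sum-of-log-likelihood objective, so the version here is only one possible reading: the bag-of-words log-likelihood is evaluated under the topic-word distributions of each hierarchy level and the per-level values are added. The ELBO shown for comparison is the standard form for a VAE with a Gaussian posterior and a standard normal prior. All function names, the two-level toy hierarchy, and the numbers are hypothetical.

```python
import numpy as np


def log_likelihood(counts, theta, beta, eps=1e-12):
    """Bag-of-words log-likelihood (up to a constant) for one document.

    counts: (V,) word counts; theta: (K,) topic proportions;
    beta: (K, V) topic-word distributions, each row summing to 1.
    """
    word_probs = theta @ beta  # mixture over topics, shape (V,)
    return float(np.sum(counts * np.log(word_probs + eps)))


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return float(0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))


def elbo(counts, theta, beta, mu, log_var):
    """Standard evidence lower bound: reconstruction term minus KL term."""
    return log_likelihood(counts, theta, beta) - gaussian_kl(mu, log_var)


def sum_of_log_likelihoods(counts, thetas, betas):
    """ASSUMED reading of the abstract: add the log-likelihood of the
    document under each level's topic-word distributions."""
    return sum(log_likelihood(counts, t, b) for t, b in zip(thetas, betas))


# Hypothetical toy data: vocabulary of 4 words, two hierarchy levels.
counts = np.array([2.0, 1.0, 0.0, 1.0])
thetas = [
    np.array([0.7, 0.3]),       # level 1: 2 coarse topics
    np.array([0.5, 0.3, 0.2]),  # level 2: 3 finer topics
]
betas = [
    np.array([[0.4, 0.3, 0.2, 0.1],
              [0.1, 0.2, 0.3, 0.4]]),
    np.array([[0.5, 0.2, 0.2, 0.1],
              [0.2, 0.5, 0.1, 0.2],
              [0.1, 0.1, 0.4, 0.4]]),
]

print("sum of log-likelihoods:", sum_of_log_likelihoods(counts, thetas, betas))
print("level-1 ELBO:", elbo(counts, thetas[0], betas[0], np.zeros(2), np.zeros(2)))
```

In this sketch every level contributes its own likelihood term, so the topic-word distributions at all levels receive a direct training signal, which is the property the abstract attributes to the proposed objective. How the paper ties the levels together and regularizes the latent variables is not stated in the abstract and is left out here.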
About the journal:
The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems; AI techniques for knowledge and data management; tools and methodologies; distributed processing; real-time systems; architectures; data management practices; database design; query languages; security; fault tolerance; statistical databases; algorithms; performance evaluation; and applications.