Analysing the History of Autism Spectrum Disorder Using Topic Models

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). Pub Date: 2016-10-01. DOI: 10.1109/DSAA.2016.65

Adham Beykikhoshk, Dinh Q. Phung, Ognjen Arandjelovic, S. Venkatesh

Citations: 8

Abstract

We describe a novel framework for discovering the underlying topics of a longitudinal collection of scholarly data and for tracking their lifetime and popularity over time. Unlike social media or news data, where the underlying topics themselves evolve over time, nuances of topics in science give rise to new scientific directions. We therefore model longitudinal literature data with a new approach that uses topics which remain identifiable over the course of time. Current studies either fix the topics over time, in which case they disregard the time dimension or treat it as an exchangeable covariate, or model time naturally but do not share topics across epochs. We address these issues by adopting a non-parametric Bayesian approach. We assume the data are partially exchangeable and divide them into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that topics are shared across epochs and across the documents within each epoch. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder: we collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiment and the source code of the model freely available to the public, so that other researchers can analyse our results or apply the model to their own data collections.
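To make the epoch-based idea concrete, the toy sketch below simulates a simple recurrent Chinese restaurant process in Python. It is an illustration only and not the authors' recurrent Chinese restaurant franchise model: there are no documents or words, just items that pick a topic. The function names (`seat_epoch`, `simulate`), the concentration parameter `alpha`, and the epoch sizes are assumptions made for this example. What it aims to show is that topic identities are fixed once created, that an item joins an existing topic in proportion to that topic's popularity in the previous epoch plus the current one, and that a topic nobody picks in an epoch dies out.

```python
import random
from collections import Counter


def seat_epoch(n_items, prev_counts, alpha, rng, next_topic_id):
    """Assign a topic to each of n_items items in one epoch.

    An existing topic k is chosen with weight
    prev_counts[k] + (count of k so far in this epoch);
    a brand-new topic is chosen with weight alpha.
    """
    counts = Counter()
    for _ in range(n_items):
        # Topics alive in the previous epoch or already used in this one.
        topics = sorted(set(prev_counts) | set(counts))
        weights = [prev_counts.get(k, 0) + counts.get(k, 0) for k in topics]
        # Option of opening a new topic with a fresh, never-reused id.
        topics.append(next_topic_id)
        weights.append(alpha)
        choice = rng.choices(topics, weights=weights, k=1)[0]
        if choice == next_topic_id:
            next_topic_id += 1
        counts[choice] += 1
    return counts, next_topic_id


def simulate(epoch_sizes, alpha=1.0, seed=0):
    """Return one {topic_id: count} dict per epoch."""
    rng = random.Random(seed)
    prev, next_id, history = Counter(), 0, []
    for n_items in epoch_sizes:
        counts, next_id = seat_epoch(n_items, prev, alpha, rng, next_id)
        history.append(dict(counts))
        # Only the immediately preceding epoch influences the next one.
        prev = counts
    return history


if __name__ == "__main__":
    for epoch, counts in enumerate(simulate([50, 50, 50])):
        total = sum(counts.values())
        shares = {k: round(v / total, 2) for k, v in sorted(counts.items())}
        print(f"epoch {epoch}: topic shares {shares}")
```

Because topic ids are never reused, the per-epoch dictionaries can be read directly as a popularity timeline: the first and last epoch in which an id appears give that topic's lifetime in this toy setting.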
