Holes in the Outline: Subject-dependent Abstract Quality and its Implications for Scientific Literature Search

Proceedings of the 2019 Conference on Human Information Interaction and Retrieval. Published: 2019-03-08. DOI: 10.1145/3295750.3298953

Chien-yu Huang, Arlene Casey, D. Glowacka, A. Medlar


Citations: 8

Abstract

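Scientific literature search engines typically index abstracts instead of the full text of publications. The expectation is that the abstract provides a comprehensive summary of the article, enumerating the key points a reader needs to assess whether reading the full text would satisfy their information needs. Furthermore, from a practical standpoint, obtaining the full text is more complicated, owing to licensing issues in the case of commercial publishers and to the resource limitations of public repositories and pre-print servers. In this article, we use topic modelling to represent the content of abstracts and full-text articles. Using Computer Science as a case study, we demonstrate that how well the abstract summarises the full text is subfield-dependent. Indeed, we show that abstract representativeness has a direct impact on retrieval performance, with poorer abstracts leading to degraded performance. Finally, we present evidence that how well an abstract represents the full text of an article is not random, but is a consequence of style and writing conventions in different subdisciplines, and can be used to infer an "evolutionary" tree of subfields within Computer Science.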
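The abstract does not specify how representativeness is scored. As a minimal sketch, assuming each abstract and its full text have already been mapped to document-topic distributions over the same K topics (for example by LDA), one plausible score is 1 minus the base-2 Jensen-Shannon divergence between the two distributions, averaged per subfield. The choice of Jensen-Shannon divergence and the function names `js_divergence`, `representativeness` and `mean_by_subfield` are illustrative assumptions, not the authors' method.

```python
import math
from collections import defaultdict


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1].

    Assumption: p and q are already normalised topic distributions of equal length.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    # Mixture distribution; m[i] > 0 wherever p[i] > 0 or q[i] > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms (0 * log 0 = 0).
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def representativeness(abstract_topics, fulltext_topics):
    """Hypothetical score: 1.0 means the abstract and full text share the same topic mixture."""
    return 1.0 - js_divergence(abstract_topics, fulltext_topics)


def mean_by_subfield(papers):
    """Average representativeness per subfield.

    papers: iterable of (subfield, abstract_topics, fulltext_topics) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for subfield, abstract_topics, fulltext_topics in papers:
        totals[subfield] += representativeness(abstract_topics, fulltext_topics)
        counts[subfield] += 1
    return {s: totals[s] / counts[s] for s in totals}


if __name__ == "__main__":
    # Toy topic vectors, invented for illustration only (not data from the paper).
    toy = [
        ("ML", [0.5, 0.5], [0.5, 0.5]),
        ("ML", [1.0, 0.0], [0.0, 1.0]),
        ("HCI", [1.0, 0.0], [1.0, 0.0]),
    ]
    print(mean_by_subfield(toy))
```

Under this sketch, a subfield with a lower average score would be one whose abstracts leave more "holes" relative to the full text; the base-2 logarithm keeps the divergence in [0, 1], so scores are comparable across subfields.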



