S. Verberne, Antal van den Bosch, S. Wubben, E. Krahmer
{"title":"特定领域论坛线程的自动汇总:收集参考数据","authors":"S. Verberne, Antal van den Bosch, S. Wubben, E. Krahmer","doi":"10.1145/3020165.3022127","DOIUrl":null,"url":null,"abstract":"We create and analyze two sets of reference summaries for discussion threads on a patient support forum: expert summaries and crowdsourced, non-expert summaries. Ideally, reference summaries for discussion forum threads are created by expert members of the forum community. When there are few or no expert members available, crowdsourcing the reference summaries is an alternative. In this paper we investigate whether domain-specific forum data requires the hiring of domain experts for creating reference summaries. We analyze the inter-rater agreement for both data-sets and we train summarization models using the two types of reference summaries. The inter-rater agreement in crowdsourced reference summaries is low, close to random, while domain experts achieve a considerably higher, fair, agreement. The trained models however are similar to each other. We conclude that it is possible to train an extractive summarization model on crowdsourced data that is similar to an expert model, even if the inter-rater agreement for the crowdsourced data is low.","PeriodicalId":398762,"journal":{"name":"Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Automatic Summarization of Domain-specific Forum Threads: Collecting Reference Data\",\"authors\":\"S. Verberne, Antal van den Bosch, S. Wubben, E. Krahmer\",\"doi\":\"10.1145/3020165.3022127\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We create and analyze two sets of reference summaries for discussion threads on a patient support forum: expert summaries and crowdsourced, non-expert summaries. Ideally, reference summaries for discussion forum threads are created by expert members of the forum community. When there are few or no expert members available, crowdsourcing the reference summaries is an alternative. In this paper we investigate whether domain-specific forum data requires the hiring of domain experts for creating reference summaries. We analyze the inter-rater agreement for both data-sets and we train summarization models using the two types of reference summaries. The inter-rater agreement in crowdsourced reference summaries is low, close to random, while domain experts achieve a considerably higher, fair, agreement. The trained models however are similar to each other. 
We conclude that it is possible to train an extractive summarization model on crowdsourced data that is similar to an expert model, even if the inter-rater agreement for the crowdsourced data is low.\",\"PeriodicalId\":398762,\"journal\":{\"name\":\"Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-03-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3020165.3022127\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3020165.3022127","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automatic Summarization of Domain-specific Forum Threads: Collecting Reference Data
We create and analyze two sets of reference summaries for discussion threads on a patient support forum: expert summaries and crowdsourced, non-expert summaries. Ideally, reference summaries for discussion forum threads are created by expert members of the forum community. When few or no expert members are available, crowdsourcing the reference summaries is an alternative. In this paper, we investigate whether domain-specific forum data requires hiring domain experts to create reference summaries. We analyze the inter-rater agreement for both data sets, and we train summarization models on the two types of reference summaries. The inter-rater agreement on the crowdsourced reference summaries is low, close to random, while the domain experts achieve a considerably higher, fair agreement. The trained models, however, are similar to each other. We conclude that it is possible to train an extractive summarization model on crowdsourced data that is similar to an expert model, even if the inter-rater agreement for the crowdsourced data is low.
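The abstract describes agreement as "close to random" versus "fair", which are conventional labels for ranges of a chance-corrected agreement coefficient such as Cohen's kappa (the abstract does not state which statistic the authors used). As a minimal illustrative sketch, not taken from the paper, the Python snippet below computes Cohen's kappa for two hypothetical raters who each give binary sentence-selection labels (1 = include in the extractive summary, 0 = exclude) for one thread; all labels and names are invented.

    # Sketch: pairwise inter-rater agreement on extractive reference summaries,
    # measured with Cohen's kappa over binary sentence-selection labels.
    # The annotations below are hypothetical and for illustration only.

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Cohen's kappa for two raters' binary sentence-selection labels."""
        assert len(rater_a) == len(rater_b)
        n = len(rater_a)
        # Observed agreement: fraction of sentences with identical labels.
        p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Expected chance agreement, from each rater's own label distribution.
        counts_a, counts_b = Counter(rater_a), Counter(rater_b)
        p_expected = sum(
            (counts_a[label] / n) * (counts_b[label] / n)
            for label in set(rater_a) | set(rater_b)
        )
        if p_expected == 1.0:  # both raters used one identical label throughout
            return 1.0
        return (p_observed - p_expected) / (1 - p_expected)

    # Hypothetical sentence-level annotations for a 10-sentence thread.
    rater_1 = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
    rater_2 = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
    print(cohens_kappa(rater_1, rater_2))  # ~0.58 for this toy example

On the widely used Landis and Koch scale, kappa values of 0.21-0.40 are labeled "fair" and values near zero indicate agreement no better than chance, which matches the contrast the abstract draws between the expert and crowdsourced annotations.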