Mahmoud El-Haj, Lorna Balkan, Suzanne Barbalet, L. Bell, J. Shepherdson
2013 5th Computer Science and Electronic Engineering Conference (CEEC). Published 2013-11-11. DOI: https://doi.org/10.1109/CEEC.2013.6659437
An experiment in automatic indexing using the HASSET thesaurus
In this paper we present the tools, techniques and evaluation results of an automatic indexing experiment conducted on the UK Data Archive/UK Data Service data-related document collection, as part of the Jisc-funded SKOS-HASSET project. We examined the quality of an automatic indexer based on a controlled vocabulary, the Humanities and Social Science Electronic Thesaurus (HASSET). We used the Keyphrase Extraction Algorithm (KEA), a text mining and machine learning tool. KEA builds a classifier model from training documents with known keywords; the model is then applied to help assign keywords to new documents. We performed extensive manual and automatic evaluation of the results using recall, precision and F1 scores. The quality of the KEA indexing was measured a) automatically, by the degree of overlap between the automated indexing decisions and those originally made by the human indexer, and b) manually, by comparing KEA's output with the source text. This paper explains how and why we applied the chosen technical solutions, and how we intend to take forward the lessons learned from this work.
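The automatic evaluation described above, overlap between KEA's keywords and the human indexer's keywords, can be sketched as a set comparison for a single document. This is a minimal illustration and not the project's evaluation code: the function name `overlap_scores`, the exact-match rule after case and whitespace normalization, and the sample keywords are all assumptions made for the example.

```python
def overlap_scores(predicted, gold):
    """Return (precision, recall, F1) for one document's keyword lists.

    Assumption: two keywords match only if they are identical after
    lowercasing and trimming whitespace. The abstract does not state
    the matching rule that was actually used.
    """
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in gold}
    matched = len(pred & ref)  # keywords chosen by both indexers

    # Guard against empty sets to avoid division by zero.
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical keyword lists, invented purely for illustration.
    kea_terms = ["employment", "housing", "income", "education"]
    human_terms = ["Employment", "Income", "health"]
    p, r, f = overlap_scores(kea_terms, human_terms)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here precision is the share of automatically assigned keywords that the human indexer also chose, recall is the share of the human indexer's keywords that were recovered, and F1 is their harmonic mean. Scores for a whole collection would typically be averaged over documents, though the abstract does not say which averaging was used.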