Teru Agata, Atsushi Ikeuchi, Emi Ishita, Michiko Nozue, T. Kuno, S. Ueda
Library and Information Science, published 2006-12-01. DOI: 10.46895/lis.56.43 (https://doi.org/10.46895/lis.56.43)
Automatic identification of academic articles in Japanese PDF files
As open access becomes common, many researchers deposit their research products on the publicly accessible web (i.e. self-archiving). Although these documents can be reached through general search engines, they tend to be buried under the mass of other content. The purpose of this research is to automatically identify academic articles and quasi-articles across the entire web. In this paper we run experiments on the performance of various classifiers and compare them in terms of precision, recall, and F-value. The classifiers use attributes such as terms appearing in the PDF files and empirical rules. The differences in performance among the classifiers reveal the characteristics of each.
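As a reading aid, the three evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name, the sample labels, and the choice of the balanced F-value (the harmonic mean of precision and recall) are all assumptions, since the abstract does not specify them.

```python
def precision_recall_f(predicted, actual):
    """Return (precision, recall, F) for binary labels.

    True means "academic article". The balanced F-value is an
    assumption; the abstract does not say which weighting was used.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged articles
    fp = sum(1 for p, a in pairs if p and not a)    # non-articles flagged as articles
    fn = sum(1 for p, a in pairs if not p and a)    # articles that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_value = 2 * precision * recall / (precision + recall)
    else:
        f_value = 0.0
    return precision, recall, f_value


# Hypothetical classifier output and ground truth for six PDF files.
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]

precision, recall, f_value = precision_recall_f(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_value:.3f}")
```

Precision penalizes non-articles that are wrongly flagged, recall penalizes articles that are missed, and the F-value combines the two, which is why comparing classifiers on all three exposes their different trade-offs.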
About the journal:
Library and Information Science is the official journal of the Mita Society for Library and Information Science. It is issued semiannually and prepared by the Editorial Committee of the Society.