Yasuaki Seki, Tomoki Hamagami
IEEJ Transactions on Electrical and Electronic Engineering (journal article), published 2024-08-21
DOI: 10.1002/tee.24181 (https://doi.org/10.1002/tee.24181)
Document‐to‐Document Retrieval Using Self‐Retrieval Learning and Automatic Keyword Extraction
In this study, we propose self‐retrieval learning, a self‐supervised learning method that does not require an annotated dataset. In self‐retrieval learning, keywords extracted from each document are used as a query whose correct answer is that same document, so the resulting training data imitate the relationship between a query and the corpus. Ordinary supervised learning for information retrieval requires query–document pairs as training data; self‐retrieval learning does not. Nor does it use auxiliary information such as reference lists or other documents linked to the query: it relies only on the text of the documents in the target domain. In our experiments, we applied self‐retrieval learning to the EU and UK legal document retrieval task using the DRMM retrieval model. Self‐retrieval learning not only removes the need for supervised datasets but also achieves higher retrieval accuracy than supervised learning with the same model. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
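The training-data construction described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say which keyword extractor is used, so top-k TF-IDF term selection stands in for it. The stopword list, the parameter `k`, the uniform random choice of negative documents, and the helper names (`tokenize`, `extract_keywords`, `build_self_retrieval_pairs`) are all hypothetical. Each document's keywords become a query, the document itself is the positive target, and other documents serve as negatives.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a domain-appropriate one.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "between"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords (assumed preprocessing)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def extract_keywords(docs_tokens, k):
    """Return the top-k TF-IDF terms of each document (stand-in keyword extractor)."""
    n = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in docs_tokens for term in set(toks))
    keywords = []
    for toks in docs_tokens:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically so the output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:k])
    return keywords


def build_self_retrieval_pairs(docs, k=3, num_negatives=1, seed=0):
    """Build (query, positive_id, negative_ids) triples in which a document's own
    keywords form the query and the document itself is the relevant target."""
    rng = random.Random(seed)
    docs_tokens = [tokenize(d) for d in docs]
    keywords = extract_keywords(docs_tokens, k)
    triples = []
    for i, kws in enumerate(keywords):
        # Negatives are sampled uniformly from the other documents (assumption).
        others = [j for j in range(len(docs)) if j != i]
        negatives = rng.sample(others, min(num_negatives, len(others)))
        triples.append((" ".join(kws), i, negatives))
    return triples


if __name__ == "__main__":
    docs = [
        "The directive regulates data protection and personal data processing.",
        "The court ruled on the contract dispute between the two companies.",
        "Tax law amendments affect company reporting obligations.",
    ]
    for query, pos, negs in build_self_retrieval_pairs(docs):
        print(f"query={query!r} positive={pos} negatives={negs}")
```

Running the script prints one triple per toy document. For the first document the query is "data directive personal": "data" occurs twice, and every other term appears in only one document, so the remaining slots go to the alphabetically first terms. Random negatives are simply the easiest choice for the pairwise ranking objectives commonly used to train DRMM-style models. Different keyword extractors or negative-sampling strategies would change the training data.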