Relevance Feedback Search Based on Automatic Annotation and Classification of Texts

International Conference on Language, Data, and Knowledge. DOI: 10.4230/OASIcs.LDK.2021.18

Rafael Leal, Joonas Kesäniemi, M. Koho, E. Hyvönen


Citations: 2

Abstract




The idea behind Relevance Feedback Search (RFBS) is to build search queries through an iterative and interactive process in which they are gradually refined based on the results of the previous search round. This can help when end users cannot easily formulate their information needs at the outset as a well-focused query and, more generally, serves as a way to filter and focus search results. This paper presents (1) a framework that integrates keyword extraction and unsupervised classification into the RFBS paradigm and (2) the application of this framework to the legal domain as a use case. We focus on the Natural Language Processing (NLP) methods underlying the framework and application: an automatic annotation tool extracts document keywords as ontology concepts, which are then transformed into word embeddings to form vector representations of the texts. An unsupervised classification system that employs similar techniques is also used to classify the documents into broad thematic classes. This classification functionality is evaluated on two different datasets. As the use case, we describe the application in the semantic portal LawSampo – Finnish Legislation and Case Law on the Semantic Web.

This online demonstrator uses a dataset of 82 145 sections in 3725 statutes of Finnish legislation and another dataset comprising 13 470 court decisions.
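The abstract describes the pipeline only at a high level. The following is a minimal sketch of one feedback round in Python, under explicit assumptions that the abstract does not confirm. Each text is represented by the unit-length mean of the embeddings of its extracted concepts. Classification assigns the class whose vector (built from a few seed concepts) is most cosine-similar. Query refinement uses the classic Rocchio update, a standard relevance-feedback formula that is not necessarily the one used in LawSampo. The embedding table, concept names, documents, class names and the weights `alpha`, `beta`, `gamma` are all hypothetical.

```python
import math

# Hypothetical toy embeddings for ontology concepts (3-D for readability).
# A real system would embed the concept labels returned by the annotation tool.
EMBEDDINGS = {
    "tax": [0.9, 0.1, 0.0],
    "income": [0.8, 0.2, 0.1],
    "contract": [0.1, 0.9, 0.1],
    "employment": [0.2, 0.8, 0.3],
    "crime": [0.0, 0.1, 0.9],
    "punishment": [0.1, 0.0, 0.95],
}
DIM = 3


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vecs):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def text_vector(concepts):
    """Unit-length mean of the embeddings of the known concepts found in a text."""
    return normalize(mean_vector([EMBEDDINGS[c] for c in concepts if c in EMBEDDINGS]))


def cosine(a, b):
    """Cosine similarity of two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def classify(doc_vec, class_vecs):
    """Assign the thematic class whose vector is most similar to the document."""
    return max(class_vecs, key=lambda label: cosine(doc_vec, class_vecs[label]))


def rank(query_vec, doc_vecs):
    """Document ids sorted by decreasing cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query towards relevant and away from non-relevant texts."""
    rel, non = mean_vector(relevant), mean_vector(nonrelevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query_vec, rel, non)]


# Hypothetical documents, each reduced to its extracted concepts.
docs = {
    "d1": text_vector(["tax", "income"]),
    "d2": text_vector(["contract", "employment"]),
    "d3": text_vector(["crime", "punishment"]),
    "d4": text_vector(["income", "employment"]),
}
# Hypothetical thematic classes, each described by a few seed concepts.
classes = {
    "Taxation": text_vector(["tax", "income"]),
    "Labour law": text_vector(["contract", "employment"]),
    "Criminal law": text_vector(["crime", "punishment"]),
}

query = text_vector(["income"])
first_round = rank(query, docs)  # ["d1", "d4", "d2", "d3"]
# The user marks d4 as relevant and d1 as non-relevant; the query is refined.
refined = rocchio(query, [docs["d4"]], [docs["d1"]])
second_round = rank(refined, docs)  # d4 now ranks first
```

In this toy run the initial query favours the taxation document `d1`; after one round of feedback the refined query ranks `d4` first. Mean-pooling concept embeddings is chosen here only because it is the simplest way to turn a set of concepts into one vector.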
