Automatic annotation of bibliographical references in digital humanities books, articles and blogs

Workshop on Research Advances in Large Digital Book Repositories
Pub Date: 2011-10-24
DOI: 10.1145/2064058.2064068

Young-Min Kim, P. Bellot, Elodie Faath, Marin Dacos

Citations: 21

Abstract

In this paper, we deal with the problem of extracting and processing useful information from bibliographic references in Digital Humanities (DH) data. A Conditional Random Field (CRF), a machine learning technique for sequential data analysis, is applied to a corpus extracted from OpenEdition, a web platform for journals and book collections in the humanities and social sciences. We present our ongoing project with this purpose, which includes, as a preliminary step, the construction of a suitable corpus and an efficient CRF model trained on it. This project is supported by a Google Grant for Digital Humanities. We conduct a number of experiments to find a well-performing setting for a CRF model on the corpus, and we verify the results through both automatic and manual evaluation.
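The abstract names a CRF as the labelling model but gives no details of features, tag set, or training. The Python below is a minimal sketch of how a linear-chain CRF assigns one label per token of a reference string. It runs Viterbi decoding, which finds the label sequence that maximizes the sum of emission-feature weights and label-transition weights. Everything concrete here is an assumption for illustration and is not taken from the paper: the four labels, the feature names, the hand-set weights in `EMIT` and `TRANS`, and the helpers `tokenize`, `token_features` and `viterbi`. A real CRF would learn its weights from the annotated corpus, for example by maximizing conditional log-likelihood, and this sketch omits that step.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical label set for reference fields; not the paper's actual tag set.
LABELS = ["author", "title", "date", "other"]

# Hypothetical hand-set weights; a trained CRF would learn these from annotated data.
EMIT: Dict[Tuple[str, str], float] = {
    ("is_capitalized", "author"): 1.0,
    ("is_year", "date"): 3.0,
    ("is_punct", "other"): 2.0,
    ("bias", "title"): 0.5,
}
TRANS: Dict[Tuple[str, str], float] = {
    ("title", "title"): 0.5,    # titles tend to span several tokens
    ("author", "title"): -1.0,  # discourage jumping straight from author to title
}


def tokenize(reference: str) -> List[str]:
    """Split a reference string into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tok: str) -> List[str]:
    """Return the names of the binary features that fire for one token."""
    feats = ["bias"]
    if tok[:1].isupper():
        feats.append("is_capitalized")
    if tok.isdigit() and len(tok) == 4:
        feats.append("is_year")
    if re.fullmatch(r"[^\w\s]", tok):
        feats.append("is_punct")
    return feats


def viterbi(
    tokens: List[str],
    emit: Dict[Tuple[str, str], float],
    trans: Dict[Tuple[str, str], float],
    labels: List[str] = LABELS,
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    if not tokens:
        return []

    def emission(i: int, y: str) -> float:
        # Sum the weights of all features firing on token i for label y.
        return sum(emit.get((f, y), 0.0) for f in token_features(tokens[i]))

    best = [{y: emission(0, y) for y in labels}]  # best[i][y]: top score of a path ending in y at i
    back: List[Dict[str, str]] = [{}]             # back[i][y]: predecessor label on that path
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + trans.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emission(i, y)
            back[i][y] = prev
    # Start from the best final label and follow back-pointers to the start.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    toks = tokenize("Kim, 2011. digital humanities")
    print(list(zip(toks, viterbi(toks, EMIT, TRANS))))
```

With these toy weights, the reference "Kim, 2011. digital humanities" is labelled author, other, date, other, title, title. Decoding uses dynamic programming, so its cost grows linearly with the number of tokens and quadratically with the number of labels, rather than exponentially as an exhaustive search over label sequences would.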
