Mining semantics for culturomics: towards a knowledge-based approach

Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing. Publication date: 2013-10-28. DOI: 10.1145/2513549.2513551

L. Borin, Devdatt P. Dubhashi, Markus Forsberg, Richard Johansson, D. Kokkinakis, P. Nugues


Citations: 10

Abstract

The massive amounts of text data made available through the Google Books digitization project have inspired a new field of big-data textual research. Named culturomics, this field has attracted the attention of a growing number of scholars in recent years. However, initial studies based on these data have been criticized for not referring to relevant work in linguistics and language technology. This paper presents some ideas and first steps towards a new culturomics initiative, based this time on Swedish data, which pursues a more knowledge-based approach than previous work in this emerging field. The amount of new Swedish text produced daily, together with older texts being digitized in cultural heritage projects, is growing at an accelerating rate. The volume of text available in digital form has grown far beyond the capacity of human readers, leaving automated semantic processing as the only realistic option for accessing and using the information these texts contain. The aim of our recently initiated research program is to advance the state of the art in language technology resources and methods for semantic processing of big Swedish text, with a focus on theoretical and methodological advances in extracting and correlating information from large volumes of Swedish text using a combination of knowledge-based and statistical methods.

