Creating Bengali Freebase Using Wikidata

Journal of Computer and Communications, Pub Date: 2023-01-01, DOI: 10.4236/jcc.2023.115011

Rukaiya Habib, M. Ferdous, M. Anwar


Citations: 0

Abstract

Freebase is a large collaborative knowledge base and database of general, structured information for public use. Its structured data was harvested from many sources, including individual, user-submitted wiki contributions. Its aim is to create a global resource through which people (and machines) can access common information more effectively; however, that information is mostly available in English. In this research work, we develop a technique for creating a Freebase for the Bengali language. The number of Bengali articles on the internet grows every day, so a structured data store in Bengali has become a necessity. Such a store consists of different types of concepts (topics) and the relationships between those topics. These span areas such as popular culture (e.g., films, music, books, sports, television), location information (restaurants, geolocations, businesses), scholarly information (linguistics, biology, astronomy), birthplaces of people (poets, politicians, actors, actresses), and general knowledge (Wikipedia). Such a resource will be helpful for relation extraction and other Natural Language Processing (NLP) tasks on the Bengali language. In this work, we identified a technique for creating the Bengali Freebase and built a collection of Bengali data. We applied the SPARQL query language to extract Bengali information from sources such as Wikidata, whose data is stored as RDF (Resource Description Framework) triples.
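The abstract says SPARQL was used to pull Bengali-language facts out of Wikidata's RDF triples but does not give the queries themselves. As a minimal sketch, assuming the public Wikidata Query Service endpoint and using the "birthplaces of poets, politicians, actors" category mentioned above, the Python below builds one such query, sends it, and flattens the JSON results into (subject, predicate, object) triples with Bengali labels. The endpoint URL, the occupation item IDs, the User-Agent string, and the helper names (`build_birthplace_query`, `fetch_results`, `to_triples`) are assumptions of this illustration, not the authors' code.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public Wikidata Query Service (not named in the paper).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical selection of occupation items for illustration only;
# the paper's actual query set is not given.
OCCUPATIONS = {
    "poet": "Q49757",
    "politician": "Q82955",
    "actor": "Q33999",
}


def build_birthplace_query(occupation_qid: str, lang: str = "bn", limit: int = 100) -> str:
    """Return a SPARQL query for people with the given occupation (P106),
    their place of birth (P19), and both labels in the language `lang`."""
    return f"""
SELECT ?person ?personLabel ?place ?placeLabel WHERE {{
  ?person wdt:P106 wd:{occupation_qid} ;
          wdt:P19 ?place .
  ?person rdfs:label ?personLabel .
  FILTER(LANG(?personLabel) = "{lang}")
  ?place rdfs:label ?placeLabel .
  FILTER(LANG(?placeLabel) = "{lang}")
}}
LIMIT {limit}
""".strip()


def fetch_results(query: str) -> dict:
    """Send the query to the endpoint and return SPARQL JSON results (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to identify themselves; this value is a placeholder.
            "User-Agent": "bengali-freebase-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def to_triples(results: dict, predicate: str = "P19") -> list[tuple[str, str, str]]:
    """Flatten SPARQL JSON bindings into (subject label, predicate, object label) triples."""
    triples = []
    for row in results["results"]["bindings"]:
        triples.append((row["personLabel"]["value"], predicate, row["placeLabel"]["value"]))
    return triples
```

With network access, `to_triples(fetch_results(build_birthplace_query(OCCUPATIONS["poet"])))` would return a list of Bengali (person, `P19`, birthplace) label triples. Filtering `rdfs:label` by `LANG(...)` keeps only entities that already carry a Bengali label, so coverage depends on how much of Wikidata has been labeled in Bengali; the `wikibase:label` service is an alternative that can fall back to other languages.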

