A comprehensive voice dataset for Hindko digit recognition

IF: 1.0 | JCR: Q3 | Category: MULTIDISCIPLINARY SCIENCES

Data in Brief | Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2024.111220

Tanveer Ahmed, Maqbool Khan, Khalil Khan, Ikram Syed, Syed Sajid Ullah

Data in Brief, volume 58, Article 111220.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730949/pdf/

Publisher page: https://www.sciencedirect.com/science/article/pii/S235234092401182X

Citations: 0

Abstract


A comprehensive voice dataset for Hindko digit recognition

Hindko is a language spoken primarily in the northwestern areas of Pakistan. Approximately eight million people speak it. According to its native speakers, it is the 7th largest language of Pakistan and the 2nd largest language of Khyber Pakhtunkhwa. The Hazara region is the cultural hub of the Hindko language: about 80% of the population in districts such as Haripur, Abbottabad, and Mansehra speak Hindko. Spoken Hindko content covers a wide range of subjects, including religion, education, poetry, politics, and theater. Despite all this, Hindko lacks a voice recognition system that could enhance accessibility, preserve the language, and promote digital inclusion for its speakers. This paper presents a voice recognition dataset of 17,597 voice samples that is accessible to the public for academic and research purposes. The dataset covers 20 Hindko digits, ranging from 1 to 20, and all voice samples were collected from the students, staff, and faculty of Pak-Austria Fachhochschule Institute of Applied Science and Technology.


Source journal

Data in Brief (MULTIDISCIPLINARY SCIENCES)

CiteScore: 3.10

Self-citation rate: 0.00%

Articles published: 996

Review time: 70 days

About the journal: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.