Yasser M. Seddiq, A. Meftah, Mansour Al-Ghamdi, Y. Alotaibi
2016 European Modelling Symposium (EMS), published 2016-11-01. DOI: 10.1109/EMS.2016.022 (https://doi.org/10.1109/EMS.2016.022)
Reintroducing KAPD as a Dataset for Machine Learning and Data Mining Applications
The KACST Arabic Phonetic Database (KAPD) has been in use by researchers for around fifteen years since its initial release. Research in acoustics and phonetics has benefited from its phonetically rich content, and KAPD has the potential to serve the research community further. In this work, KAPD is enhanced so that it can serve as a dataset for machine learning and data mining applications. The work involves reviewing and refining the existing KAPD metadata and adding new material that such applications require. Updated phoneme statistics after the corpus upgrade are presented from different perspectives. The data format and time units are made compatible with those of HTK. The paper also discusses the potential of KAPD to serve as either a balanced or an imbalanced dataset.
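To make the HTK compatibility and the balanced/imbalanced question concrete, here is a minimal sketch, not the authors' tooling. It assumes phoneme segments are available as (start seconds, end seconds, label) tuples. The segment values, the label names, and the function names are all hypothetical and are not taken from KAPD. The only external fact relied on is that HTK label files express times as integers in units of 100 ns.

```python
from collections import Counter

# HTK label files store times as integer multiples of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000


def to_htk_time(seconds: float) -> int:
    """Convert a time in seconds to HTK 100 ns units."""
    return round(seconds * HTK_UNITS_PER_SECOND)


def to_htk_label_lines(segments):
    """Format (start_s, end_s, label) tuples as 'start end label' lines."""
    return [
        f"{to_htk_time(start)} {to_htk_time(end)} {label}"
        for start, end, label in segments
    ]


def phoneme_counts(segments):
    """Count occurrences of each label to inspect class balance."""
    return Counter(label for _, _, label in segments)


# Hypothetical segments for illustration only (not KAPD data).
segments = [
    (0.00, 0.12, "s"),
    (0.12, 0.25, "a"),
    (0.25, 0.40, "l"),
    (0.40, 0.55, "a"),
]

lines = to_htk_label_lines(segments)
counts = phoneme_counts(segments)
```

Per-label counts such as these are one simple way to judge whether a subset of a corpus is balanced across phoneme classes. A learner that needs a balanced set could subsample each class down to the smallest count.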