Ivanda Zevi Amalia, Akbar Noto Ponco Bimantoro, A. Arifin, Maryamah Faisol, R. Indraswari, Riska Wakhidatus Sholikah
{"title":"伪关联反馈查询扩展中印尼语翻译的圣训内容权重","authors":"Ivanda Zevi Amalia, Akbar Noto Ponco Bimantoro, A. Arifin, Maryamah Faisol, R. Indraswari, Riska Wakhidatus Sholikah","doi":"10.21107/kursor.v11i1.249","DOIUrl":null,"url":null,"abstract":"In general, hadith consists of isnad and matan (content). Matan can be separated into several components for example a story, main content, and some additional information. Other texts besides main content, such as isnad and story can interfere the retrieval process of relevant documents because most users typically use simple queries. Thus, in this paper, we proposed a Named Entity Recognition (NER) component weighting model in improving the Indonesian hadith retrieval system. We did 3 test scenarios, the first scenario (S1) did not separate the hadith into several components, the second scenario (S2) separated the hadith into 2 components, isnad and matan, and the third scenario separated the hadith into 4 components, isnad, background story, content, and additional information. From the experimental results, it is found that the TF-IDF with rocchio algorithm in query expansion outperforms DocVec. Also, separation and weighting of the hadith components affect the retrieval performance because isnad can be considered as noise in a query. Separation of 2 separate components had the best overall results in general although 4 separate components showed better results in some cases with precision up to 100% and 70% recall.","PeriodicalId":52605,"journal":{"name":"Jurnal Ilmiah Kursor Menuju Solusi Teknologi Informasi","volume":"15 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"INDONESIAN-TRANSLATED HADITH CONTENT WEIGHTING IN PSEUDO-RELEVANCE FEEDBACK QUERY EXPANSION\",\"authors\":\"Ivanda Zevi Amalia, Akbar Noto Ponco Bimantoro, A. Arifin, Maryamah Faisol, R. Indraswari, Riska Wakhidatus Sholikah\",\"doi\":\"10.21107/kursor.v11i1.249\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In general, hadith consists of isnad and matan (content). Matan can be separated into several components for example a story, main content, and some additional information. Other texts besides main content, such as isnad and story can interfere the retrieval process of relevant documents because most users typically use simple queries. Thus, in this paper, we proposed a Named Entity Recognition (NER) component weighting model in improving the Indonesian hadith retrieval system. We did 3 test scenarios, the first scenario (S1) did not separate the hadith into several components, the second scenario (S2) separated the hadith into 2 components, isnad and matan, and the third scenario separated the hadith into 4 components, isnad, background story, content, and additional information. From the experimental results, it is found that the TF-IDF with rocchio algorithm in query expansion outperforms DocVec. Also, separation and weighting of the hadith components affect the retrieval performance because isnad can be considered as noise in a query. Separation of 2 separate components had the best overall results in general although 4 separate components showed better results in some cases with precision up to 100% and 70% recall.\",\"PeriodicalId\":52605,\"journal\":{\"name\":\"Jurnal Ilmiah Kursor Menuju Solusi Teknologi Informasi\",\"volume\":\"15 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Jurnal Ilmiah Kursor Menuju Solusi Teknologi Informasi\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21107/kursor.v11i1.249\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jurnal Ilmiah Kursor Menuju Solusi Teknologi Informasi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21107/kursor.v11i1.249","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
INDONESIAN-TRANSLATED HADITH CONTENT WEIGHTING IN PSEUDO-RELEVANCE FEEDBACK QUERY EXPANSION
In general, hadith consists of isnad and matan (content). Matan can be separated into several components for example a story, main content, and some additional information. Other texts besides main content, such as isnad and story can interfere the retrieval process of relevant documents because most users typically use simple queries. Thus, in this paper, we proposed a Named Entity Recognition (NER) component weighting model in improving the Indonesian hadith retrieval system. We did 3 test scenarios, the first scenario (S1) did not separate the hadith into several components, the second scenario (S2) separated the hadith into 2 components, isnad and matan, and the third scenario separated the hadith into 4 components, isnad, background story, content, and additional information. From the experimental results, it is found that the TF-IDF with rocchio algorithm in query expansion outperforms DocVec. Also, separation and weighting of the hadith components affect the retrieval performance because isnad can be considered as noise in a query. Separation of 2 separate components had the best overall results in general although 4 separate components showed better results in some cases with precision up to 100% and 70% recall.