2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP): Latest Publications

Functional Requirements for Creating Reliable Self-Screening Tests by Non-Developers

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464821

Ninie Sumarni Abdullah, Muthukkaruppan Annamalai

Abstract: A self-screening test is a questionnaire created for members of the public to help them determine their health condition and its severity. Computer-based online self-screening tests are easily accessible and also allow the creation of more inclusive questions. However, two issues prevail: 1) the need to depend on application developers to create the tests, and 2) doubts about the reliability of respondents' answers to the comprehensive questions. Existing self-screening tests do not provide adequate support for tackling these dependency and reliability issues, which our research addresses. To remove the dependency on application developers, we propose equipping computer-based test setters with an editing template that enables test authors (authorised experts without programming skills) to create self-screening tests on their own. To address the reliability concern arising from inconsistent user responses, we propose enabling test authors to specify definite rules that detect and handle inconsistent responses in the screening test. Following the initial stages of the Engineering research approach, we gathered and analysed a number of health screening questionnaires and reviewed various methods for detecting inconsistencies in users' feedback. Based on this review and analysis, we identified four key components of the test editing template: Question setting, Answer scoring, Consistency checking and Decision-making. In this paper, we specify the functional requirements of each of these components, which form the basis for the design and development of the screening test editing template.

Citations: 0
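The abstract above names four template components but does not say how they fit together. The following is a minimal sketch, assuming numeric answers, weighted-sum scoring and score cut-offs; every name in it (`ConsistencyRule`, `screen`, the question ids, weights, rule and thresholds) is hypothetical and is not taken from the paper. It only shows the idea of an author-defined consistency rule blocking a decision when two answers contradict each other.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Answers = Dict[str, int]  # hypothetical: question id -> numeric answer value


@dataclass
class ConsistencyRule:
    """A hypothetical author-defined rule that fires when answers contradict."""
    name: str
    is_inconsistent: Callable[[Answers], bool]


def score(answers: Answers, weights: Dict[str, int]) -> int:
    """Answer scoring: weighted sum of answer values (default weight 1)."""
    return sum(weights.get(q, 1) * value for q, value in answers.items())


def screen(
    answers: Answers,
    weights: Dict[str, int],
    rules: List[ConsistencyRule],
    thresholds: List[Tuple[int, str]],
) -> Tuple[str, List[str]]:
    """Consistency checking, then decision-making.

    If any rule fires, no severity decision is made; the names of the
    violated rules are returned instead so the caller can handle them.
    """
    violated = [rule.name for rule in rules if rule.is_inconsistent(answers)]
    if violated:
        return "inconsistent", violated
    total = score(answers, weights)
    # Check cut-offs from the highest minimum score downwards.
    for min_score, label in sorted(thresholds, reverse=True):
        if total >= min_score:
            return label, []
    return "below all thresholds", []


# Hypothetical author configuration: weights, one rule, and severity cut-offs.
rules = [
    ConsistencyRule(
        "sleep",
        # Says they sleep well (1) yet reports 5+ bad nights per week.
        lambda a: a.get("sleeps_well") == 1 and a.get("bad_nights", 0) >= 5,
    )
]
weights = {"sleeps_well": 0, "low_mood": 2}
thresholds = [(0, "minimal"), (5, "mild"), (10, "moderate")]

print(screen({"sleeps_well": 1, "bad_nights": 6}, weights, rules, thresholds))
print(screen({"sleeps_well": 0, "bad_nights": 3, "low_mood": 3}, weights, rules, thresholds))
```

The first call trips the rule and returns `('inconsistent', ['sleep'])`; the second scores 3 + 2 × 3 = 9 and falls in the "mild" band. Keeping rules as plain predicates is one way a non-programmer's template entries could be compiled, but the paper's own rule format may differ.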

Feasibility of Using the Position as Feature for Idea Identification from Text

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464819

M. Alksher, A. Azman, R. Yaakob, R. A. Kadir, Abdulmajid Mohamed, Eissa Alshari

Citations: 2

Analyzing Malay Stemmer Performance Towards Fuzzy Logic Ranking Function on Malay Text Corpus

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464767

Shaiful Bakhtiar bin Rodzman, Mohamad Fitri Izuan Abdul Ronie, N. K. Ismail, Nurazzah Abd Rahman, F. Ahmad, Z. M. Nor

Abstract: To make Information Retrieval (IR) results more accurate, a stemmer is needed to reduce the words used in searching for useful information to their stems. This research analyzes both the processing speed and the accuracy of two Malay language stemmers, the Fatimah Stemmer and the UniSZA Stemmer. It also compares the performance of a Fuzzy Logic Ranking Function using each stemmer. Recall and Precision are evaluated against a relevance judgement list prepared by an expert. The results show that the UniSZA Stemmer clearly outperformed the Fatimah Stemmer in processing speed, recording faster times in each experimental set; in terms of accuracy, however, the Fatimah Stemmer clearly outperformed the UniSZA Stemmer, producing many more correctly stemmed words in each set. The results also show that Fuzzy Logic Ranking with the Fatimah Stemmer outperformed Fuzzy Logic Ranking with the UniSZA Stemmer and the English Porter Stemmer on 5 of the 8 query topic sets on the Mean Average Precision measure. Fuzzy Logic Ranking with the Fatimah Stemmer also achieved the best Precision at Rank 10, Mean Average Precision, and percentage of queries with no relevant document in the top ten retrieved on the topic with the most queries, 'Umum', which has a total of 11 queries.

Citations: 8
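The abstract above reports results on Precision at Rank 10, Mean Average Precision (MAP) and the share of queries with no relevant document in the top ten. A minimal sketch of the textbook definitions of these measures follows; the document ids and relevance sets are invented, and the paper's exact evaluation script is not known. As in common TREC-style evaluation, P@10 divides by 10 even when fewer than ten documents are retrieved, and unretrieved relevant documents contribute zero to average precision.

```python
from typing import List, Sequence, Set, Tuple

Run = Tuple[Sequence[str], Set[str]]  # (ranked document ids, relevant ids) for one query


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even when fewer than k documents were retrieved.
    """
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of the precision at each rank holding a relevant document,
    divided by the total number of relevant documents for the query."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Run]) -> float:
    """MAP: average precision averaged over all queries in a topic set."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def pct_no_relevant_in_top_k(runs: List[Run], k: int = 10) -> float:
    """Percentage of queries with no relevant document in their top k."""
    misses = sum(1 for r, rel in runs if not any(d in rel for d in r[:k]))
    return 100.0 * misses / len(runs)


# Invented example: two queries in one topic set.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}),  # AP = (1/1 + 2/3) / 3 = 5/9
    (["d9", "d8"], {"d7"}),                          # no relevant document retrieved
]
print(precision_at_k(*runs[0]))         # 0.2
print(mean_average_precision(runs))     # 0.2777...
print(pct_no_relevant_in_top_k(runs))   # 50.0
```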

A Review on Building Bilingual Comparable Corpora for Resource-limited Languages

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464798

Nurul Amelina Nasharuddin, M. T. Abdullah, A. Azman, R. A. Kadir

Citations: 1

Building the Classical Arabic Named Entity Recognition Corpus (CANERCorpus)

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464820

Ramzi Salah, Lailatul Qadri Binti Zakaria

Abstract: The past decade has seen the construction of background information resources to overcome several challenges in text mining tasks. For non-English languages with poor knowledge sources, such as Arabic, these challenges have become more salient, especially for natural language processing applications that require human annotation. In the Named Entity Recognition (NER) task, several studies have addressed the complexity of Arabic in terms of morphological and syntactic variation. However, only a small number of studies deal with Classical Arabic (CA), the official language of the Quran and Hadith. CA was also used for archiving Islamic topics that contain a lot of useful information, which could be of great value if extracted. Therefore, in this paper, we introduce the Classical Arabic Named Entity Recognition corpus, a new corpus of tagged data that can help address the issues in recognizing Arabic named entities. It is freely available, manually annotated by human experts, and contains more than 7,000 Hadiths. Based on Islamic topics, we classify named entities into 20 types, including domain-specific entities that have not been handled before, such as Allah, Prophet, Paradise, Hell, and Religion. The differences between standard and classical Arabic are described in detail. Moreover, a comprehensive statistical analysis is presented to measure the factors that play an important role in manual human annotation.

Citations: 14

Effectiveness of Latent Dirichlet Allocation Model for Semantic Information Retrieval on Malay Document

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464782

Nurul Syeilla Syazhween Binti Zulkefli, N. A. Abdul Rahman, Mazidah Puteh, Zainab Binti Abu Bakar

Abstract: Current research usually adopts the standard Vector Space Model (VSM) process for searching and retrieving information in Malay documents. However, this technique is less effective for semantic information retrieval from the collection: the system retrieves only documents that contain the user's query terms and ignores the semantic information among those terms. As a result, some documents with a similar context are missed, while some documents with a different context that merely share a single term are retrieved. To address this problem, the Latent Dirichlet Allocation (LDA) model is applied for semantic information retrieval on Malay documents. An experiment was conducted using 6 query texts and 50 Hadith documents translated into Malay, drawn from the Shahih Bukhari collection. Experimental results show that the LDA model gives promising results in retrieving semantic information from Malay-translated Hadith documents compared to existing techniques. Some limitations of this study can be explored in future work to improve the effectiveness of the retrieval results. Overall, LDA is an effective method for semantic information retrieval on Malay documents and can help people easily search for and retrieve semantic information from them.

Citations: 5
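The abstract above attributes two failure modes to term matching in the VSM: related documents without shared terms are missed, and unrelated documents sharing one term are retrieved. The sketch below reproduces both with plain term-frequency cosine similarity (no TF-IDF weighting, no stemming). It is an illustration under those simplifying assumptions, not the authors' baseline, and the Malay example phrases are invented. The LDA model itself is not shown.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector from whitespace tokens (no stemming, no TF-IDF)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = tf_vector("solat jemaah")  # "congregational prayer"
docs = {
    # "praying together with the imam at the mosque": related, but no shared term
    "related, no shared term": "sembahyang bersama imam di masjid",
    # "hajj pilgrims depart for Mecca": shares only "jemaah", different context
    "shares one term": "jemaah haji berangkat ke mekah",
}
for label, text in docs.items():
    print(label, round(cosine(query, tf_vector(text)), 3))
# related, no shared term 0.0
# shares one term 0.316
```

A topic model such as LDA instead compares documents by their inferred topic distributions, so a document can score well without sharing the query's exact words.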

Investigation of Data Representation Methods with Machine Learning Algorithms for Biomedical Named Entity Recognition

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464816

Maan Tareq Abd, M. Mohd, Mustafa Tareq Abd

Abstract: Recognizing biomedical entities such as genes, proteins, chemicals and diseases is the first and most fundamental biomedical literature mining task. Most recent biomedical named entity recognition (Bio-NER) methods rely on predefined features that try to capture the specific surface properties of entity types. However, these empirically predefined feature sets differ between entity types and are complex and manually constructed, which makes their development costly. This paper presents a comparative evaluation of a traditional feature representation method and new prototypical representation methods with three machine learning classifiers (Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN)) for Bio-NER. Several comparative experiments are conducted on a widely used standard Bio-NER dataset, the GENIA corpus. This paper demonstrates that prototypical word representation methods can be successfully used for Bio-NER. Experimental results show that the prototypical representation methods improved the performance of all three machine learning models. Finally, the experiments indicate that the SVM classifier with prototypical representation methods yields the best result.

Citations: 2

Morphological Analysis of Malay Words for Resolving Ambiguity

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464773

M. F. Yahaya, Nurazzah Abd Rahman, Z. Bakar

Abstract: The issue of morphological ambiguity is broadly addressed in state-of-the-art Natural Language Processing (NLP). For the most part, ambiguity is resolved using large manually annotated corpora and machine learning. However, such methods are not always applicable, as good training data is not available for all languages. In this paper, we introduce a technique for disambiguation without gold-standard corpora using several statistical models, namely Braille Translation Algorithms and unambiguous N-grams from the automatically annotated corpus. Each of the methods was tested on the Glosbe corpus and the Dewan Bahasa Pustaka (DBP) corpus. As a result, more than half of the words with ambiguous analyses were disambiguated in both corpora, with high accuracy. Our technique for morphological disambiguation shows that it is possible to eliminate some of the ambiguous analyses in the corpus without specific linguistic resources, using only raw data in which all possible morphological analyses for each word are listed.

Citations: 1

Enhancing Multi-Aspect Collaborative Filtering for Personalized Recommendation

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464760

N. Khairudin, N. Sharef, N. Mustapha, Shahrul Azman Mohd Noah

Citations: 1

Automated Semantic Query Formulation for Document Retrieval

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI: 10.1109/INFRKM.2018.8464786

R. A. Kadir, A. Yauri, A. Azman

Abstract: The Semantic Web offers the chance of easier and more effective access to the constantly growing heterogeneous data on the Web. Data can now be retrieved semantically rather than through traditional keyword-based searches, which usually return a lot of irrelevant information. However, one of the main challenges of the Semantic Web is that data are stored in a structured RDF triple format and are retrieved using complex structured triple-based queries, such as SPARQL, instead of the preferred natural language queries; this problem remains open to research. The proposed AutoSDoR (Automated Semantic Document Retrieval) enables the semantic formulation of natural language queries into a structured triple representation, based on a machine learning approach, in order to retrieve documents stored in the structured RDF triple format. Additionally, the research goes beyond small-fragment queries, such as those handled in FREyA, to paragraph-length queries. Automatic disambiguation of query terms that are not covered in WordNet is also proposed, which contributes to increased precision and recall of the retrieved documents.

Citations: 2
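To make concrete what the abstract above means by retrieving data with "structured triple-represented queries", here is a minimal in-memory sketch of matching and joining triple patterns, the core of a SPARQL basic graph pattern. It is not an RDF library, not a SPARQL engine, and not AutoSDoR: the natural-language-to-triple step the paper proposes is not shown, and the `ex:` vocabulary, document ids and data are invented for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(triples: List[Triple], pattern: Triple) -> List[Binding]:
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding: Binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: every position matched
            results.append(binding)
    return results


def query(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Join several patterns on their shared variables."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(solution.get(term, term) for term in pattern)
            for extra in match(triples, bound):
                next_solutions.append({**solution, **extra})
        solutions = next_solutions
    return solutions


# Invented example data using a hypothetical "ex:" vocabulary.
triples = [
    ("doc:1", "ex:subject", "SPARQL"),
    ("doc:1", "ex:author", "Alice"),
    ("doc:2", "ex:subject", "WordNet"),
    ("doc:2", "ex:author", "Bob"),
]
# Roughly: SELECT ?doc ?author WHERE { ?doc ex:subject "SPARQL" . ?doc ex:author ?author }
print(query(triples, [("?doc", "ex:subject", "SPARQL"), ("?doc", "ex:author", "?author")]))
# [{'?doc': 'doc:1', '?author': 'Alice'}]
```

A user would naturally ask "who wrote documents about SPARQL?"; the gap between that sentence and the two patterns above is the query-formulation problem the paper addresses.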