Proceedings of the 9th Annual Meeting of the Forum for Information Retrieval Evaluation: Latest Articles

A Comparative Study of Named Entity Recognition for Telugu
SaiKiranmai Gorla, N. B. Murthy, Aruna Malapati
Abstract: In this paper, we apply three classification learning algorithms to the Telugu Named Entity Recognition (NER) task and present a comparative study of these three algorithms on the Telugu dataset from the NER for South and South-East Asian Languages (NERSSEAL) competition. The empirical results show that the Support Vector Machine achieves the best F-measure, 54.78%, on the dataset.
DOI: https://doi.org/10.1145/3158354.3158358
Published: 2017-12-08
Citations: 2
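The abstract above reports its result as an F-measure. As a minimal sketch of how that score is usually computed from precision and recall (the function name `f_measure` and the `beta` parameter are illustrative assumptions; the abstract does not say which variant the authors used, and the balanced F1 is assumed here):

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption; the abstract
    does not state which F-measure variant was reported).
    """
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # No true positives at all: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# Hypothetical precision/recall values, for illustration only.
print(f"{f_measure(0.60, 0.50):.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot score well by maximising only precision or only recall.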
Feature Space of Deep Learning and its Importance: Comparison of Clustering Techniques on the Extended Space of ML-ELM
R. Roul, Amit Agarwal
Abstract: Based on the architecture of deep learning, the Multilayer Extreme Learning Machine (ML-ELM) has many good characteristics that make it a distinct and widespread classifier in the domain of text mining. Its salient features include non-linear mapping of features into a high-dimensional space, a high level of data abstraction, no backpropagation, and a higher rate of learning. This paper studies the importance of the ML-ELM feature space and tests the performance of various traditional clustering techniques on this feature space. Empirical results show the efficiency and effectiveness of the ML-ELM feature space compared to the TF-IDF vector space, which justifies the prominence of deep learning.
DOI: https://doi.org/10.1145/3158354.3158359
Published: 2017-12-08
Citations: 3
A Comparison of Automatic Search Query Enhancement Algorithms That Utilise Wikipedia as a Source of A Priori Knowledge
Kyle Goslin, M. Hofmann
Abstract: This paper describes the benchmarking and analysis of five Automatic Search Query Enhancement (ASQE) algorithms that utilise Wikipedia as the sole source for a priori knowledge. The contributions of this paper include: 1) a comprehensive review of current ASQE algorithms that utilise Wikipedia as the sole source for a priori knowledge; 2) benchmarking of five existing ASQE algorithms using the TREC-9 Web Topics on the ClueWeb12 data set; and 3) analysis of the results from the benchmarking process to identify the strengths and weaknesses of each algorithm. During the benchmarking process, 2,500 relevance assessments were performed. Results of these tests are analysed using the Average Precision @10 per query and Mean Average Precision @10 per algorithm. From this analysis we show that the scope of a priori knowledge utilised during enhancement and the term weighting methods available from Wikipedia can further aid the ASQE process. Although approaches taken by the algorithms are still relevant, an over-dependence on the weighting schemes and data sources used can easily impact the results of an ASQE algorithm.
DOI: https://doi.org/10.1145/3158354.3158356
Published: 2017-12-08
Citations: 1
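The Goslin and Hofmann abstract above evaluates with Average Precision @10 per query and Mean Average Precision @10 per algorithm. A minimal sketch of one common definition follows; the abstract does not give the exact formula the authors used, so the normalisation (by the number of relevant results found in the top k) and the function names are assumptions.

```python
from typing import Sequence


def average_precision_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """AP@k for one query.

    relevance[i] is 1 if the result at rank i + 1 was judged relevant, else 0.
    Assumption: normalise by the number of relevant results seen in the top k.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(runs: Sequence[Sequence[int]], k: int = 10) -> float:
    """MAP@k: the mean of AP@k over all queries issued to one algorithm."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)


# Hypothetical relevance judgements for two queries, for illustration only.
judgements = [[1, 0, 1, 0], [0, 0, 0]]
print(mean_average_precision_at_k(judgements))
```

Other definitions divide by the total number of relevant documents for the query or by k itself, so scores are only comparable when the same convention is used throughout.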
Segmentation of Merged Lines and Script Identification in Handwritten Bilingual Documents
Ranjana S. Zinjore, R. Ramteke, Varsha M. Pathak
Abstract: Text line segmentation is a challenging task in Optical Character Recognition because of variation in writing style and because characters or Matra touch between lines. In this paper, we propose an algorithm for dividing merged lines into individual lines in handwritten bilingual (Marathi-English) documents. The algorithm is tested on different images, and we have obtained promising results. Afterward, the script is identified at word level using a fusion of moment-based features and visual discriminating features. Two different classifiers are evaluated on a dataset consisting of 242 Marathi-English words for training and 82 words for testing. We obtained an average identification accuracy of 67% with the K-NN classifier and 80.14% with the SVM classifier.
DOI: https://doi.org/10.1145/3158354.3158360
Published: 2017-12-08
Citations: 2
Language Identification in Mixed Script
Nagesh Bhattu Sristy, N. S. Krishna, B. S. Krishna, V. Ravi
Abstract: The text exchanged in social media conversations is often noisy, with a mixture of stylistic and misspelt variations of original words. Standard NLP techniques applied to such data, such as POS tagging and named entity recognition, suffer because of the noisy nature of the input. Mixed-script text is also prevalent among social media users. The current work addresses the identification of language at word level in mixed-script scenarios, where all the text is written in Roman script but the words used are transliterations of original words from a native language into English. The core of the problem is identifying the language, among a set of languages, from small fragments of text. We propose a two-stage approach for word-level language identification. In the first stage, a mixing language combination is identified using character n-grams of the sentence. The second stage uses the mixing combination class from the first stage to make the word-level language identification. We further apply Conditional Random Fields (CRF) in the second stage to improve the performance of the word-level language identification. Such simplification is essential; otherwise the number of states of the model would be huge and the resulting model predictions very noisy. Our methods improve the F-score of word-level language identification by over 10% compared to the baseline.
DOI: https://doi.org/10.1145/3158354.3158357
Published: 2017-12-08
Citations: 8
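The first stage described in the abstract above relies on character n-grams of the sentence. The following is a minimal sketch of extracting such features only; the boundary markers, the n-gram range, and the function name `char_ngrams` are illustrative assumptions, and the classifier and CRF stages are not shown because the abstract does not specify them.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count character n-grams of a Roman-script string.

    Assumption: '^' and '$' mark the start and end of the string so that
    prefixes and suffixes become distinct features.
    """
    padded = f"^{text.lower()}$"
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the padded string.
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


# Hypothetical transliterated input, for illustration only.
features = char_ngrams("kya haal hai", n_min=2, n_max=3)
print(features.most_common(5))
```

Counts like these could then feed any sentence-level classifier that predicts the mixing language combination.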
Improving Similar Question Retrieval using a Novel Tripartite Neural Network based Approach
Anirban Sen, Manjira Sinha, Sandya Mannarswamy
Abstract: The collective intelligence of crowds is distilled in various Community Question Answering (CQA) services such as Quora, Yahoo Answers, and Stack Overflow forums, wherein users share their knowledge, providing both informational and experiential support to other users. As users often search for similar information, it is highly probable that, for a new incoming question, a related question-answer pair already exists in the CQA dataset. Therefore, an efficient technique for similar question identification is the need of the hour. While data is not a bottleneck in this scenario, addressing the vocabulary diversity generated by a varied pool of users certainly is. This paper proposes a novel tripartite neural network based approach to the similar question retrieval problem. The network takes as input a triplet consisting of a question-answer pair and a new question, and learns internal representations from the similarities among them. Our approach achieves classification performance of up to 77% on a real-world CQA dataset. We have also compared our method with two other baselines and found that it performs significantly better in handling the problem of vocabulary diversity and 'zero-lexical overlap' among questions.
DOI: https://doi.org/10.1145/3158354.3158355
Published: 2017-12-08
Citations: 1
Proceedings of the 9th Annual Meeting of the Forum for Information Retrieval Evaluation
DOI: https://doi.org/10.1145/3158354
Citations: 0