{"title":"EmoThreat@FIRE2022: Shared Track on Emotions and Threat Detection in Urdu","authors":"S. Butt, Maaz Amjad, Fazlourrahman Balouchzah, Noman Ashraf, Rajesh Sharma, G. Sidorov, A. Gelbukh","doi":"10.1145/3574318.3574327","DOIUrl":"https://doi.org/10.1145/3574318.3574327","url":null,"abstract":"Many languages with a wealth of resources have been researched to solve the challenges of emotion and targeted abuse detection, i.e. threat. But when it comes to languages, such as Urdu, it is noted that there is a severe lack of both resources and approaches in terms of Urdu language processing. Therefore, this study concentrated on offering resources for Urdu by organizing a shared task called “EmoThreat: Emotions and Threat detection in Urdu\". The task offered two tasks: (i) multi-label emotion classification (Task A), and (ii) binary threat detection (Task B). Task B was a multi-class problem since it was further subdivided into the identification of threats posed by groups and individuals. This paper provides an overview of the methodology and results obtained by each of the 10 distinct teams who participated in the shared task. In addition, each group presented a detailed error analysis as part of their submission for the best model. The top-performing system in Task A received a macro-F1 score of 0.687. 
In contrast, subtask 1 of Task B received a score of 0.716 macro-F1 while subtask 2 of Task B obtained a 0.539 macro-F1 score.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124259216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overview of the FIRE 2022 track: Information Retrieval from Microblogs during Disasters (IRMiDis)","authors":"Soham Poddar, Moumita Basu, Kripabandhu Ghosh, Saptarshi Ghosh","doi":"10.1145/3574318.3574319","DOIUrl":"https://doi.org/10.1145/3574318.3574319","url":null,"abstract":"Microblogging sites such as Twitter play an important role in dealing with various mass emergencies including natural disasters and pandemics. Over the last several years, the track on Information Retrieval from Microblogs during Disasters (IRMiDis), organized as part of the FIRE conference series, has provided annotated datasets for developing ML/NLP techniques for utilizing microblogs for various practical tasks that would help authorities better deal with disaster situations. In particular, the FIRE 2022 IRMiDis track focused on two important tasks – (i) to detect the vaccine-related stance of tweets related to COVID-19 vaccines, and (ii) to detect reporting of COVID-19 symptom in tweets.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116927560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Considerations for a Sustainable Scholarly Big Data Service","authors":"Jian Wu, Shaurya Rohatgi, Manoj K. Angadi, Kavya S. Puranik, C. Lee Giles","doi":"10.1145/3574318.3574340","DOIUrl":"https://doi.org/10.1145/3574318.3574340","url":null,"abstract":"The advancement of web programming techniques, such as Ajax and jQuery, and datastores, such as Apache Solr and Elasticsearch, have made it much easier to deploy small to medium scale web-based search engines. However, developing a sustainable search engine that supports scholarly big data services is still challenging often because of limited human resources and financial support. Such scenarios are typical in academic settings or small businesses. Here, we showcase how four key design decisions were made by trading-off competing factors such as performance, cost, and efficiency, when developing the Next Generation CiteSeerX (NGX), the successor of CiteSeerX, which was a pioneering digital library search engine that has been serving academic communities for more than two decades. This work extends our previous work in Wu et al. (2021) and discusses design considerations of infrastructure, web applications, indexing, and document filtering. 
These design considerations can be generalized to other web-based search engines with a similar scale that are deployed in small business or academic settings with limited resources.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130387342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic-Mono-BERT: A Joint Retrieval-Clustering System for Retrieving Overview Passages","authors":"Sumanta Kashyapi, Laura Dietz","doi":"10.1145/3574318.3574336","DOIUrl":"https://doi.org/10.1145/3574318.3574336","url":null,"abstract":"For most queries, the set of relevant documents spans multiple subtopics. Inspired by the neural ranking models and query-specific neural clustering models, we develop Topic-Mono-BERT which performs both tasks jointly. Based on text embeddings of BERT, our model learns a shared embedding that is optimized for both tasks. The clustering hypothesis would suggest that embeddings which place topically similar text in close proximity will also perform better on ranking tasks. Our model is trained with the Wikimarks approach to obtain training signals for relevance and subtopics on the same queries. Our task is to identify overview passages that can be used to construct a succinct answer to the query. Our empirical evaluation on two publicly available passage retrieval datasets suggests that including the clustering supervision in the ranking model leads to about improvement in identifying text passages that summarize different subtopics within a query.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129103848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multilingual Dataset for Identification of Factual Claims in Indian Twitter","authors":"Subhabrata Dutta, Rudra Dhar, Prantik Guha, Arpan Murmu, Dipankar Das","doi":"10.1145/3574318.3574348","DOIUrl":"https://doi.org/10.1145/3574318.3574348","url":null,"abstract":"The need for automated fact-checking is getting prominent with every passing day as the spread of misinformation is swelling over the ever-increasing stream of online content. We focus on fine-grained labelling of factual information in tweets to facilitate better fact-checking systems capable of providing improved justifications. In this paper, we present a token-level annotation of factual claims in tweets from Indian Twitter. To deal with the multilingual variety of the Indian diaspora, we deal with tweets in English, Bengali, Hindi, and their codemixed variants. To the best of our knowledge, this dataset is first of kind, both in terms of labelling scheme as well as data sources.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126168004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Waste Materials using CNN Based on Transfer Learning","authors":"Sujan Poudel, Prakash Poudyal","doi":"10.1145/3574318.3574345","DOIUrl":"https://doi.org/10.1145/3574318.3574345","url":null,"abstract":"Waste Management is important for humans as well as nature for healthy life and a clean environment. The major step for effective waste management is the segregation of waste according to its types. The advancement of technology such as hardware and artificial intelligence is used for the segregation of waste. There are several machine learning and deep learning algorithms available for image classification. Among them, Convolutional Neural Network is the most used one. The main objective of this work is to classify images of waste materials using CNN into seven categories (cardboard, glass, metal, organic, paper, plastic, and trash). Then, cardboard, organic, and paper class images are considered biodegradable waste, and other classes are considered non-biodegradable waste. The pre-trained CNN model such as InceptionV3, InceptionResNetV2, Xception, VGG19, MobileNet, ResNet50 and DenseNet201 have been trained and performed fine-tuning on the waste dataset. Among these models, the VGG19 model performed with less accuracy, whereas the InceptionV3 model performed with high learning accuracy. 
Overall, the obtained result is promising.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129447689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Triplet Loss based Siamese Networks for Automatic Short Answer Grading","authors":"Nagamani Yeruva, Sarada Venna, Hemalatha Indukuri, Mounika Marreddy","doi":"10.1145/3574318.3574337","DOIUrl":"https://doi.org/10.1145/3574318.3574337","url":null,"abstract":"Grading student work is critical for assessing their understanding and providing necessary feedback. However, answer grading can become monotonous for teachers. On the standard ASAG data set, our system shows substantial improvements in classification disparity of correct and incorrect answers from a reference answer compared to baseline methods. Our supervised model (1) utilizes recent advances in semantic word embeddings and (2) implements ideas from one-shot learning methods, which are proven to work with minimal. We present experimental results from a model based on different approaches and demonstrates decent performance on standard benchmark dataset.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117256924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FIRE 2022 ILSUM Track: Indian Language Summarization","authors":"Shrey Satapara, Bhavan Modha, Sandip J Modha, Parth Mehta","doi":"10.1145/3574318.3574328","DOIUrl":"https://doi.org/10.1145/3574318.3574328","url":null,"abstract":"This abstract provides a short overview of the first edition of the shared task on Indian Language Summarization (ILSUM) organized at the 14th Forum for Information Retrieval Evaluation (FIRE 2022). A more detailed discussion is available in the track overview paper. The objective of this shared task was to create benchmark data for text summarization in Indian languages. This edition included three languages Hindi, Gujarati, and Indian English which is an officially recognized dialect of English mainly used in the Indian subcontinent. The task saw an enthusiastic response, with registrations from over 50 teams. A total of 12 teams submitted test runs across the three languages out of which 10 teams submitted working notes. Standard ROUGE metrics were used as the evaluation metric.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127789607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overview of the HASOC Subtrack at FIRE 2022: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages","authors":"Thomas Mandl, Sandip J Modha, Gautam Kishore Shahi, Hiren Madhu, Shrey Satapara, Prasenjit Majumder, Johannes Schäfer, Tharindu Ranasinghe, Marcos Zampieri, D. Nandini, A. Jaiswal","doi":"10.1145/3574318.3574326","DOIUrl":"https://doi.org/10.1145/3574318.3574326","url":null,"abstract":"In recent years, the spread of online offensive content has become of great concern, motivating researchers to develop robust systems capable of identifying such content automatically. To carry out a fair evaluation of these systems, several international shared tasks have been organized, providing the community with essential benchmark data and evaluation methods for various languages. Organized since 2019, the HASOC (Hate Speech and Offensive Content Identification) shared task is one of these initiatives. In its fourth iteration, HASOC 2022 included three tasks for English-Hindi codemix, German and Marathi. Tasks 1 and 2 were on conversational hate speech detection. The idea is to detect supporting hate speech, profanity, or other forms of offensiveness depending on the surrounding context of Twitter posts. Task 1 was offered in Hindi-English codemix and German. Task 2 was provided for Hindi-English codemix, and it was focused on further classifying the problematic tweets in conversational hate speech into standalone and contextual hate. 
This paper presents a brief description of tasks, data, and participation.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121852142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Findings of shared task on Sentiment Analysis and Homophobia Detection of YouTube Comments in Code-Mixed Dravidian Languages","authors":"Subalalitha Chinnaudayar Navaneethakrishnan, Bharathi Raja Chakravarthi, Kogilavani Shanmugavadivel, Malliga Subramanian, Prasanna Kumar Kumaresan, Bharathi, Lavanya Sambath Kumar, Rahul Ponnusamy","doi":"10.1145/3574318.3574347","DOIUrl":"https://doi.org/10.1145/3574318.3574347","url":null,"abstract":"We present an overview of sentiment analysis and homophobia detection of YouTube comments in code-mixed Dravidian languages in this paper. We provide the details of this task and the submitted systems for the tasks. We introduce two studies: task A for detecting sentiment analysis and task B on homophobia detection, which is organized by the FIRE 2022. A total of 95 participants registered for the shared task, 13 teams finally submitted their results for task-A a, and 10 teams submitted their results for task B. The teams explored tasks A and B using traditional machine learning and deep learning models. Most of the benchmark systems have been analyzed by participants capable of handling code-mixed scenarios in Dravidian languages.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"62 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116226393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}