A. Nentidis, Georgios Katsimpras, Eirini Vandorou, Anastasia Krithara, Antonio Miranda-Escalada, Luis Gasco, Martin Krallinger, G. Paliouras
{"title":"Overview of BioASQ 2022: The Tenth BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering","authors":"A. Nentidis, Georgios Katsimpras, Eirini Vandorou, Anastasia Krithara, Antonio Miranda-Escalada, Luis Gasco, Martin Krallinger, G. Paliouras","doi":"10.1007/978-3-031-13643-6_22","DOIUrl":"https://doi.org/10.1007/978-3-031-13643-6_22","url":null,"abstract":"","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"901 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114085235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prerona Tarannum, Firoj Alam, Md. Arid Hasan, S. R. H. Noori
{"title":"Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text","authors":"Prerona Tarannum, Firoj Alam, Md. Arid Hasan, S. R. H. Noori","doi":"10.48550/arXiv.2207.07308","DOIUrl":"https://doi.org/10.48550/arXiv.2207.07308","url":null,"abstract":"The wide use of social media and digital technologies facilitates sharing various news and information about events and activities. Alongside positive information, misleading and false information is also spreading on social media. There have been efforts to identify such misleading information, both manually by human experts and with automatic tools. Manual effort does not scale well because of the high volume of information containing factual claims that appears online. Therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch and Spanish) of the CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact-checking or not. We used oversampling to balance the dataset and applied SVM and Random Forest (RF) with TF-IDF representations. We also used BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models for the experiments. We used BERT-m for the official submissions, and our systems ranked 3rd, 5th, and 12th in Spanish, Dutch, and English, respectively. In further experiments, our evaluation shows that transformer models (BERT-m and XLM-RoBERTa-base) outperform SVM and RF in Dutch and English, whereas a different scenario is observed for Spanish.","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130497035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Shoukat, Khubaib Ahmad, Naina Said, Nasir Ahmad, Mohammed Hasanuzzaman, Kashif Ahmad
{"title":"A Late Fusion Framework with Multiple Optimization Methods for Media Interestingness","authors":"M. Shoukat, Khubaib Ahmad, Naina Said, Nasir Ahmad, Mohammed Hasanuzzaman, Kashif Ahmad","doi":"10.48550/arXiv.2207.04762","DOIUrl":"https://doi.org/10.48550/arXiv.2207.04762","url":null,"abstract":"The recent advancement in Multimedia Analytical, Computer Vision (CV), and Artificial Intelligence (AI) algorithms has resulted in several interesting tools allowing automatic analysis and retrieval of multimedia content of users' interest. However, retrieving the content of interest generally involves analysis and extraction of semantic features, such as emotions and interestingness level. The extraction of such meaningful information is a complex task, and the performance of individual algorithms is generally very low. One way to enhance the performance of the individual algorithms is to combine the predictive capabilities of multiple algorithms using fusion schemes. This allows the individual algorithms to complement each other, leading to improved performance. This paper proposes several fusion methods for the media interestingness score prediction task introduced in CLEF Fusion 2022. The proposed methods include both a naive fusion scheme, where all the inducers are treated equally, and a merit-based fusion scheme, where multiple weight optimization methods are employed to assign weights to the individual inducers. In total, we used six optimization methods: Particle Swarm Optimization (PSO), a Genetic Algorithm (GA), Nelder Mead, Trust Region Constrained (TRC), the Limited-memory Broyden Fletcher Goldfarb Shanno Algorithm (LBFGSA), and the Truncated Newton Algorithm (TNA). Overall, better results are obtained with PSO and TNA, which achieve 0.109 mean average precision at 10. The task is complex and scores are generally low. We believe the presented analysis will provide a baseline for future research in the domain.","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131254251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solutions for Fine-grained and Long-tailed Snake Species Recognition in SnakeCLEF 2022","authors":"Cheng Zou, Furong Xu, Meng Wang, Wenqi Li, Yuan Cheng","doi":"10.48550/arXiv.2207.01216","DOIUrl":"https://doi.org/10.48550/arXiv.2207.01216","url":null,"abstract":"Automatic snake species recognition is important because it has vast potential to help lower deaths and disabilities caused by snakebites. We introduce our solution in SnakeCLEF 2022 for fine-grained snake species recognition on a heavily long-tailed class distribution. First, a network architecture is designed to extract and fuse features from multiple modalities, i.e. photographs from the visual modality and geographic locality information from the language modality. Then, logit-adjustment-based methods are studied to relieve the impact caused by the severe class imbalance. Next, a combination of supervised and self-supervised learning methods is proposed to make full use of the dataset, including both labeled training data and unlabeled testing data. Finally, post-processing strategies, such as multi-scale and multi-crop test-time augmentation, location filtering and model ensembling, are employed for better performance. With an ensemble of several different models, a private score of 82.65%, ranking 3rd, is achieved on the final leaderboard.","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121428273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-shot Long-Tailed Bird Audio Recognition","authors":"Marcos V. Conde, Ui-Jin Choi","doi":"10.48550/arXiv.2206.11260","DOIUrl":"https://doi.org/10.48550/arXiv.2206.11260","url":null,"abstract":"It is easier to hear birds than see them. However, they still play an essential role in nature and are excellent indicators of deteriorating environmental quality and pollution. Recent advances in Deep Neural Networks allow us to process audio data to detect and classify birds. This technology can assist researchers in monitoring bird populations and biodiversity. We propose a sound detection and classification pipeline to analyze complex soundscape recordings and identify birdcalls in the background. Our method learns from weak labels and few data and acoustically recognizes the bird species. Our solution achieved 18th place of 807 teams at the BirdCLEF 2022 Challenge hosted on Kaggle.","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134522513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anthony Miyaguchi, Jiangyue Yu, Bryan Cheungvivatpant, Dakota Dudley, Aniketh Swain
{"title":"Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022","authors":"Anthony Miyaguchi, Jiangyue Yu, Bryan Cheungvivatpant, Dakota Dudley, Aniketh Swain","doi":"10.48550/arXiv.2206.04805","DOIUrl":"https://doi.org/10.48550/arXiv.2206.04805","url":null,"abstract":"We build a classification model for the BirdCLEF 2022 challenge using unsupervised methods. We implement an unsupervised representation of the training dataset using a triplet loss on spectrogram representation of audio motifs. Our best model performs with a score of 0.48 on the public leaderboard.","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128835638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stefan Schweter, Luisa März, Katharina Schmid, Erion Çano
{"title":"hmBERT: Historical Multilingual Language Models for Named Entity Recognition","authors":"Stefan Schweter, Luisa März, Katharina Schmid, Erion Çano","doi":"10.48550/arXiv.2205.15575","DOIUrl":"https://doi.org/10.48550/arXiv.2205.15575","url":null,"abstract":"Compared to standard Named Entity Recognition (NER), identifying persons, locations, and organizations in historical texts constitutes a big challenge. To obtain machine-readable corpora, the historical text is usually scanned and Optical Character Recognition (OCR) needs to be performed. As a result, the historical corpora contain errors. Also, entities like location or organization can change over time, which poses another challenge. Overall, historical texts come with several peculiarities that differ greatly from modern texts and large labeled corpora for training a neural tagger are hardly available for this domain. In this work, we tackle NER for historical German, English, French, Swedish, and Finnish by training large historical language models. We circumvent the need for large amounts of labeled data by using unlabeled data for pretraining a language model. We propose hmBERT, a historical multilingual BERT-based language model, and release the model in several versions of different sizes. Furthermore, we evaluate the capability of hmBERT by solving downstream NER as part of this year's HIPE-2022 shared task and provide detailed analysis and insights. For the Multilingual Classical Commentary coarse-grained NER challenge, our tagger HISTeria outperforms the other teams' models for two out of three languages.","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122978829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Philipp Schaer, Timo Breuer, L. J. Castro, Benjamin Wolff, Johann Schaible, Narges Tavakolpoursaleh
{"title":"Overview of LiLAS 2021 - Living Labs for Academic Search (Extended Overview)","authors":"Philipp Schaer, Timo Breuer, L. J. Castro, Benjamin Wolff, Johann Schaible, Narges Tavakolpoursaleh","doi":"10.1007/978-3-030-85251-1_25","DOIUrl":"https://doi.org/10.1007/978-3-030-85251-1_25","url":null,"abstract":"","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114920205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Nentidis, Anastasia Krithara, K. Bougiatiotis, Martin Krallinger, C. R. Penagos, Marta Villegas, G. Paliouras
{"title":"Overview of BioASQ 2021: The ninth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering","authors":"A. Nentidis, Anastasia Krithara, K. Bougiatiotis, Martin Krallinger, C. R. Penagos, Marta Villegas, G. Paliouras","doi":"10.1007/978-3-030-85251-1_18","DOIUrl":"https://doi.org/10.1007/978-3-030-85251-1_18","url":null,"abstract":"","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131969747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benedikt T. Boenninghoff, D. Kolossa, R. M. Nickel
{"title":"Self-Calibrating Neural-Probabilistic Model for Authorship Verification Under Covariate Shift","authors":"Benedikt T. Boenninghoff, D. Kolossa, R. M. Nickel","doi":"10.1007/978-3-030-85251-1_12","DOIUrl":"https://doi.org/10.1007/978-3-030-85251-1_12","url":null,"abstract":"","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126131666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}