Dataset Diversity: Measuring and Mitigating Geographical Bias in Image Search and Retrieval
Authors: Abhishek Mandal, Susan Leavy, S. Little
Published in: Proceedings of the 1st International Workshop on Trustworthy AI for Multimedia Computing
Publication date: 2021-10-22
DOI: 10.1145/3475731.3484956 (https://doi.org/10.1145/3475731.3484956)
Citations: 11
Abstract
Many popular visual datasets used to train deep neural networks for computer vision applications, especially for facial analytics, are created by retrieving images from the internet. Search engines are often used to perform this task. However, because search engines localise and personalise their results, and because of the image indexing methods they use, the resulting images overrepresent the demographics of the region from which they were queried. As most visual datasets are created in western countries, they tend to have a western-centric bias, and deep neural networks trained on these datasets tend to inherit these biases. Researchers studying bias in visual datasets have focused on the racial aspect of these biases. We approach this from a geographical perspective. In this paper, we 1) study how linguistic variations in search queries and geographical variations in the querying region affect the social and cultural aspects of retrieved images, focusing on facial analytics, 2) explore how geographical bias in image search and retrieval can cause racial, cultural and stereotypical bias in visual datasets and 3) propose methods to mitigate such biases.