Foundations and Trends in Computer Graphics and Vision: Latest Articles

Semantic Image Segmentation: Two Decades of Research

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2023-02-13 DOI: 10.1561/0600000095

G. Csurka, Riccardo Volpi, Boris Chidlovskii

Abstract: Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This survey is an effort to summarize two decades of research in the field of SiS, where we propose a literature review of solutions starting from early historical methods followed by an overview of more recent deep learning methods including the latest trend of using transformers. We complement the review by discussing particular cases of the weak supervision and side machine learning techniques that can be used to improve the semantic segmentation such as curriculum, incremental or self-supervised learning. State-of-the-art SiS models rely on a large amount of annotated samples, which are more expensive to obtain than labels for tasks such as image classification. Since unlabeled data is instead significantly cheaper to obtain, it is not surprising that Unsupervised Domain Adaptation (UDA) reached a broad success within the semantic segmentation community. Therefore, a second core contribution of this book is to summarize five years of a rapidly growing field, Domain Adaptation for Semantic Image Segmentation (DASiS) which embraces the importance of semantic segmentation itself and a critical need of adapting segmentation models to new environments. In addition to providing a comprehensive survey on DASiS techniques, we unveil also newer trends such as multi-domain learning, domain generalization, domain incremental learning, test-time adaptation and source-free domain adaptation. Finally, we conclude this survey by describing datasets and benchmarks most widely used in SiS and DASiS and briefly discuss related tasks such as instance and panoptic image segmentation, as well as applications such as medical image segmentation.

Pages: 1-162

Citations: 11

Learning-based Visual Compression

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2023-01-01 DOI: 10.1561/0600000101

Ruolei Ji, Lina Karam

Citations: 2

Computational Imaging Through Atmospheric Turbulence

Foundations and Trends in Computer Graphics and Vision Pub Date : 2023-01-01 DOI: 10.1561/0600000103

Stanley H. Chan, Nicholas Chimitt

Citations: 0

Vision-Language Pre-training: Basics, Recent Advances, and Future Trends

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-10-17 DOI: 10.48550/arXiv.2210.09263

Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao

Citations: 70

Towards Better User Studies in Computer Graphics and Vision

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-06-23 DOI: 10.1561/0600000106

Z. Bylinskii, L. Herman, Aaron Hertzmann, Stefanie Hutka, Yile Zhang

Abstract: Online crowdsourcing platforms have made it increasingly easy to perform evaluations of algorithm outputs with survey questions like "which image is better, A or B?", leading to their proliferation in vision and graphics research papers. Results of these studies are often used as quantitative evidence in support of a paper's contributions. On the one hand we argue that, when conducted hastily as an afterthought, such studies lead to an increase of uninformative, and, potentially, misleading conclusions. On the other hand, in these same communities, user research is underutilized in driving project direction and forecasting user needs and reception. We call for increased attention to both the design and reporting of user studies in computer vision and graphics papers towards (1) improved replicability and (2) improved project direction. Together with this call, we offer an overview of methodologies from user experience research (UXR), human-computer interaction (HCI), and applied perception to increase exposure to the available methodologies and best practices. We discuss foundational user research methods (e.g., needfinding) that are presently underutilized in computer vision and graphics research, but can provide valuable project direction. We provide further pointers to the literature for readers interested in exploring other UXR methodologies. Finally, we describe broader open issues and recommendations for the research community.

Pages: 201-252

Citations: 8

An Introduction to Neural Data Compression

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-02-14 DOI: 10.1561/0600000107

Yibo Yang, S. Mandt, Lucas Theis

Citations: 49

Deep Learning for Image/Video Restoration and Super-resolution

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-01-01 DOI: 10.1561/0600000100

A. Tekalp

Citations: 1

Deep Learning for Multimedia Forensics

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2021-01-01 DOI: 10.1561/0600000096

Irene Amerini, A. Anagnostopoulos, Luca Maiano, L. R. Celsi

Abstract: In the last two decades, we have witnessed an immense increase in the use of multimedia content on the internet, for multiple applications ranging from the most innocuous to very critical ones. Naturally, this emergence has given rise to many types of threats posed when this content can be manipulated/used for malicious purposes. For example, fake media can be used to drive personal opinions, ruining the image of a public figure, or for criminal activities such as terrorist propaganda and cyberbullying. The research community has of course moved to counter attack these threats by designing manipulation-detection systems based on a variety of techniques, such as signal processing, statistics, and machine learning. This research and practice activity has given rise to the field of multimedia forensics. The success of deep learning in the last decade has led to its use in multimedia forensics as well. In this survey, we look at the latest trends and deep-learning-based techniques introduced to solve three main questions investigated in the field of multimedia forensics. We begin by examining the manipulations of images and videos produced with editing tools, reporting the deep-learning approaches adopted to [...]

Citation: Irene Amerini, Aris Anagnostopoulos, Luca Maiano and Lorenzo Ricciardi Celsi (2021), "Deep Learning for Multimedia Forensics", Foundations and Trends® in Computer Graphics and Vision: Vol. 12, No. 4, pp 309–457. DOI: 10.1561/0600000096.

Full text available at: http://dx.doi.org/10.1561/0600000096

Citations: 7

Discrete Graphical Models - An Optimization Perspective

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2019-12-09 DOI: 10.1561/0600000084

Bogdan Savchynskyy

Abstract: This monograph is about discrete energy minimization for discrete graphical models. It considers graphical models, or, more precisely, maximum a posteriori inference for graphical models, purely as a combinatorial optimization problem. Modeling, applications, probabilistic interpretations and many other aspects are either ignored here or find their place in examples and remarks only. It covers the integer linear programming formulation of the problem as well as its linear programming, Lagrange and Lagrange decomposition-based relaxations. In particular, it provides a detailed analysis of the polynomially solvable acyclic and submodular problems, along with the corresponding exact optimization methods. Major approximate methods, such as message passing and graph cut techniques are also described and analyzed comprehensively. The monograph can be useful for undergraduate and graduate students studying optimization or graphical models, as well as for experts in optimization who want to have a look into graphical models. To make the monograph suitable for both categories of readers we explicitly separate the mathematical optimization background chapters from those specific to graphical models.

Pages: 160-429

Citations: 23

Publishing and Consuming 3D Content on the Web: A Survey

IF 36.5

Foundations and Trends in Computer Graphics and Vision Pub Date : 2018-12-13 DOI: 10.1561/0600000083

Marco Potenziani, M. Callieri, M. Dellepiane, Roberto Scopigno

Abstract: Three-dimensional content is becoming an important component of the World Wide Web environment. From the advent of WebGL to the present, a wide number of solutions have been developed (including libraries, middleware, and applications), encouraging the establishment of 3D data as online media of practical use. The fast development of 3D technologies and related web-based resources makes it difficult to identify and properly understand the current trends and open issues. Starting from these premises, this survey analyzes the state of the art of 3D web publishing, reviews the possibilities provided by the major current approaches, proposes a categorization of the features supported by existing solutions, and cross-maps these with the requirements of a few main application domains. The results of this analysis should help in defining the technical characteristics needed to build efficient and effective 3D data presentation, taking into account the application contexts.

Citation: Marco Potenziani, Marco Callieri, Matteo Dellepiane and Roberto Scopigno (2018), "Publishing and Consuming 3D Content on the Web: A Survey", Foundations and Trends® in Computer Graphics and Vision: Vol. 10, No. 4, pp 244–333. DOI: 10.1561/0600000083.

The version of record is available at: http://dx.doi.org/10.1561/0600000083

Citations: 13