Foundations and Trends in Computer Graphics and Vision: Latest Papers

Semantic Image Segmentation: Two Decades of Research
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2023-02-13 DOI: 10.1561/0600000095
G. Csurka, Riccardo Volpi, Boris Chidlovskii
Abstract: Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This survey is an effort to summarize two decades of research in the field of SiS, where we propose a literature review of solutions starting from early historical methods followed by an overview of more recent deep learning methods including the latest trend of using transformers. We complement the review by discussing particular cases of the weak supervision and side machine learning techniques that can be used to improve the semantic segmentation such as curriculum, incremental or self-supervised learning. State-of-the-art SiS models rely on a large amount of annotated samples, which are more expensive to obtain than labels for tasks such as image classification. Since unlabeled data is instead significantly cheaper to obtain, it is not surprising that Unsupervised Domain Adaptation (UDA) reached a broad success within the semantic segmentation community. Therefore, a second core contribution of this book is to summarize five years of a rapidly growing field, Domain Adaptation for Semantic Image Segmentation (DASiS) which embraces the importance of semantic segmentation itself and a critical need of adapting segmentation models to new environments. In addition to providing a comprehensive survey on DASiS techniques, we unveil also newer trends such as multi-domain learning, domain generalization, domain incremental learning, test-time adaptation and source-free domain adaptation. Finally, we conclude this survey by describing datasets and benchmarks most widely used in SiS and DASiS and briefly discuss related tasks such as instance and panoptic image segmentation, as well as applications such as medical image segmentation.
Citations: 11
Learning-based Visual Compression
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2023-01-01 DOI: 10.1561/0600000101
Ruolei Ji, Lina Karam
Citations: 2
Computational Imaging Through Atmospheric Turbulence
Foundations and Trends in Computer Graphics and Vision Pub Date : 2023-01-01 DOI: 10.1561/0600000103
Stanley H. Chan, Nicholas Chimitt
Abstract: Seeing through a turbulent atmosphere has been one of the biggest challenges for ground-to-ground long-range incoherent imaging systems. The literature is very rich that can be dated back to Andrey Kolmogorov in the late 40’s, followed by a series of major developments by David Fried, Robert Noll, among others, during the 60’s and 70’s. However, even though we have a much better understanding of the atmosphere today, there remains a gap from the optics theory to image processing algorithms. In particular, training a deep neural network requires an accurate physical forward model that can synthesize training data at a large scale. Traditional wave propagation simulators are not an option here because they are computationally too expensive: a 256x256 gray scale image would take several minutes to simulate.
Citations: 0
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-10-17 DOI: 10.48550/arXiv.2210.09263
Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao
Abstract: This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches into three categories: (i) VLP for image-text tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding; (ii) VLP for core computer vision tasks, such as (open-set) image classification, object detection, and segmentation; and (iii) VLP for video-text tasks, such as video captioning, video-text retrieval, and video question answering. For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies. In addition, for each category, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.
Citations: 70
Towards Better User Studies in Computer Graphics and Vision
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-06-23 DOI: 10.1561/0600000106
Z. Bylinskii, L. Herman, Aaron Hertzmann, Stefanie Hutka, Yile Zhang
Abstract: Online crowdsourcing platforms have made it increasingly easy to perform evaluations of algorithm outputs with survey questions like "which image is better, A or B?", leading to their proliferation in vision and graphics research papers. Results of these studies are often used as quantitative evidence in support of a paper's contributions. On the one hand we argue that, when conducted hastily as an afterthought, such studies lead to an increase of uninformative, and, potentially, misleading conclusions. On the other hand, in these same communities, user research is underutilized in driving project direction and forecasting user needs and reception. We call for increased attention to both the design and reporting of user studies in computer vision and graphics papers towards (1) improved replicability and (2) improved project direction. Together with this call, we offer an overview of methodologies from user experience research (UXR), human-computer interaction (HCI), and applied perception to increase exposure to the available methodologies and best practices. We discuss foundational user research methods (e.g., needfinding) that are presently underutilized in computer vision and graphics research, but can provide valuable project direction. We provide further pointers to the literature for readers interested in exploring other UXR methodologies. Finally, we describe broader open issues and recommendations for the research community.
Citations: 8
An Introduction to Neural Data Compression
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-02-14 DOI: 10.1561/0600000107
Yibo Yang, S. Mandt, Lucas Theis
Abstract: Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression algorithms to be learned end-to-end from data using powerful generative models such as normalizing flows, variational autoencoders, diffusion probabilistic models, and generative adversarial networks. The present article aims to introduce this field of research to a broader machine learning audience by reviewing the necessary background in information theory (e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image quality assessment, perceptual metrics), and providing a curated guide through the essential ideas and methods in the literature thus far.
Citations: 49
Deep Learning for Image/Video Restoration and Super-resolution
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2022-01-01 DOI: 10.1561/0600000100
A. Tekalp
Citations: 1
Deep Learning for Multimedia Forensics
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2021-01-01 DOI: 10.1561/0600000096
Irene Amerini, A. Anagnostopoulos, Luca Maiano, L. R. Celsi
Abstract: In the last two decades, we have witnessed an immense increase in the use of multimedia content on the internet, for multiple applications ranging from the most innocuous to very critical ones. Naturally, this emergence has given rise to many types of threats posed when this content can be manipulated/used for malicious purposes. For example, fake media can be used to drive personal opinions, ruining the image of a public figure, or for criminal activities such as terrorist propaganda and cyberbullying. The research community has of course moved to counter attack these threats by designing manipulation-detection systems based on a variety of techniques, such as signal processing, statistics, and machine learning. This research and practice activity has given rise to the field of multimedia forensics. The success of deep learning in the last decade has led to its use in multimedia forensics as well. In this survey, we look at the latest trends and deep-learning-based techniques introduced to solve three main questions investigated in the field of multimedia forensics. We begin by examining the manipulations of images and videos produced with editing tools, reporting the deep-learning approaches adopted to [...]
Citation: Irene Amerini, Aris Anagnostopoulos, Luca Maiano and Lorenzo Ricciardi Celsi (2021), “Deep Learning for Multimedia Forensics”, Foundations and Trends® in Computer Graphics and Vision: Vol. 12, No. 4, pp 309–457. DOI: 10.1561/0600000096.
Full text available at: http://dx.doi.org/10.1561/0600000096
Citations: 7
Discrete Graphical Models - An Optimization Perspective
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2019-12-09 DOI: 10.1561/0600000084
Bogdan Savchynskyy
Abstract: This monograph is about discrete energy minimization for discrete graphical models. It considers graphical models, or, more precisely, maximum a posteriori inference for graphical models, purely as a combinatorial optimization problem. Modeling, applications, probabilistic interpretations and many other aspects are either ignored here or find their place in examples and remarks only. It covers the integer linear programming formulation of the problem as well as its linear programming, Lagrange and Lagrange decomposition-based relaxations. In particular, it provides a detailed analysis of the polynomially solvable acyclic and submodular problems, along with the corresponding exact optimization methods. Major approximate methods, such as message passing and graph cut techniques are also described and analyzed comprehensively. The monograph can be useful for undergraduate and graduate students studying optimization or graphical models, as well as for experts in optimization who want to have a look into graphical models. To make the monograph suitable for both categories of readers we explicitly separate the mathematical optimization background chapters from those specific to graphical models.
Citations: 23
Publishing and Consuming 3D Content on the Web: A Survey
IF 36.5
Foundations and Trends in Computer Graphics and Vision Pub Date : 2018-12-13 DOI: 10.1561/0600000083
Marco Potenziani, M. Callieri, M. Dellepiane, Roberto Scopigno
Abstract: Three-dimensional content is becoming an important component of the World Wide Web environment. From the advent of WebGL to the present, a wide number of solutions have been developed (including libraries, middleware, and applications), encouraging the establishment of 3D data as online media of practical use. The fast development of 3D technologies and related web-based resources makes it difficult to identify and properly understand the current trends and open issues. Starting from these premises, this survey analyzes the state of the art of 3D web publishing, reviews the possibilities provided by the major current approaches, proposes a categorization of the features supported by existing solutions, and cross-maps these with the requirements of a few main application domains. The results of this analysis should help in defining the technical characteristics needed to build efficient and effective 3D data presentation, taking into account the application contexts.
Citation: Marco Potenziani, Marco Callieri, Matteo Dellepiane and Roberto Scopigno (2018), “Publishing and Consuming 3D Content on the Web: A Survey”, Foundations and Trends® in Computer Graphics and Vision: Vol. 10, No. 4, pp 244–333. DOI: 10.1561/0600000083.
The version of record is available at: http://dx.doi.org/10.1561/0600000083
Citations: 13