Guest Editorial: Spectral imaging powered computer vision
Jun Zhou, Fengchao Xiong, Lei Tong, Naoto Yokoya, Pedram Ghamisi
{"title":"客座编辑:光谱成像驱动的计算机视觉","authors":"Jun Zhou, Fengchao Xiong, Lei Tong, Naoto Yokoya, Pedram Ghamisi","doi":"10.1049/cvi2.12242","DOIUrl":null,"url":null,"abstract":"<p>The increasing accessibility and affordability of spectral imaging technology have revolutionised computer vision, allowing for data capture across various wavelengths beyond the visual spectrum. This advancement has greatly enhanced the capabilities of computers and AI systems in observing, understanding, and interacting with the world. Consequently, new datasets in various modalities, such as infrared, ultraviolet, fluorescent, multispectral, and hyperspectral, have been constructed, presenting fresh opportunities for computer vision research and applications.</p><p>Although significant progress has been made in processing, learning, and utilising data obtained through spectral imaging technology, several challenges persist in the field of computer vision. These challenges include the presence of low-quality images, sparse input, high-dimensional data, expensive data labelling processes, and a lack of methods to effectively analyse and utilise data considering their unique properties. Many mid-level and high-level computer vision tasks, such as object segmentation, detection and recognition, image retrieval and classification, and video tracking and understanding, still have not leveraged the advantages offered by spectral information. Additionally, the problem of effectively and efficiently fusing data in different modalities to create robust vision systems remains unresolved. Therefore, there is a pressing need for novel computer vision methods and applications to advance this research area. This special issue aims to provide a venue for researchers to present innovative computer vision methods driven by the spectral imaging technology.</p><p>This special issue has received 11 submissions. Among them, five papers have been accepted for publication, indicating their high quality and contribution to spectral imaging powered computer vision. Four papers have been rejected and sent to a transfer service for consideration in other journals or invited for re-submission after revision based on reviewers’ feedback.</p><p>The accepted papers can be categorised into three main groups based on the type of adopted data, that is, hyperspectral, multispectral, and X-ray images. Hyperspectral images provide material information about the scene and enable fine-grained object class classification. Multispectral images provide high spatial context and information beyond visible spectrum, such as infrared, providing enriched clues for visual computation. X-ray images can penetrate the surface of objects and provide internal structural information of targets, empowering medical applications, such as rib detection as exemplified by Tsai et al. Below is a brief summary of each paper in this special issue.</p><p>Zhong et al. proposed a lightweight criss-cross large kernel (CCLK) convolutional neural network for hyperspectral classification. The key component of this network is a CCLK module, which incorporates large kernels within the 1D convolutional layers and computes self-attention in orthogonal directions. Due to the large kernels and multiple stacks of the CCLK modules, the network can effectively capture long-range contextual features with a compact model size. 
The experimental results show that the network achieves enhanced classification performance and generalisation capability compared to alternative lightweight deep learning methods. Fewer parameters also make it suitable for deployment on devices with limited resources.</p><p>Ye et al. developed a domain-invariant attention network to address heterogeneous transfer learning in cross-scene hyperspectral classification. The network includes a feature-alignment convolutional neural networks (FACNN) and domain-invariant attention block (DIAB). FACNN extracts features from source and target scenes and projects the heterogeneous features from two scenes into a shared low-dimensional subspace, guaranteeing the class consistency between scenes. DIAB gains cross-domain consistency with a specially designed class-specific domain-invariance loss to obtain domain-invariant and discriminative attention weights for samples, reducing the domain shift. In this way, the knowledge of source scene is successfully transferred to the target scene, alleviating the small training samples in hyperspectral classification. The experiments prove that the network achieves promising hyperspectral classification.</p><p>Zuo et al. developed a method for multispectral pedestrian detection, focusing on scale-aware permutation attention and adjacent feature aggregation. The scale-aware permutated attention module uses both local and global attention to enhance pedestrian features of different scales in the feature pyramid, improving the quality of feature fusion. The adjacent-branch feature aggregation module considers both semantic context and spatial resolution, leading to improved detection accuracy for small-sized pedestrians. Extensive experimental evaluations showcase notable improvements in both efficiency and accuracy compared to several existing methods.</p><p>Guo et al. introduced a model called spatial-temporal-meteorological/long short-term memory network (STM-LSTM) to predict photovoltaic power generation. The proposed method integrates satellite image, historical meteorological data and historical power generation data, and uses cloud motion-aware learning to account for cloud movement and an attention mechanism to weigh the images in different bands from satellite cloud maps. The LSTM model combines the historical power generation sequence and meteorological change information for better accuracy. Experimental results show that the STM-LSTM model outperforms the baseline model to a certain margin, indicating its effectiveness in photovoltaic power generation prediction.</p><p>Tsai et al. created a fully annotated EDARib-CXR dataset for the identification and localization of fractured ribs in frontal and oblique chest X-ray images. The dataset consists of 369 frontal and 829 oblique chest X rays, providing valuable resources for research in this field. Based on YOLOv5, two detection models, namely AB-YOLOv5 and PB-YOLOv5, were introduced. AB-YOLOv5 incorporates an auxiliary branch that enhances the resolution of extracted feature maps in the final convolutional network layer, facilitating the determination of fracture location when relevant characteristics are identified in the data. On the other hand, PB-YOLOv5 employs image patches instead of the entire image to preserve the features of small objects in downsampled images during training, enabling the detection of subtle lesion features. 
Moreover, the researchers implemented a two-stage cascade detector that effectively integrates these two models to further improve the detection performance. Experimental results demonstrated superior performance of the introduced methods, providing an applicability in reducing diagnostic time and alleviating the heavy workload faced by clinicians.</p><p>Spectral imaging powered computer vision is still an emerging research area with great potential of creating new knowledge and methods. All of the accepted papers in this special issue highlight the crucial need for techniques that leverage information beyond the visual spectrum to help understand the world through spectral imaging devices. The rapid advancements in spectral imaging technology have paved the way for new opportunities and tasks in computer vision research and applications. We expect that more researchers will join this exciting area and develop solutions to handle tasks that cannot be solved well by traditional computer vision.</p><p>Jun Zhou and Fengchao Xiong led the organization of this special issue, including compiling the potential author’s list, calling for papers, handling paper reviews, and drafting the editorial. Lei Tong, Naoto Yokoya and Pedram Ghamisi provided valuable input on the scope of this special issue, promoted this special issue to potential authors, and gave feedback to the editorial.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 7","pages":"723-725"},"PeriodicalIF":1.5000,"publicationDate":"2023-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12242","citationCount":"0","resultStr":"{\"title\":\"Guest Editorial: Spectral imaging powered computer vision\",\"authors\":\"Jun Zhou, Fengchao Xiong, Lei Tong, Naoto Yokoya, Pedram Ghamisi\",\"doi\":\"10.1049/cvi2.12242\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The increasing accessibility and affordability of spectral imaging technology have revolutionised computer vision, allowing for data capture across various wavelengths beyond the visual spectrum. This advancement has greatly enhanced the capabilities of computers and AI systems in observing, understanding, and interacting with the world. Consequently, new datasets in various modalities, such as infrared, ultraviolet, fluorescent, multispectral, and hyperspectral, have been constructed, presenting fresh opportunities for computer vision research and applications.</p><p>Although significant progress has been made in processing, learning, and utilising data obtained through spectral imaging technology, several challenges persist in the field of computer vision. These challenges include the presence of low-quality images, sparse input, high-dimensional data, expensive data labelling processes, and a lack of methods to effectively analyse and utilise data considering their unique properties. Many mid-level and high-level computer vision tasks, such as object segmentation, detection and recognition, image retrieval and classification, and video tracking and understanding, still have not leveraged the advantages offered by spectral information. Additionally, the problem of effectively and efficiently fusing data in different modalities to create robust vision systems remains unresolved. Therefore, there is a pressing need for novel computer vision methods and applications to advance this research area. 
This special issue aims to provide a venue for researchers to present innovative computer vision methods driven by the spectral imaging technology.</p><p>This special issue has received 11 submissions. Among them, five papers have been accepted for publication, indicating their high quality and contribution to spectral imaging powered computer vision. Four papers have been rejected and sent to a transfer service for consideration in other journals or invited for re-submission after revision based on reviewers’ feedback.</p><p>The accepted papers can be categorised into three main groups based on the type of adopted data, that is, hyperspectral, multispectral, and X-ray images. Hyperspectral images provide material information about the scene and enable fine-grained object class classification. Multispectral images provide high spatial context and information beyond visible spectrum, such as infrared, providing enriched clues for visual computation. X-ray images can penetrate the surface of objects and provide internal structural information of targets, empowering medical applications, such as rib detection as exemplified by Tsai et al. Below is a brief summary of each paper in this special issue.</p><p>Zhong et al. proposed a lightweight criss-cross large kernel (CCLK) convolutional neural network for hyperspectral classification. The key component of this network is a CCLK module, which incorporates large kernels within the 1D convolutional layers and computes self-attention in orthogonal directions. Due to the large kernels and multiple stacks of the CCLK modules, the network can effectively capture long-range contextual features with a compact model size. The experimental results show that the network achieves enhanced classification performance and generalisation capability compared to alternative lightweight deep learning methods. Fewer parameters also make it suitable for deployment on devices with limited resources.</p><p>Ye et al. developed a domain-invariant attention network to address heterogeneous transfer learning in cross-scene hyperspectral classification. The network includes a feature-alignment convolutional neural networks (FACNN) and domain-invariant attention block (DIAB). FACNN extracts features from source and target scenes and projects the heterogeneous features from two scenes into a shared low-dimensional subspace, guaranteeing the class consistency between scenes. DIAB gains cross-domain consistency with a specially designed class-specific domain-invariance loss to obtain domain-invariant and discriminative attention weights for samples, reducing the domain shift. In this way, the knowledge of source scene is successfully transferred to the target scene, alleviating the small training samples in hyperspectral classification. The experiments prove that the network achieves promising hyperspectral classification.</p><p>Zuo et al. developed a method for multispectral pedestrian detection, focusing on scale-aware permutation attention and adjacent feature aggregation. The scale-aware permutated attention module uses both local and global attention to enhance pedestrian features of different scales in the feature pyramid, improving the quality of feature fusion. The adjacent-branch feature aggregation module considers both semantic context and spatial resolution, leading to improved detection accuracy for small-sized pedestrians. 
Extensive experimental evaluations showcase notable improvements in both efficiency and accuracy compared to several existing methods.</p><p>Guo et al. introduced a model called spatial-temporal-meteorological/long short-term memory network (STM-LSTM) to predict photovoltaic power generation. The proposed method integrates satellite image, historical meteorological data and historical power generation data, and uses cloud motion-aware learning to account for cloud movement and an attention mechanism to weigh the images in different bands from satellite cloud maps. The LSTM model combines the historical power generation sequence and meteorological change information for better accuracy. Experimental results show that the STM-LSTM model outperforms the baseline model to a certain margin, indicating its effectiveness in photovoltaic power generation prediction.</p><p>Tsai et al. created a fully annotated EDARib-CXR dataset for the identification and localization of fractured ribs in frontal and oblique chest X-ray images. The dataset consists of 369 frontal and 829 oblique chest X rays, providing valuable resources for research in this field. Based on YOLOv5, two detection models, namely AB-YOLOv5 and PB-YOLOv5, were introduced. AB-YOLOv5 incorporates an auxiliary branch that enhances the resolution of extracted feature maps in the final convolutional network layer, facilitating the determination of fracture location when relevant characteristics are identified in the data. On the other hand, PB-YOLOv5 employs image patches instead of the entire image to preserve the features of small objects in downsampled images during training, enabling the detection of subtle lesion features. Moreover, the researchers implemented a two-stage cascade detector that effectively integrates these two models to further improve the detection performance. Experimental results demonstrated superior performance of the introduced methods, providing an applicability in reducing diagnostic time and alleviating the heavy workload faced by clinicians.</p><p>Spectral imaging powered computer vision is still an emerging research area with great potential of creating new knowledge and methods. All of the accepted papers in this special issue highlight the crucial need for techniques that leverage information beyond the visual spectrum to help understand the world through spectral imaging devices. The rapid advancements in spectral imaging technology have paved the way for new opportunities and tasks in computer vision research and applications. We expect that more researchers will join this exciting area and develop solutions to handle tasks that cannot be solved well by traditional computer vision.</p><p>Jun Zhou and Fengchao Xiong led the organization of this special issue, including compiling the potential author’s list, calling for papers, handling paper reviews, and drafting the editorial. 
Lei Tong, Naoto Yokoya and Pedram Ghamisi provided valuable input on the scope of this special issue, promoted this special issue to potential authors, and gave feedback to the editorial.</p>\",\"PeriodicalId\":56304,\"journal\":{\"name\":\"IET Computer Vision\",\"volume\":\"17 7\",\"pages\":\"723-725\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-10-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12242\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IET Computer Vision\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12242\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12242","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
The increasing accessibility and affordability of spectral imaging technology have revolutionised computer vision, allowing data capture across wavelengths beyond the visible spectrum. This advancement has greatly enhanced the capabilities of computers and AI systems in observing, understanding, and interacting with the world. Consequently, new datasets in various modalities, such as infrared, ultraviolet, fluorescent, multispectral, and hyperspectral, have been constructed, presenting fresh opportunities for computer vision research and applications.
Although significant progress has been made in processing, learning from, and utilising data obtained through spectral imaging technology, several challenges persist in the field of computer vision. These include low-quality images, sparse input, high-dimensional data, expensive data labelling processes, and a lack of methods that effectively analyse and exploit the unique properties of such data. Many mid-level and high-level computer vision tasks, such as object segmentation, detection and recognition, image retrieval and classification, and video tracking and understanding, still have not leveraged the advantages offered by spectral information. Additionally, the problem of effectively and efficiently fusing data of different modalities to create robust vision systems remains unresolved. Therefore, there is a pressing need for novel computer vision methods and applications to advance this research area. This special issue aims to provide a venue for researchers to present innovative computer vision methods driven by spectral imaging technology.
This special issue received 11 submissions. Five papers were accepted for publication, reflecting their high quality and contribution to spectral imaging powered computer vision. Four papers were rejected and either sent to a transfer service for consideration by other journals or invited for re-submission after revision based on the reviewers' feedback.
The accepted papers can be categorised into three main groups based on the type of data adopted: hyperspectral, multispectral, and X-ray images. Hyperspectral images provide material information about the scene and enable fine-grained object classification. Multispectral images provide rich spatial context and information beyond the visible spectrum, such as infrared, offering enriched cues for visual computation. X-rays penetrate the surface of objects, so X-ray images reveal the internal structure of targets, empowering medical applications such as the rib fracture detection exemplified by Tsai et al. Below is a brief summary of each paper in this special issue.
Zhong et al. proposed a lightweight criss-cross large kernel (CCLK) convolutional neural network for hyperspectral classification. The key component of this network is the CCLK module, which incorporates large kernels within 1D convolutional layers and computes self-attention in orthogonal directions. Thanks to the large kernels and the stacking of multiple CCLK modules, the network effectively captures long-range contextual features with a compact model size. The experimental results show that the network achieves enhanced classification performance and generalisation capability compared to alternative lightweight deep learning methods. Its small parameter count also makes it suitable for deployment on devices with limited resources.
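To make the criss-cross idea concrete, the following is a minimal PyTorch sketch of a criss-cross large-kernel block, pairing depthwise 1D convolutions along orthogonal directions with a simple gating step. The layer sizes and the gating formulation are illustrative assumptions, not the exact CCLK module of Zhong et al.

```python
# Minimal sketch: two depthwise 1D convolutions with large kernels applied in
# orthogonal directions, followed by a simple attention-style gate. Kernel
# size and gating are assumptions, not the architecture from the paper.
import torch
import torch.nn as nn

class CrissCrossLargeKernel(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 21):
        super().__init__()
        pad = kernel_size // 2
        # Horizontal 1xK and vertical Kx1 depthwise convolutions capture
        # long-range context along each axis at low parameter cost.
        self.horizontal = nn.Conv2d(channels, channels, (1, kernel_size),
                                    padding=(0, pad), groups=channels)
        self.vertical = nn.Conv2d(channels, channels, (kernel_size, 1),
                                  padding=(pad, 0), groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Combine the two orthogonal responses and reweight the input,
        # a simple stand-in for attention along the criss-cross paths.
        attn = torch.sigmoid(self.pointwise(self.horizontal(x) + self.vertical(x)))
        return x * attn

features = torch.randn(2, 64, 32, 32)   # (batch, channels, height, width)
block = CrissCrossLargeKernel(64)
print(block(features).shape)             # torch.Size([2, 64, 32, 32])
```

Because the convolutions are depthwise, the parameter count grows linearly with the kernel size, which is why large receptive fields remain affordable in such designs.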
Ye et al. developed a domain-invariant attention network to address heterogeneous transfer learning in cross-scene hyperspectral classification. The network includes a feature-alignment convolutional neural network (FACNN) and a domain-invariant attention block (DIAB). The FACNN extracts features from the source and target scenes and projects the heterogeneous features of the two scenes into a shared low-dimensional subspace, guaranteeing class consistency between scenes. The DIAB achieves cross-domain consistency with a specially designed class-specific domain-invariance loss, yielding domain-invariant and discriminative attention weights for samples and reducing the domain shift. In this way, knowledge of the source scene is successfully transferred to the target scene, alleviating the shortage of training samples in hyperspectral classification. The experiments show that the network achieves promising hyperspectral classification performance.
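As a rough illustration of the shared-subspace idea, the sketch below projects features of two scenes with different band counts into one low-dimensional space and pulls per-class centroids together. The simple mean-distance penalty is a stand-in assumption, not the class-specific domain-invariance loss defined by Ye et al.

```python
# Hedged sketch of cross-scene feature alignment: two scene-specific encoders
# map heterogeneous feature dimensions into a shared subspace, and a per-class
# centroid-distance penalty encourages class consistency across scenes.
import torch
import torch.nn as nn

class SharedSubspace(nn.Module):
    def __init__(self, src_dim: int, tgt_dim: int, shared_dim: int = 32):
        super().__init__()
        self.src_proj = nn.Linear(src_dim, shared_dim)   # source-scene encoder
        self.tgt_proj = nn.Linear(tgt_dim, shared_dim)   # target-scene encoder

    def alignment_loss(self, src_x, src_y, tgt_x, tgt_y):
        zs, zt = self.src_proj(src_x), self.tgt_proj(tgt_x)
        loss = zs.new_zeros(())
        for c in torch.unique(src_y):
            if (tgt_y == c).any():
                # Pull the per-class centroids of the two scenes together.
                loss = loss + (zs[src_y == c].mean(0) - zt[tgt_y == c].mean(0)).pow(2).sum()
        return loss

model = SharedSubspace(src_dim=144, tgt_dim=102)  # e.g. differing band counts
src_x, src_y = torch.randn(64, 144), torch.randint(0, 5, (64,))
tgt_x, tgt_y = torch.randn(16, 102), torch.randint(0, 5, (16,))
print(model.alignment_loss(src_x, src_y, tgt_x, tgt_y))
```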
Zuo et al. developed a method for multispectral pedestrian detection built around scale-aware permutation attention and adjacent-branch feature aggregation. The scale-aware permutation attention module uses both local and global attention to enhance pedestrian features at different scales in the feature pyramid, improving the quality of feature fusion. The adjacent-branch feature aggregation module considers both semantic context and spatial resolution, leading to improved detection accuracy for small-sized pedestrians. Extensive experimental evaluations show notable improvements in both efficiency and accuracy compared to several existing methods.
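The following sketch illustrates the general pattern of adjacent-level aggregation in a feature pyramid, resampling the neighbouring levels to the current resolution before fusion. The channel sizes and the fusion layer are assumptions rather than the module proposed by Zuo et al.

```python
# Generic adjacent-level aggregation: each pyramid level is fused with its
# up- and down-sampled neighbours so that semantic context (coarse level) and
# spatial detail (fine level) both contribute to the fused feature.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentAggregation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 3, channels, 1)  # merge three branches

    def forward(self, finer, current, coarser):
        # Bring the neighbouring levels to the current spatial size.
        down = F.interpolate(finer, size=current.shape[-2:], mode="bilinear",
                             align_corners=False)
        up = F.interpolate(coarser, size=current.shape[-2:], mode="bilinear",
                           align_corners=False)
        return self.fuse(torch.cat([down, current, up], dim=1))

p3 = torch.randn(1, 256, 80, 80)   # finer level
p4 = torch.randn(1, 256, 40, 40)   # current level
p5 = torch.randn(1, 256, 20, 20)   # coarser level
agg = AdjacentAggregation(256)
print(agg(p3, p4, p5).shape)        # torch.Size([1, 256, 40, 40])
```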
Guo et al. introduced a spatial-temporal-meteorological long short-term memory network (STM-LSTM) to predict photovoltaic power generation. The proposed method integrates satellite images, historical meteorological data, and historical power generation data; it uses cloud motion-aware learning to account for cloud movement and an attention mechanism to weight the images in different bands of the satellite cloud maps. The LSTM model combines the historical power generation sequence with meteorological change information for better accuracy. Experimental results show that the STM-LSTM model outperforms the baseline model by a certain margin, indicating its effectiveness in photovoltaic power generation prediction.
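As a hedged sketch of the sequence-fusion step, the snippet below runs an LSTM over concatenated power-generation and meteorological sequences and predicts the next power value. The feature set and sizes are placeholders, and the full STM-LSTM additionally ingests satellite cloud-map bands, which are omitted here.

```python
# Minimal fusion sketch: an LSTM over the concatenated historical power and
# meteorological sequences, with a linear head forecasting the next value.
import torch
import torch.nn as nn

class PowerForecaster(nn.Module):
    def __init__(self, weather_dim: int = 4, hidden: int = 64):
        super().__init__()
        # Input per time step: 1 power value + weather_dim meteorological values.
        self.lstm = nn.LSTM(1 + weather_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, power, weather):
        x = torch.cat([power.unsqueeze(-1), weather], dim=-1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])     # forecast for the next step

power = torch.randn(8, 24)        # 24 hourly power readings per sample
weather = torch.randn(8, 24, 4)   # e.g. irradiance, temperature, wind, humidity
print(PowerForecaster()(power, weather).shape)  # torch.Size([8, 1])
```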
Tsai et al. created EDARib-CXR, a fully annotated dataset for the identification and localisation of fractured ribs in frontal and oblique chest X-ray images. The dataset consists of 369 frontal and 829 oblique chest X-rays, providing a valuable resource for research in this field. Based on YOLOv5, two detection models, AB-YOLOv5 and PB-YOLOv5, were introduced. AB-YOLOv5 incorporates an auxiliary branch that enhances the resolution of the feature maps extracted in the final convolutional layer, facilitating the determination of fracture locations when relevant characteristics are identified in the data. PB-YOLOv5, in contrast, trains on image patches instead of whole images, preserving the features of small objects that would otherwise be lost to downsampling and enabling the detection of subtle lesion features. Moreover, the researchers implemented a two-stage cascade detector that integrates the two models to further improve detection performance. Experimental results demonstrated the superior performance of the introduced methods and their applicability in reducing diagnostic time and alleviating the heavy workload faced by clinicians.
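To illustrate the patch-based idea behind PB-YOLOv5, the helper below tiles a large radiograph into overlapping patches so that small fracture features survive the detector's downsampling. The patch size and overlap are illustrative values, not those used by Tsai et al.

```python
# Sketch of patch-based preprocessing: cover a large image with fixed-size,
# overlapping tiles; border tiles are clamped so every patch keeps full size.
import numpy as np

def tile_image(image: np.ndarray, patch: int = 640, overlap: int = 128):
    """Yield (x, y, patch) tuples covering the image with overlap."""
    step = patch - overlap
    height, width = image.shape[:2]
    for y in range(0, max(height - overlap, 1), step):
        for x in range(0, max(width - overlap, 1), step):
            # Clamp so border patches keep the full patch size.
            y0 = min(y, max(height - patch, 0))
            x0 = min(x, max(width - patch, 0))
            yield x0, y0, image[y0:y0 + patch, x0:x0 + patch]

xray = np.zeros((2048, 2500), dtype=np.uint8)  # dummy chest radiograph
patches = list(tile_image(xray))
print(len(patches), patches[0][2].shape)        # 20 (640, 640)
```

Detections from such tiles would then be mapped back to full-image coordinates using the stored (x0, y0) offsets.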
Spectral imaging powered computer vision is still an emerging research area with great potential for creating new knowledge and methods. All of the accepted papers in this special issue highlight the crucial need for techniques that leverage information beyond the visible spectrum to help understand the world through spectral imaging devices. The rapid advancements in spectral imaging technology have paved the way for new opportunities and tasks in computer vision research and applications. We expect that more researchers will join this exciting area and develop solutions to tasks that traditional computer vision cannot solve well.
Jun Zhou and Fengchao Xiong led the organisation of this special issue, including compiling the list of potential authors, calling for papers, handling paper reviews, and drafting the editorial. Lei Tong, Naoto Yokoya and Pedram Ghamisi provided valuable input on the scope of the special issue, promoted it to potential authors, and gave feedback on the editorial.
Journal description:
IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, while not forgetting works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision.
IET Computer Vision welcomes submissions on the following topics:
Biologically and perceptually motivated approaches to low level vision (feature detection, etc.)
Perceptual grouping and organisation
Representation, analysis and matching of 2D and 3D shape
Shape-from-X
Object recognition
Image understanding
Learning with visual inputs
Motion analysis and object tracking
Multiview scene analysis
Cognitive approaches in low, mid and high level vision
Control in visual systems
Colour, reflectance and light
Statistical and probabilistic models
Face and gesture
Surveillance
Biometrics and security
Robotics
Vehicle guidance
Automatic model acquisition
Medical image analysis and understanding
Aerial scene analysis and remote sensing
Deep learning models in computer vision
Both methodological and applications orientated papers are welcome.
Manuscripts submitted are expected to include a detailed and analytical review of the literature and state-of-the-art exposition of the original proposed research and its methodology, its thorough experimental evaluation, and last but not least, comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review.
Special Issues Current Call for Papers:
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf