{"title":"VSLM: Virtual Signal Large Model for Few-Shot Wideband Signal Detection and Recognition","authors":"Xiaoyang Hao, Shuyuan Yang, Ruoyu Liu, Zhixi Feng, Tongqing Peng, Bincheng Huang","doi":"10.1109/twc.2024.3496813","DOIUrl":"https://doi.org/10.1109/twc.2024.3496813","url":null,"abstract":"","PeriodicalId":13431,"journal":{"name":"IEEE Transactions on Wireless Communications","volume":"18 1","pages":""},"PeriodicalIF":10.4,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent Resource Adaptation for Diversified Service Requirements in Industrial IoT","authors":"Weiting Zhang, Yiqian He, Tao Zhang, Chenhao Ying, Jiawen Kang","doi":"10.1109/tccn.2024.3502493","DOIUrl":"https://doi.org/10.1109/tccn.2024.3502493","url":null,"abstract":"","PeriodicalId":13069,"journal":{"name":"IEEE Transactions on Cognitive Communications and Networking","volume":"2 1","pages":""},"PeriodicalIF":8.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Channel enhanced cross-modality relation network for visible-infrared person re-identification","authors":"Wanru Song, Xinyi Wang, Weimin Wu, Yuan Zhang, Feng Liu","doi":"10.1007/s10489-024-06057-x","DOIUrl":"10.1007/s10489-024-06057-x","url":null,"abstract":"<div><p>Visible-infrared person re-identification (VI Re-ID) is designed to perform pedestrian retrieval on non-overlapping visible-infrared cameras, and it is widely employed in intelligent surveillance. For the VI Re-ID task, one of the main challenges is the huge modality discrepancy between the visible and infrared images. Therefore, mining more shared features in the cross-modality task turns into an important issue. To address this problem, this paper proposes a novel framework for feature learning and feature embedding in VI Re-ID, namely Channel Enhanced Cross-modality Relation Network (CECR-Net). More specifically, the network contains three key modules. In the first module, to shorten the distance between the original modalities, a channel selection operation is applied to the visible images, the robustness against color variations is improved by randomly generating three-channel R/G/B images. The module also exploits the low- and mid-level information of the visible and auxiliary modal images through a feature parameter-sharing strategy. Considering that the body sequences of pedestrians are not easy to change with modality, CECR-Net designs two modules based on relation network for VI Re-ID, namely the intra-relation learning and the cross-relation learning modules. These two modules help to capture the structural relationship between body parts, which is a modality-invariant information, disrupting the isolation between local features. Extensive experiments on the two public benchmarks indicate that CECR-Net is superior compared to the state-of-the-art methods. In particular, for the SYSU-MM01 dataset, the Rank1 and mAP reach 76.83% and 71.56% in the \"All Search\" mode, respectively.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives","authors":"Jiadi Cui, Junming Cao, Fuqiang Zhao, Zhipeng He, Yifan Chen, Yuhui Zhong, Lan Xu, Yujiao Shi, Yingliang Zhang, Jingyi Yu","doi":"10.1145/3687762","DOIUrl":"https://doi.org/10.1145/3687762","url":null,"abstract":"Large garages are ubiquitous yet intricate scenes that present unique challenges due to their monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction often fail in these environments due to poor correspondence construction. To address these challenges, we introduce LetsGo, a LiDAR-assisted Gaussian splatting framework for large-scale garage modeling and rendering. We develop a handheld scanner, Polar, equipped with IMU, LiDAR, and a fisheye camera, to facilitate accurate data acquisition. Using this Polar device, we present the GarageWorld dataset, consisting of eight expansive garage scenes with diverse geometric structures, which will be made publicly available for further research. Our approach demonstrates that LiDAR point clouds collected by the Polar device significantly enhance a suite of 3D Gaussian splatting algorithms for garage scene modeling and rendering. We introduce a novel depth regularizer that effectively eliminates floating artifacts in rendered images. Additionally, we propose a multi-resolution 3D Gaussian representation designed for Level-of-Detail (LOD) rendering. This includes adapted scaling factors for individual levels and a random-resolution-level training scheme to optimize the Gaussians across different resolutions. This representation enables efficient rendering of large-scale garage scenes on lightweight devices via a web-based renderer. Experimental results on our GarageWorld dataset, as well as on ScanNet++ and KITTI-360, demonstrate the superiority of our method in terms of rendering quality and resource efficiency.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous Sculpting: Persistent Swarm Shape Formation Adaptable to Local Environmental Changes","authors":"Andrew G. Curtis, Mark Yim, Michael Rubenstein","doi":"10.1109/tro.2024.3502199","DOIUrl":"https://doi.org/10.1109/tro.2024.3502199","url":null,"abstract":"","PeriodicalId":50388,"journal":{"name":"IEEE Transactions on Robotics","volume":"18 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling","authors":"Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Sung-Hee Lee, Donghoon Shin","doi":"10.1145/3687991","DOIUrl":"https://doi.org/10.1145/3687991","url":null,"abstract":"This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton off-sets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"36 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Time-Dependent Inclusion-Based Method for Continuous Collision Detection between Parametric Surfaces","authors":"Xuwen Chen, Cheng Yu, Xingyu Ni, Mengyu Chu, Bin Wang, Baoquan Chen","doi":"10.1145/3687960","DOIUrl":"https://doi.org/10.1145/3687960","url":null,"abstract":"Continuous collision detection (CCD) between parametric surfaces is typically formulated as a five-dimensional constrained optimization problem. In the field of CAD and computer graphics, common approaches to solving this problem rely on linearization or sampling strategies. Alternatively, inclusion-based techniques detect collisions by employing 5D inclusion functions, which are typically designed to represent the swept volumes of parametric surfaces over a given time span, and narrowing down the earliest collision moment through subdivision in both spatial and temporal dimensions. However, when high detection accuracy is required, all these approaches significantly increases computational consumption due to the high-dimensional searching space. In this work, we develop a new time-dependent inclusion-based CCD framework that eliminates the need for temporal subdivision and can speedup conventional methods by a factor ranging from 36 to 138. To achieve this, we propose a novel time-dependent inclusion function that provides a continuous representation of a moving surface, along with a corresponding intersection detection algorithm that quickly identifies the time intervals when collisions are likely to occur. We validate our method across various primitive types, demonstrate its efficacy within the simulation pipeline and show that it significantly improves CCD efficiency while maintaining accuracy.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"6 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Look Ma, no markers: holistic performance capture without the hassle","authors":"Charlie Hewitt, Fatemeh Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Thomas J. Cashman, Julien Valentin, Darren Cosker, Tadas Baltrusaitis","doi":"10.1145/3687772","DOIUrl":"https://doi.org/10.1145/3687772","url":null,"abstract":"We tackle the problem of highly-accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body or hand capture independently, involve complex and expensive hardware and a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually only support a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for markerfree, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs as well as supporting varied capture environments and clothing. We achieve this through a hybrid approach that leverages machine learning models trained exclusively on synthetic data and powerful parametric models of human shape and motion. We evaluate our method on a number of body, face and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize on diverse datasets.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"13 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}