arXiv (Cornell University): Latest Papers, Page 5

Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts?

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07564

Aggazzotti, Cristina, Andrews, Nicholas, Smith, Elizabeth Allyn

Abstract: Authorship verification is the problem of determining if two distinct writing samples share the same author and is typically concerned with the attribution of written text. In this paper, we explore the attribution of transcribed speech, which poses novel challenges. The main challenge is that many stylistic features, such as punctuation and capitalization, are not available or reliable. Therefore, we expect a priori that transcribed speech is a more challenging domain for attribution. On the other hand, other stylistic features, such as speech disfluencies, may enable more successful attribution but, being specific to speech, require special-purpose models. To better understand the challenges of this setting, we contribute the first systematic study of speaker attribution based solely on transcribed speech. Specifically, we propose a new benchmark for speaker attribution focused on conversational speech transcripts. To control for spurious associations of speakers with topic, we employ both conversation prompts and speakers participating in the same conversation to construct challenging verification trials of varying difficulties. We establish the state of the art on this new benchmark by comparing a suite of neural and non-neural baselines, finding that although written text attribution models achieve surprisingly good performance in certain settings, they struggle in the hardest settings we consider.

Citations: 0

Testing importance sampling on a quantum annealer for strong coupling SU(3) gauge theory

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07209

Kim, Jangho, Luu, Thomas, Unger, Wolfgang

Citations: 0

MonoDiffusion: Self-Supervised Monocular Depth Estimation Using Diffusion Model

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07198

Shao, Shuwei, Pei, Zhongcai, Chen, Weihai, Sun, Dingchi, Chen, Peter C. Y., Li, Zhengguo

Abstract: Over the past few years, self-supervised monocular depth estimation that does not depend on ground truth during the training phase has received widespread attention. Most efforts focus on designing different types of network architectures and loss functions or handling edge cases, e.g., occlusion and dynamic objects. In this work, we introduce a novel self-supervised depth estimation framework, dubbed MonoDiffusion, by formulating it as an iterative denoising process. Because the depth ground truth is unavailable in the training phase, we develop a pseudo ground-truth diffusion process to assist the diffusion in MonoDiffusion. The pseudo ground-truth diffusion gradually adds noise to the depth map generated by a pre-trained teacher model. Moreover, the teacher model allows applying a distillation loss to guide the denoised depth. Further, we develop a masked visual condition mechanism to enhance the denoising ability of the model. Extensive experiments are conducted on the KITTI and Make3D datasets, and the proposed MonoDiffusion outperforms prior state-of-the-art competitors. The source code will be available at https://github.com/ShuweiShao/MonoDiffusion.
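
The MonoDiffusion abstract says that the pseudo ground-truth diffusion gradually adds noise to a teacher model's depth map. A minimal sketch of what such a forward noising step could look like is given below. It assumes the standard Gaussian (DDPM-style) forward process with a linear noise schedule; the abstract does not state the schedule or the exact formulation, so the function names, the schedule values, and the toy depth values are all hypothetical and are not taken from the authors' code.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the abstract does not specify one."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha_bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_depth(teacher_depth, t, alpha_bars, rng):
    """Sample a noisy depth map at step t from a flat list of teacher depths.

    Implements x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    with eps drawn from a standard normal distribution.
    """
    a = alpha_bars[t]
    return [
        math.sqrt(a) * d + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for d in teacher_depth
    ]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

# Made-up normalized depths standing in for a teacher model's prediction.
teacher_depth = [0.2, 0.5, 0.8]
slightly_noisy = noise_depth(teacher_depth, 0, alpha_bars, rng)
very_noisy = noise_depth(teacher_depth, 999, alpha_bars, rng)
```

At early steps the sample stays close to the teacher depth, while at the final step it is almost pure noise, which is the sense in which noise is added gradually.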

Citations: 0

Time-Frequency Localization Characteristics of the Delay-Doppler Plane Orthogonal Pulse

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07238

Shafie, Akram, Yuan, Jinhong, Yang, Nan, Lin, Hai

Abstract: The orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has recently been proposed as a promising solution for ensuring reliable communications in high mobility scenarios. In this work, we investigate the time-frequency (TF) localization characteristics of the DD plane orthogonal pulse (DDOP), which is the prototype pulse of ODDM modulation. The TF localization characteristics examine how concentrated or spread out the energy of a pulse is in the joint TF domain. We first derive the TF localization metric, TF area (TFA), for the DDOP. Based on this result, we provide insights into the energy spread of the DDOP in the joint TF domain. Then, we delve into the potential advantages of the DDOP due to its energy spread, particularly in terms of leveraging both time and frequency diversities, and enabling high-resolution sensing. Furthermore, we determine the TFA for the recently proposed generalized design of the DDOP. Finally, we validate our analysis based on numerical results and show that the energy spread for the generalized design of the DDOP in the joint TF domain exhibits a step-wise increase as the duration of sub-pulses increases.

Citations: 0

High Rectification Ratio at Room Temperature in Rhenium(I) Compound

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07258

Rajbangshi, Subas, Pal, Nila, Rahman, Robinur, Nesterov, Vladimir N., Roy, Lisa, Ghosh, Shishir, Mondal, Prakash Chandra

Abstract: Electrical current rectification is an interesting electronic feature, popularly associated with the diode. Achieving a high rectification ratio in a molecular junction has been a long-standing goal in molecular electronics. The present work describes mimicking electrical current rectification with a pi-stacked rhenium(I) compound sandwiched between two electrical contacts. Of the two mononuclear rhenium compounds studied here, [Re(CO)4(PPh3){(N)-saccharinate}] (1) and [Re(CO)3(phen){(N)-saccharinate}] (2), the latter shows a high rectification ratio of ~4000 at 2.0 V at room temperature, induced by strong pi-pi interactions. Alternating current (AC)-based electrical measurements confirm AC-to-DC signal conversion at a frequency f of 1 kHz, showing that 2 can act as an excellent half-wave rectifier. An asymmetric charge injection barrier height at the electrode/Re(I) interfaces of the devices, which have a stacking configuration of p++-Si/Re compound 2 (31 nm)/ITO, gives rise to unidirectional current flow. The charge transport mechanism is governed by thermally activated hopping, and charge carrier propagation is explained through an energy profile that considers the Fermi levels of the two electrodes and the energies of the frontier molecular orbitals (HOMO and LUMO), confirming that the rectification is of molecular origin. The present work paves the way to combine different organometallic compounds as circuit elements in nanoelectronic devices to achieve numerous exciting electronic features.
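
The rhenium abstract quotes a rectification ratio of ~4000 at 2.0 V and describes half-wave rectification of a 1 kHz signal. The short sketch below illustrates the arithmetic behind both ideas. It assumes the commonly used definition of the rectification ratio, |I(+V)| / |I(-V)| at the same bias magnitude, and an ideal rectifier that simply blocks negative half-cycles; the current values are hypothetical numbers chosen only to give a ratio of the quoted order of magnitude and are not measurements from the paper.

```python
import math


def rectification_ratio(current_forward, current_reverse):
    """Return |I(+V)| / |I(-V)| for currents measured at the same bias magnitude."""
    if current_reverse == 0:
        raise ValueError("reverse current must be non-zero")
    return abs(current_forward) / abs(current_reverse)


def ideal_half_wave(samples):
    """Ideal half-wave rectifier: pass positive samples, replace the rest with 0."""
    return [s if s > 0 else 0.0 for s in samples]


# Hypothetical currents (amperes) at +2.0 V and -2.0 V, for illustration only.
rr = rectification_ratio(4.0e-3, -1.0e-6)

# One period of a 1 kHz sine wave sampled at 8 evenly spaced points.
freq_hz = 1000.0
times = [i / (8 * freq_hz) for i in range(8)]
ac_in = [math.sin(2 * math.pi * freq_hz * t) for t in times]
dc_out = ideal_half_wave(ac_in)
```

A real molecular junction leaks some reverse current and has a finite turn-on voltage, so its output only approximates the ideal clipping shown here.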

Citations: 0

FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and Design

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07414

Huang, Zhen, Li, Yihao, Pei, Dong, Zhou, Jiapeng, Ning, Xuliang, Han, Jianlin, Han, Xiaoguang, Chen, Xuejun

Citations: 1

Machine Learning For Beamline Steering

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07519

Kante, Isaac

Citations: 0

Lattice relaxation, electronic structure and continuum model for twisted bilayer MoTe$_2$

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07533

Mao, Ning, Xu, Cheng, Li, Jiangxu, Bao, Ting, Liu, Peitao, Xu, Yong, Felser, Claudia, Fu, Liang, Zhang, Yang

Citations: 0

CASTER: A Computer-Vision-Assisted Wireless Channel Simulator for Gesture Recognition

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07169

Ren, Zhenyu, Li, Guoliang, Ji, Chenqing, Yu, Chao, Wang, Shuai, Wang, Rui

Citations: 0

SponTTS: modeling and transferring spontaneous style for TTS

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07179

Li, Hanzhao, Zhu, Xinfa, Xue, Liumeng, Song, Yang, Chen, Yunlin, Xie, Lei

Abstract: Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the limited availability of high-quality spontaneous data constrains spontaneous speech generation for speakers without spontaneous data. To address these problems, we propose SponTTS, a two-stage approach based on bottleneck (BN) features to model and transfer spontaneous style for TTS. In the first stage, we adopt a Conditional Variational Autoencoder (CVAE) to capture spontaneous prosody from a BN feature and incorporate spontaneous phenomena through a spontaneous-phenomena embedding prediction loss. In addition, we introduce a flow-based predictor to predict a latent spontaneous style representation from the text, which enriches the prosody and context-specific spontaneous phenomena during inference. In the second stage, we adopt a VITS-like module to transfer the spontaneous style learned in the first stage to target speakers. Experiments demonstrate that SponTTS is effective in modeling spontaneous style and transferring the style to the target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity. The zero-shot spontaneous style TTS test further verifies the generalization and robustness of SponTTS in generating spontaneous speech for unseen speakers.

Citations: 0