Unravelling of Convolutional Neural Networks through Bharatanatyam Mudra Classification with Limited Data

Anuja P. Parameshwaran, Heta P. Desai, M. Weeks, Rajshekhar Sunderraman

2020 10th Annual Computing and Communication Workshop and Conference (CCWC), 2020. DOI: 10.1109/CCWC47524.2020.9031185
Non-verbal forms of communication are universal, being free of any language barrier, and are widely used across art forms. For example, in Bharatanatyam, an ancient Indian dance form, artists use different hand gestures, body postures and facial expressions to convey the story line. Because identification and classification of these complex, multi-variant visual images is difficult, the problem is now being addressed with the help of advanced computer vision techniques and deep neural networks. This work studies the automated identification, classification and labelling of selected Bharatanatyam gestures, as part of our efforts to preserve this rich cultural heritage for future generations. The classification of the mudras against their true labels was carried out using different singular pre-trained / non-pre-trained as well as stacked ensemble convolutional neural architectures (CNNs). In all, data for twenty-seven classes of asamyukta hasta (single-hand gestures) were collected from Google, YouTube and a few real-time performances by artists. Since the backgrounds in many frames are highly diverse, the acquired data are real and dynamic, compared to images from closed laboratory settings. Mislabeled data were cleansed from the dataset through label transfer based on a distance-based similarity metric, computed with a convolutional siamese neural network. The classification of mudras was done using different CNN architectures: i) singular models, ii) ensemble models, and iii) a few specialized models. This study achieved an accuracy of >95% with both single and double transfer learning models, as well as with their stacked ensemble model. The results emphasize the crucial role of domain similarity between the pre-training and training datasets in improving classification accuracy, and also indicate that the doubly pre-trained CNN model yields the highest accuracy.
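The label-transfer cleansing step described in the abstract pairs a suspect image with reference images and keeps the label of the nearest embedding under a distance-based metric. Below is a minimal sketch of such a convolutional siamese setup in PyTorch; the branch architecture, the contrastive loss, and the names (`EmbeddingNet`, `transfer_label`) are illustrative assumptions, not the authors' implementation, which the abstract does not detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Illustrative shared CNN branch that maps a mudra image to a feature
    vector; the paper's actual siamese architecture is not specified."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # (B, 64, 1, 1) regardless of input size
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def contrastive_loss(dist, same, margin=1.0):
    """Standard contrastive loss over embedding distances (assumed here;
    the abstract only states a distance-based similarity metric)."""
    return (same * dist.pow(2) +
            (1 - same) * F.relu(margin - dist).pow(2)).mean()

def transfer_label(net, suspect, references, ref_labels):
    """Relabel a suspect image (1xCxHxW) with the label of its nearest
    reference embedding, by Euclidean distance."""
    with torch.no_grad():
        d = (net(references) - net(suspect)).norm(dim=1)  # (N,) distances
    return ref_labels[d.argmin()]
```

Once the siamese branch is trained on matching/non-matching pairs, `transfer_label` can be run over every suspect frame against a small set of trusted reference images per mudra class.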
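The abstract also contrasts singly and doubly pre-trained CNNs with a stacked ensemble over them. The sketch below illustrates the general pattern only: it assumes a torchvision ImageNet backbone and a linear meta-learner, since the abstract names neither the backbones, the intermediate dataset used for the second pre-training, nor the meta-learner.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_MUDRAS = 27  # asamyukta hasta classes, per the abstract

def pretrained_classifier(num_classes=NUM_MUDRAS):
    """Single transfer learning: ImageNet weights, frozen features, and a new
    27-way head. A doubly pre-trained variant would first fine-tune on an
    intermediate hand-gesture dataset (unspecified here) before this step."""
    m = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for p in m.parameters():
        p.requires_grad = False
    m.fc = nn.Linear(m.fc.in_features, num_classes)  # trainable head
    return m

class StackedEnsemble(nn.Module):
    """Stacked ensemble: a meta-learner over the concatenated class
    probabilities of frozen base CNNs (linear meta-learner assumed)."""
    def __init__(self, base_models, num_classes=NUM_MUDRAS):
        super().__init__()
        self.bases = nn.ModuleList(base_models)
        self.meta = nn.Linear(len(base_models) * num_classes, num_classes)

    def forward(self, x):
        with torch.no_grad():  # base models stay fixed; only meta trains
            probs = [m(x).softmax(dim=1) for m in self.bases]
        return self.meta(torch.cat(probs, dim=1))
```

In this pattern the base models are trained first, then frozen, and only the meta-learner is fit on their stacked outputs, which matches the usual meaning of a stacked ensemble.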