{"title":"AdaptiveGPT: Towards Intelligent Adaptive Learning","authors":"Andréia dos Santos Sachete, Alba Valéria de Sant’anna de Freitas Loiola, Raquel Salcedo Gomes","doi":"10.1007/s11042-024-20144-8","DOIUrl":"https://doi.org/10.1007/s11042-024-20144-8","url":null,"abstract":"<p>Adaptive learning is an educational methodology that allows the personalization of learning according to the student’s pedagogical path. In digital environments, the strategic use of technologies enhances adaptive learning initiatives, enabling a dynamic understanding of intricate contextual nuances and the ability to identify and recommend appropriate learning activities. Therefore, this work proposes developing and evaluating a prototype that uses a large language model to create adaptive educational activities in face-to-face and virtual environments automatically. The applied methodology involves the implementation of a large language model with advanced cognitive capabilities to generate learning activities that adapt to individual needs. A proof of concept was developed to evaluate the practicality and usability of this approach. The research results indicate that the approach is practical and adaptable to different educational contexts, reinforcing the synergy between adaptive learning, artificial intelligence, and learning environments. 
The proof of concept evaluation showed that the prototype is highly usable, validating the proposal as an innovative solution to the growing needs of modern education.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"7 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer learning for human gait recognition using VGG19: CASIA-A dataset","authors":"Veenu Rani, Munish Kumar","doi":"10.1007/s11042-024-20132-y","DOIUrl":"https://doi.org/10.1007/s11042-024-20132-y","url":null,"abstract":"<p>Identification of individuals based on physical characteristics has recently gained popularity and falls under the category of pattern recognition. Biometric recognition has emerged as an effective strategy for preventing security breaches, as no two people share the same physical characteristics. \"Gait recognition\" specifically refers to identifying individuals based on their walking patterns. Human gait is a method of locomotion that relies on the coordination of the brain, nerves, and muscles. Traditionally, human gait analysis was performed subjectively through visual observations. However, with advancements in technology and deep learning, human gait analysis can now be conducted empirically and without the need for subject cooperation, enhancing the quality of life. Deep learning methods have demonstrated excellent performance in human gait recognition. In this article, the authors employed the VGG19 transfer learning model for human gait recognition. They used the public benchmark dataset CASIA-A for their experimental work, which contains a total of 19,139 images captured from 20 individuals. The dataset was segmented into two different patterns: 70:30 and 80:20. To optimize the performance of the proposed model, the authors considered three hyperparameters: loss, validation loss (val_loss), and accuracy rate. 
They reported accuracy rates of 96.9% and 97.8%, with losses of 2.71% and 2.01% for the two patterns, respectively.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A geometric-approach based Combinatorial Transformative Scalogram analysis for multiclass identification of pathologies in a voice signal","authors":"Ranita Khumukcham, Kishorjit Nongmeikapam","doi":"10.1007/s11042-024-20067-4","DOIUrl":"https://doi.org/10.1007/s11042-024-20067-4","url":null,"abstract":"<p>Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract by training machine learning algorithms with feature descriptors extracted from the voice signal. However, until now, most techniques have been limited to classifying whether a voice is normal or abnormal. It is crucial that the trained Artificial Intelligence (AI) be able to identify the exact pathology associated with voice for implementation in a realistic environment. Another issue is the need to suppress the ambient noise that could be mixed up with the spectra of the voice. Current work proposes a robust, less time-consuming and non-invasive technique for the identification of pathology associated with a laryngeal voice signal. More specifically, a two-stage signal filtering approach that encompasses a score-based geometric approach and a glottal inverse filtering method is applied to the input voice signal. The aim here is to estimate the noise spectra, to regenerate a clean signal and finally to deliver a completely fundamental glottal flow-derived signal. For the next stage, clean glottal derivative signals are used in the formation of a novel fused-scalogram which is currently referred to as the \"Combinatorial Transformative Scalogram (CTS).\" The CTS is a time-frequency domain plot which is a combination of two time-frequency scalograms. 
There is a thorough investigation of the performance of the two individual scalograms as well as that of the CTS database. Nine classification metrics are used to investigate performance: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen’s kappa, Matthews Correlation Coefficient, and F1 score. Implementation of the VOice ICar fEDerico II (VOICED) standard database provided the highest mean accuracy of 94.12% with a sensitivity of 93.85% and a specificity of 97.96% against other existing techniques. The current method performed well despite the data imbalance that exists between classes.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"13 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
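The metrics named in the abstract above all derive from confusion-matrix counts. The sketch below is illustrative only and is not taken from the paper: it assumes the simpler binary case (the paper addresses multiclass identification), the function name `binary_metrics` is hypothetical, and degenerate counts that would cause a division by zero are not handled.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from binary confusion counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Matthews Correlation Coefficient: correlation between truth and prediction.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "false_positive_rate": fp / (fp + tn),
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Made-up counts, purely for illustration.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

For a multiclass problem, one common choice is to compute these per class in a one-vs-rest fashion and then average; whether the paper does so is not stated in the abstract.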
{"title":"Insights into research on blockchain for smart contracts: a bibliometric analysis","authors":"Renu Singh, Ashlesha Gupta, Poonam Mittal","doi":"10.1007/s11042-024-20164-4","DOIUrl":"https://doi.org/10.1007/s11042-024-20164-4","url":null,"abstract":"<p>Over the past few years, blockchain technology has gained significant attention. This surge in popularity can be attributed to the emergence of cryptocurrencies and the development of smart contracts. Cryptocurrency is a digital currency that eliminates the problem of double spending. Cryptocurrencies such as Bitcoin, Ethereum, Litecoin, Stellar, Zcash, Maker, and Aave have become popular and are preferred for money transfers. Smart contracts are the next popular technology on the blockchain after cryptocurrency. A smart contract can be considered a piece of code that executes automatically when predefined conditions are fulfilled. Researchers believe that blockchain with smart contracts is only in its initial stages and that its true potential has yet to be fully discovered. Hence, an extensive bibliometric analysis is conducted to understand blockchain trends for smart contracts and to give future directions in this field. For this analysis, various steps are followed, starting with formulating the research question, defining the scope of our research, extracting and analyzing data, answering the research question, and finally, drawing a conclusion. 
This research paper will be fruitful for scholars and researchers, providing an extensive statistical and network analysis of extracted smart contracts publications.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"6 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SUGrasping: a semantic grasping framework based on multi-head 3D U-Net","authors":"He Cao, Yunzhou Zhang, Zhexue Ge, Xin Chen, Xiaozheng Liu, Jiaqi Zhao","doi":"10.1007/s11042-024-20037-w","DOIUrl":"https://doi.org/10.1007/s11042-024-20037-w","url":null,"abstract":"<p>Object grasping is an important skill for robots to interact with the real world, especially in unstructured environments where occlusions and different shapes of target objects are present. In this work, we introduce a robot grasping pipeline called SUGrasping, which can obtain the grasping poses more precisely for target objects. The grasping pipeline takes the Truncated Signed Distance Function (TSDF) and point clouds of the grasping scene as input simultaneously. The proposed multi-head 3D U-Net accepts the reconstructed TSDF representation and outputs the grasping configurations, including predicted grasp quality, orientation and width of the gripper. The point cloud is fed into PointNet to obtain the semantic segmentation results for all objects in the grasping workspace. With the help of the point cloud inside the gripper, the relationship between the gripper and semantic information can be established. This lets robots know which object they are grasping, rather than just removing objects from the workspace as in previous works. 
Experimental results show that the proposed method improves the grasping success rate and the percentage of target objects cleared, outperforming the state-of-the-art methods compared in this paper.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"13 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating cycleGAN and BERT for Chinese text style transfer","authors":"Chien-Hsing Chou, Cheng-Hou Chou, Yi-Zeng Hsieh, Tzu-Shien Yang","doi":"10.1007/s11042-024-20131-z","DOIUrl":"https://doi.org/10.1007/s11042-024-20131-z","url":null,"abstract":"<p>In this study, we integrate the Bidirectional Encoder Representations from Transformers (BERT) model with the Cycle Generative Adversarial Network (CycleGAN) to create a system for Chinese text style transfer. Natural language processing (NLP) involves converting human languages into data interpretable by computers, enabling applications like text classification, chatbots, and dialogue systems. Recent advancements, such as Google's transformer model and the BERT technique, have significantly improved NLP capabilities through self-attention mechanisms and unsupervised pretraining. Text style transfer modifies the style of texts without altering their semantics. Previous methods like StyIns and models based on disentangled representation learning highlight the challenges of retaining text meaning during style transfer. Our system leverages CycleGAN’s unsupervised learning to convert unpaired data between wuxia and fantasy styles while preserving semantics. Using the pretrained BERT model from the Chinese Knowledge and Information Processing (CKIP) Lab, our experimental results demonstrate successful style conversion, maintaining the original meanings of texts. 
This integration of BERT and CycleGAN shows promise for further advancements in NLP applications.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing spoken dialect identification with stacked generalization of deep learning models","authors":"Khaled Lounnas, Mohamed Lichouri, Mourad Abbas","doi":"10.1007/s11042-024-20143-9","DOIUrl":"https://doi.org/10.1007/s11042-024-20143-9","url":null,"abstract":"<p>As dialects are widely used in many countries, there is growing interest in incorporating them into various applications, including conversational systems. Processing spoken dialects is an important module in such systems, yet it remains a challenging task due to the lack of resources and the inherent ambiguity and complexity of dialects. This paper presents a comparison of two approaches for identifying spoken Maghrebi dialects, tested on an in-house corpus composed of four dialects: Algerian Arabic Dialect (AAD), Algerian Berber Dialect (ABD), Moroccan Arabic Dialect (MAD), and Moroccan Berber Dialect (MBD), as well as two variants of Modern Standard Arabic (MSA): MSA_ALG and MSA_MAR. The first method uses a fully connected neural network (NN2) to retrain several Transfer Learning (TL) models with varying layer numbers, including Residual Networks (ResNet50, ResNet101), Visual Geometry Group networks (VGG16, VGG19), Dense Convolutional Networks (DenseNet121, DenseNet169), and Efficient Convolutional Neural Networks for Mobile Vision Applications (MobileNet, MobileNetV2). These models were chosen based on their proven ability to capture different levels of feature abstraction: deeper models like ResNet and DenseNet are capable of capturing more complex and nuanced patterns, which is critical for distinguishing subtle differences in dialects, while VGG and MobileNet models offer computational efficiency, making them suitable for applications with limited resources. The second approach employs a “stacked generalization” strategy, which merges predictions from the previously trained models to enhance the final classification performance. 
Our results show that this cascade strategy improves the overall performance of the Language/Dialect Identification system, with an accuracy increase of up to 5% for specific dialect pairs. Notably, the best performance was achieved with DenseNet and ResNet models, reaching an accuracy of 99.11% for distinguishing between Algerian Berber Dialect and Moroccan Berber Dialect. These findings indicate that despite the limited size of the employed dataset, the cascade strategy and the selection of robust TL models significantly enhance the system’s performance in dialect identification. By leveraging the unique strengths of each model, our approach demonstrates a robust and efficient solution to the challenge of spoken dialect processing.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"35 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
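Stacked generalization, as described in the abstract above, trains a second-level model on the predictions of the first-level models. The sketch below is a minimal illustration under stated assumptions, not the authors' system: the two base models are hypothetical, their probabilities are made-up toy numbers standing in for held-out predictions, and a nearest-centroid rule is used as the meta-learner only because it has a closed form (the abstract does not say which meta-learner the paper uses).

```python
def fit_meta_learner(stacked_preds, labels):
    """Fit a nearest-centroid meta-learner on stacked base-model outputs."""
    centroids = {}
    for label in sorted(set(labels)):
        rows = [p for p, y in zip(stacked_preds, labels) if y == label]
        # Centroid = per-model mean prediction over samples of this class.
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_meta(centroids, stacked_pred):
    """Return the label whose centroid is closest to the stacked prediction."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(stacked_pred, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy positive-class probabilities from two hypothetical base models (A, B).
stacked = [(0.9, 0.8), (0.8, 0.4), (0.4, 0.9), (0.7, 0.7),
           (0.1, 0.2), (0.6, 0.1), (0.2, 0.6), (0.3, 0.3)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

centroids = fit_meta_learner(stacked, labels)
meta_preds = [predict_meta(centroids, row) for row in stacked]
base_a_preds = [int(a >= 0.5) for a, _ in stacked]  # model A alone
base_b_preds = [int(b >= 0.5) for _, b in stacked]  # model B alone
```

On this toy data each base model alone classifies 6 of the 8 samples correctly, while the meta-learner recovers all 8 because the two models err on different samples. That comparison is made on the same data used to fit the meta-learner, so it illustrates the mechanism rather than measuring generalization.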
{"title":"Unsupervised dual-teacher knowledge distillation for pseudo-label refinement in domain adaptive person re-identification","authors":"Sidharth Samanta, Debasish Jena, Suvendu Rup","doi":"10.1007/s11042-024-20147-5","DOIUrl":"https://doi.org/10.1007/s11042-024-20147-5","url":null,"abstract":"<p>Unsupervised Domain Adaptation (UDA) in person re-identification (reID) addresses the challenge of adapting models trained on labeled source domains to unlabeled target domains, which is crucial for real-world applications. A significant problem in clustering-based UDA methods is the noise in pseudo-labels generated due to inter-domain disparities, which can degrade the performance of reID models. To address this issue, we propose the Unsupervised Dual-Teacher Knowledge Distillation (UDKD), an efficient learning scheme designed to enhance robustness against noisy pseudo-labels in UDA for person reID. The proposed UDKD method combines the outputs of two source-trained classifiers (teachers) to train a third classifier (student) using a modified soft-triplet loss-based metric learning approach. Additionally, a weighted averaging technique is employed to rectify the noise in the predicted labels generated from the teacher networks. Experimental results demonstrate that the proposed UDKD significantly improves performance in terms of mean Average Precision (mAP) and Cumulative Match Characteristic curve (Rank 1, 5, and 10). Specifically, UDKD achieves an mAP of <b>84.57</b> and <b>73.32</b>, and Rank 1 scores of <b>94.34</b> and <b>88.26</b> for Duke to Market and Market to Duke scenarios, respectively. 
These results surpass the state-of-the-art performance, underscoring the efficacy of UDKD in advancing UDA techniques for person reID and highlighting its potential to enhance performance and robustness in real-world applications.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"8 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
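The abstract above mentions a weighted averaging step that combines the label predictions of the two teacher networks. The sketch below shows only the general idea of fusing two probability vectors with a convex weight. The function name `fuse_teacher_predictions`, the parameter `weight_a`, and the example numbers are assumptions for illustration; the abstract does not specify how UDKD chooses its weights.

```python
def fuse_teacher_predictions(probs_a, probs_b, weight_a=0.5):
    """Blend two teachers' class-probability vectors into one soft label."""
    if len(probs_a) != len(probs_b):
        raise ValueError("teachers must predict over the same classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = [weight_a * a + (1.0 - weight_a) * b
             for a, b in zip(probs_a, probs_b)]
    total = sum(fused)
    # Renormalize so the result is a valid probability distribution.
    return [p / total for p in fused]


# Made-up predictions over three pseudo-label clusters.
teacher_a = [0.7, 0.2, 0.1]
teacher_b = [0.5, 0.4, 0.1]
soft_label = fuse_teacher_predictions(teacher_a, teacher_b, weight_a=0.5)
```

A student network could then be trained against `soft_label` instead of a hard pseudo-label, which is one way disagreement between teachers can soften a noisy assignment.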
{"title":"A framework for robotic grasping of 3D objects in a tabletop environment","authors":"Sainul Islam Ansary, Atul Mishra, Sankha Deb, Alok Kanti Deb","doi":"10.1007/s11042-024-20178-y","DOIUrl":"https://doi.org/10.1007/s11042-024-20178-y","url":null,"abstract":"<p>Automatic grasping of unknown 3D objects is still a very challenging problem in robotics. Such challenges mainly originate from the limitations of perception systems and implementations of the grasp planning methods for handling arbitrary 3D objects on real robot platforms. This paper presents a complete framework for robotic grasping of unknown 3D objects in a tabletop environment. The framework comprises a 3D perception system for obtaining the complete point cloud of the objects, followed by a module for finding the best grasp by an object-slicing based grasp planner, a module for trajectory generation for pick and place operations, and finally performing the planned grasps on a real robot platform. The proposed 3D object perception captures the complete geometry information of the target object using two depth cameras placed at different locations. A hole-filling algorithm is also proposed to quickly fill the missing data points in the captured point cloud of the target object. The object-slicing based grasp planner is extended to handle the obstacles posed by the neighbouring objects in a tabletop environment. Then, the proposed framework is tested on common household objects by performing pick and place operations on a real robot fitted with an adaptive gripper. 
Moreover, finding the best feasible grasp in the presence of neighbouring objects, while avoiding the tabletop and surrounding objects, is also demonstrated.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An autoencoder based unsupervised clustering approach to analyze the effect of E-learning on the mental health of Indian students during the Covid-19 pandemic","authors":"Pritha Banerjee, Chandan Jana, Jayita Saha, Chandreyee Chowdhury","doi":"10.1007/s11042-024-19983-2","DOIUrl":"https://doi.org/10.1007/s11042-024-19983-2","url":null,"abstract":"<p>Due to the Covid-19 pandemic, the education system in India has changed to a remote, that is, online, study mode. Though there are works on the effect of teaching and learning on Indian students, the effect of the online mode and the associated mental state, particularly when the entire country is going through a crisis, could not be found in the literature. Our goal is to analyze data and find patterns through which we can understand the effectiveness of online study and also estimate the stress level. The dataset we collected from 500 undergraduate college students during April-May 2021 is in questionnaire format. Our contributions in this paper are (i) publishing a dataset of student feedback, and (ii) designing a data processing pipeline involving autoencoders followed by a clustering approach. The dataset is in text format, so for our analysis we converted it into a numerical format using a binary bag of words. Dimensionality reduction is applied through an autoencoder for an effective latent space representation. Finally, to find patterns in this dimensionally reduced feature space, we applied two unsupervised learning algorithms: k-means and DBSCAN. A thorough analysis of the clustering process reveals that the absence of social communication in purely online education provokes isolation irrespective of the urban or rural background of the students. 
However, it could supplement offline classes as a substantial number of students welcomed the concept of online learning as reported in the data.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"7 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
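The pipeline in the abstract above starts by turning free-text answers into binary bag-of-words vectors and ends with clustering. The sketch below illustrates those two ends on made-up responses. It is an assumption-laden simplification: the autoencoder step is skipped entirely, DBSCAN is not shown, the k-means routine is a bare-bones version with caller-supplied initial centroids, and all names (`binary_bag_of_words`, `kmeans`, `init_indices`) are hypothetical.

```python
def binary_bag_of_words(responses):
    """Encode each response as a 0/1 vector over the sorted vocabulary."""
    tokenized = [set(text.lower().split()) for text in responses]
    vocab = sorted(set().union(*tokenized))
    vectors = [[1 if word in tokens else 0 for word in vocab]
               for tokens in tokenized]
    return vocab, vectors


def kmeans(vectors, init_indices, iterations=10):
    """Plain k-means; initial centroids are the vectors at init_indices."""
    centroids = [[float(x) for x in vectors[i]] for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2
                                  for x, c in zip(vec, centroids[k])))
            for vec in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Made-up questionnaire answers, purely for illustration.
responses = [
    "online classes feel lonely",
    "lonely without friends online",
    "flexible schedule saves time",
    "saves travel time flexible",
]
vocab, vectors = binary_bag_of_words(responses)
cluster_labels = kmeans(vectors, init_indices=(0, 2))
```

With these toy answers the two responses about loneliness land in one cluster and the two about flexibility in the other. In the pipeline the abstract describes, the vectors would first pass through the autoencoder, and clustering would run on the latent representation instead.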