{"title":"Note-level singing melody transcription with transformers","authors":"Jonggwon Park, Kyoyun Choi, Seola Oh, Leekyung Kim, Jonghun Park","doi":"10.3233/ida-227077","DOIUrl":"https://doi.org/10.3233/ida-227077","url":null,"abstract":"Recognizing a singing melody from an audio signal in terms of the music notes’ pitch, onset, and offset, referred to as note-level singing melody transcription, has been studied as a critical task in the field of automatic music transcription. The task is challenging due to the different timbre and vibrato of each vocal and the ambiguity of onset and offset of the human voice compared with other instrumental sounds. This paper proposes a note-level singing melody transcription model using sequence-to-sequence Transformers. The singing melody annotation is expressed as a monophonic melody sequence and used as a decoder sequence. Overlapping decoding is introduced to solve the problem of the context between segments being broken. Applying pitch augmentation and adding a noisy dataset with data cleansing turn out to be effective in preventing overfitting and generalizing the model performance. Ablation studies demonstrate the effects of the proposed techniques in note-level singing melody transcription, both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription performance for all the metrics considered. For fundamental frequency metrics, the voice detection performance of the proposed model is comparable to that of a vocal melody extraction model. 
Finally, subjective human evaluation demonstrates that the results of the proposed models are perceived as more accurate than the results of a previous study.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135823984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
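The overlapping decoding described in the abstract above, where segments share boundary context and are stitched back together, can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code; `split_overlapping` and `merge_overlapping` and their parameters are hypothetical names.

```python
def split_overlapping(frames, seg_len, hop):
    """Split a frame sequence into segments of length seg_len that
    overlap by seg_len - hop frames, so each segment carries context
    from its neighbours."""
    segs = []
    start = 0
    while start < len(frames):
        segs.append(frames[start:start + seg_len])
        if start + seg_len >= len(frames):
            break
        start += hop
    return segs

def merge_overlapping(decoded, hop):
    """Stitch decoded segments back together: keep only the leading
    `hop` frames of each segment (the part not re-decoded by the next
    segment), plus the whole final segment."""
    out = []
    for seg in decoded[:-1]:
        out.extend(seg[:hop])
    out.extend(decoded[-1])
    return out
```

On per-frame labels this split/merge round-trips exactly, which is the property that lets each segment be decoded with shared context and recombined without gaps.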
{"title":"Performance analysis of deep transfer learning approaches in detecting and classifying brain tumor from magnetic resonance images","authors":"P.L. Deepa, P.D. Narain, V.G. Sreena","doi":"10.3233/ida-227321","DOIUrl":"https://doi.org/10.3233/ida-227321","url":null,"abstract":"The Central Nervous System (CNS) is one of the most crucial parts of the human body. A brain tumor is one of the deadliest diseases that affect the CNS, and tumors should be detected early to avoid serious health implications. As it is one of the most dangerous types of cancer, its diagnosis is a crucial part of the healthcare sector. A brain tumor can be malignant or benign, and its grade recognition is a tedious task for the radiologist. In the recent past, researchers have proposed various automatic detection and classification techniques that use different imaging modalities focusing on increased accuracy. In this paper, we have done an in-depth study of 19 different trained deep learning models such as Alexnet, VGGnet, DarkNet, DenseNet, ResNet, InceptionNet, ShuffleNet, NasNet and their variants for the detection of brain tumors using deep transfer learning. The performance parameters show that NASNet-Large outperforms the others with an accuracy of 98.03% for detection and 97.87% for classification. 
A thresholding algorithm is used to segment the tumor region if the detected output is other than normal.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"254 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135823986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
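The thresholding step mentioned in the abstract above can be illustrated with a minimal global-thresholding sketch. The abstract does not specify how the threshold is chosen; the image-mean heuristic below, and the function name, are assumptions for illustration only.

```python
import numpy as np

def threshold_segment(image, thresh=None):
    """Return a binary mask of the bright (candidate tumor) region by
    global thresholding a grayscale slice. If no threshold is given,
    fall back to the image mean -- a simple illustrative heuristic."""
    if thresh is None:
        thresh = image.mean()
    return (image > thresh).astype(np.uint8)
```

In practice an adaptive choice such as Otsu's method is commonly used instead of the mean, but the abstract does not say which variant the authors applied.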
{"title":"SARW: Similarity-Aware Random Walk for GCN","authors":"Linlin Hou, Haixiang Zhang, Qing-Hu Hou, Alan J.X. Guo, Ou Wu, Ting Yu, Ji Zhang","doi":"10.3233/ida-227085","DOIUrl":"https://doi.org/10.3233/ida-227085","url":null,"abstract":"Graph Convolutional Network (GCN) is an important method for learning graph representations of nodes. For large-scale graphs, the GCN can encounter the neighborhood expansion phenomenon, which makes model complexity high and training time long. An efficient solution is to adopt graph sampling techniques, such as node sampling and random walk sampling. However, the existing sampling methods still suffer from aggregating too many neighbor nodes and ignoring node feature information. Therefore, in this paper, we propose a new subgraph sampling method, namely, Similarity-Aware Random Walk (SARW), for GCN with large-scale graphs. A novel similarity index between two adjacent nodes is proposed, describing the relationship of nodes with their neighbors. Then, we design a sampling probability expression between adjacent nodes using node feature information, degree information, neighbor set information, etc. Moreover, we prove the unbiasedness of the SARW-based GCN model for node representations. The simplified version of SARW (SSARW) has a much smaller variance, which indicates the effectiveness of our subgraph sampling method in large-scale graphs for GCN learning. 
Experiments on six datasets show our method achieves superior performance over the state-of-the-art graph sampling approaches for the large-scale graph node classification task.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135781890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
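The similarity-aware transition step in the abstract above can be sketched as a random walk whose next-hop probabilities are weighted by feature similarity between adjacent nodes. The paper's actual similarity index also uses degree and neighbor-set information; the cosine-based weighting and the function names below are simplified assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors (0 for zero vectors)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b) / (na * nb)

def sarw_step_probs(node, adj, feats):
    """Transition probabilities from `node` to each neighbor,
    proportional to feature similarity (shifted by +1 so weights
    stay non-negative even for dissimilar neighbors)."""
    nbrs = adj[node]
    w = np.array([cosine_sim(feats[node], feats[v]) + 1.0 for v in nbrs])
    return nbrs, w / w.sum()
```

A walk built from this step visits feature-similar neighbors more often, which is the intuition behind biasing subgraph sampling with node features rather than topology alone.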
{"title":"Heterogeneous information fusion based graph collaborative filtering recommendation","authors":"Ruihui Mu, Xiaoqin Zeng, Jiying Zhang","doi":"10.3233/ida-227025","DOIUrl":"https://doi.org/10.3233/ida-227025","url":null,"abstract":"Nowadays, with the application of 5G, graph-based recommendation algorithms have become a research hotspot. Graph neural networks encode the graph structure information in the node representation through an iterative neighbor aggregation method, which can effectively alleviate the problem of data sparsity. In addition, more and more information graphs can be used in collaborative filtering recommendation, such as user social information graphs, user or item attribute information graphs, etc. In this paper, we propose a novel heterogeneous information fusion based graph collaborative filtering method, which models graph data from different heterogeneous graphs and combines them to enhance representation learning. Through information propagation and aggregation, our model can learn the latent embeddings effectively and enhance the performance of recommendation. Experimental results on different datasets demonstrate that the proposed framework outperforms existing methods.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135781891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
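The "information propagation and aggregation" step mentioned in the abstract above is, in its simplest form, one round of mean-neighbor aggregation. The sketch below is a generic illustration of that mechanism, not the paper's fusion model; the `propagate` name and the self-plus-neighbors averaging rule are assumptions.

```python
import numpy as np

def propagate(adj, emb):
    """One round of mean-neighbor aggregation: each node's new
    embedding is the average of its own embedding and those of its
    neighbors (adj maps node index -> list of neighbor indices)."""
    out = np.zeros_like(emb)
    for v, nbrs in adj.items():
        stack = [emb[v]] + [emb[u] for u in nbrs]
        out[v] = np.mean(stack, axis=0)
    return out
```

Running this on several heterogeneous graphs (social, attribute, interaction) and combining the resulting embeddings is the general shape of the fusion strategy the abstract describes.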
{"title":"Multiple Distilling-based spatial-temporal attention networks for unsupervised human action recognition","authors":"Cheng Zhang, Jianqi Zhong, Wenming Cao, Jianhua Ji","doi":"10.3233/ida-230399","DOIUrl":"https://doi.org/10.3233/ida-230399","url":null,"abstract":"Unsupervised action recognition based on spatiotemporal fusion feature extraction has attracted much attention in recent years. However, existing methods still have several limitations: (1) The long-term dependence relationship is not effectively extracted at the time level. (2) The high-order motion relationship between non-adjacent nodes is not effectively captured at the spatial level. (3) The model complexity is too high when the cascade layer input sequence is long, or there are many key points. To solve these problems, a Multiple Distilling-based spatial-temporal attention (MD-STA) network is proposed in this paper. This model extracts temporal and spatial features separately and fuses them. Specifically, we first propose a Screening Self-attention (SSA) module; this module can find long-term dependencies in distant frames and high-order motion patterns between non-adjacent nodes in a single frame through a sparse metric on dot product pairs. Then, we propose the Frames and Keypoint-Distilling (FKD) module, which uses extraction operations to halve the input of the cascade layer to eliminate invalid key points and time frame features, thus reducing time and memory complexity. Finally, the Dim-reduction Fusion (DRF) module is proposed to reduce the dimension of existing features to further eliminate redundancy. 
Numerous experiments were conducted on three distinct datasets: NTU-60, NTU-120, and UWA3D, showing that MD-STA achieves state-of-the-art standards in skeleton-based unsupervised action recognition.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135923367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
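The FKD "halve the cascade-layer input" idea in the abstract above can be illustrated with a stride-2 max-pooling over the frame axis. This is only a rough stand-in for the module's extraction operation, which the abstract does not specify; the function name and padding rule are assumptions.

```python
import numpy as np

def distill_halve(x):
    """Halve a (frames, features) sequence by max-pooling adjacent
    frame pairs -- a rough stand-in for FKD's halving step. Odd-length
    inputs are padded by repeating the last frame."""
    if x.shape[0] % 2:
        x = np.vstack([x, x[-1:]])
    return np.maximum(x[0::2], x[1::2])
```

Each application of such a step halves the sequence length, which is how stacking distilling layers in a cascade reduces time and memory complexity for long inputs.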
{"title":"Generate custom travel magazine layouts","authors":"Xiangping Wu, Shuaiwei Yao, Zheng Zhang, Jun Hu","doi":"10.3233/ida-230063","DOIUrl":"https://doi.org/10.3233/ida-230063","url":null,"abstract":"Among the problems of specifying the style and number of elements of a travel magazine, generating a magazine layout constrained by text and by graphic layout remains complex and unsolved. In this paper, we generate layouts satisfying text constraints via a GAN. Due to the complexity and variety of graphic designs, we enhance the performance of the discriminator and the generator so that the layouts produced by the generator better satisfy the constraints. We feed non-corresponding pairs of constraint text and real layouts to the discriminator to enhance its performance, and we add a spatial attention mechanism to the layout encoder to extract layout features and generate high-quality layouts. We demonstrate that the proposed method can generate high-quality layouts satisfying the text constraints, and we validate its effectiveness through user ratings.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136014463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
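The non-corresponding-pair trick in the abstract above, training the discriminator on mismatched (constraint text, real layout) pairs as negatives alongside matched pairs as positives, can be sketched as follows. The function name and the circular-shift mismatching rule are illustrative assumptions, not the authors' exact construction.

```python
def make_discriminator_pairs(texts, layouts):
    """Build matching-aware training pairs for a conditional GAN
    discriminator: each text with its own layout (label 1), plus each
    text with another sample's layout (label 0) via a circular shift."""
    pos = [(t, l, 1) for t, l in zip(texts, layouts)]
    neg = [(t, layouts[(i + 1) % len(layouts)], 0)
           for i, t in enumerate(texts)]
    return pos + neg
```

Exposing the discriminator to mismatched pairs forces it to judge text-layout correspondence, not just layout realism, which pushes the generator toward layouts that actually satisfy the text constraints.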
{"title":"CMCEE: A joint learning framework for cascade decoding with multi-feature fusion and conditional enhancement for overlapping event extraction","authors":"Zerui Dai, Shengwei Tian, Long Yu, Qimeng Yang","doi":"10.3233/ida-230284","DOIUrl":"https://doi.org/10.3233/ida-230284","url":null,"abstract":"Event extraction (EE) is an important natural language processing task. With the passage of time, many powerful and effective models for event extraction tasks have been developed. However, there has been limited research on complex overlapping event extraction. Therefore, we propose a new cascade decoding model: A Joint Learning Framework for Cascade Decoding with Multi-Feature Fusion and Conditional Enhancement for Overlapping Event Extraction. 1) In this model, we introduce a cascade decoding mechanism with multi-feature fusion to better capture the interaction between decoding layers. 2) Additionally, we introduce an enhanced conditional layer normalization (ECLN) mechanism to enhance the interaction between subtasks. Simultaneously, the use of a cascade decoding model effectively addresses the problem of overlapping events. The model successively performs three subtasks: type detection, trigger word extraction, and argument extraction. All three subtasks are learned jointly in one framework, and a new conditional normalization mechanism is used to capture dependencies among them. The experiments are conducted using the overlapping event benchmark, the FewFC dataset. 
The experimental evaluation demonstrates that our model achieves a higher F1 score on the overlapping event extraction task compared to the original overlapping event extraction model.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136014462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
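The three-stage cascade in the abstract above, type detection, then trigger extraction conditioned on each type, then argument extraction conditioned on (type, trigger), has the control flow sketched below. The stage functions are hypothetical stand-ins for learned models; only the cascade structure is taken from the abstract.

```python
def cascade_extract(sentence, detect_types, extract_triggers, extract_args):
    """Cascade decoding: later stages condition on earlier outputs.
    Overlapping events fall out naturally, since one sentence can yield
    several (type, trigger) combinations, each with its own arguments."""
    events = []
    for etype in detect_types(sentence):
        for trig in extract_triggers(sentence, etype):
            events.append({"type": etype,
                           "trigger": trig,
                           "args": extract_args(sentence, etype, trig)})
    return events
```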
{"title":"Ordination-based verification of feature selection in pattern evolution research","authors":"Gábor Hosszú","doi":"10.3233/ida-230326","DOIUrl":"https://doi.org/10.3233/ida-230326","url":null,"abstract":"This article explains the idea of pattern systems that develop gradually. These systems involve symbolic communication that includes symbols, syntax, and layout rules. Some pattern systems change over time, like historical scripts. The scientific study of pattern systems is called pattern evolution research, and scriptinformatics is concerned with the modelling of the evolution of scripts. The symbol series consists of symbols from a pattern system, while the graph sequence is a symbol sequence applied with a specific technology. This article describes a method for examining tested pattern systems to confirm their classification, which focuses on more ancient features. The method’s effectiveness was tested on Rovash scripts and graph sequences. Multivariate analysis was carried out by using PAST4 software, employing principal coordinates analysis ordination and k-means clustering algorithms.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136057884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
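The principal coordinates analysis (PCoA) ordination used in the abstract above (via the PAST4 software) is classical multidimensional scaling on a distance matrix: double-centre the squared distances and take the leading eigenvectors. The sketch below shows that standard computation; it is not the PAST4 implementation, and the function name is an assumption.

```python
import numpy as np

def pcoa(dist, k=2):
    """Classical PCoA: double-centre the squared distance matrix and
    scale the top-k eigenvectors by the square roots of their
    eigenvalues (negative eigenvalues are clamped to zero)."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J           # Gower-centred matrix
    vals, vecs = np.linalg.eigh(B)           # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

For a Euclidean distance matrix, the recovered coordinates reproduce the input distances exactly (up to rotation and sign), which is what makes the ordination a faithful low-dimensional view for the subsequent k-means clustering.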
{"title":"Learning bayesian multinets from labeled and unlabeled data for knowledge representation","authors":"Meng Pang, Limin Wang, Qilong Li, Guo Lu, Kuo Li","doi":"10.3233/ida-227068","DOIUrl":"https://doi.org/10.3233/ida-227068","url":null,"abstract":"The Bayesian network classifiers (BNCs) learned from labeled training data are expected to generalize to fit unlabeled testing data based on the independent and identically distributed (i.i.d.) assumption, whereas the asymmetric independence assertion demonstrates the uncertainty of significance of dependency or independency relationships mined from data. A highly scalable BNC should form a distinct decision boundary that can be especially tailored to a specific testing instance for knowledge representation. To address the issue of asymmetric independence assertion, in this paper we propose to learn k-dependence Bayesian multinet classifiers in the framework of multistage classification. By partitioning the training set and pseudo training set according to high-confidence class labels, the dependency or independency relationships can be fully mined and represented in the topologies of the committee members. 
Extensive experimental results indicate that the proposed algorithm achieves competitive classification performance compared to single-topology BNCs (e.g., CFWNB, AIWNB and SKDB) and ensemble BNCs (e.g., WATAN, SA2DE, ATODE and SLB) in terms of zero-one loss, root mean square error (RMSE), Friedman test and Nemenyi test.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135148421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
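The high-confidence partitioning step in the abstract above, splitting unlabeled data into a pseudo training set (kept with its predicted label) and a leftover set, can be sketched as follows. The threshold value, function name, and predictor interface are illustrative assumptions.

```python
def partition_by_confidence(instances, predict, threshold=0.9):
    """Split unlabeled instances by prediction confidence: instances at
    or above the threshold become pseudo-labeled training pairs; the
    rest are set aside. `predict(x)` returns (label, confidence)."""
    pseudo, rest = [], []
    for x in instances:
        label, conf = predict(x)
        if conf >= threshold:
            pseudo.append((x, label))
        else:
            rest.append(x)
    return pseudo, rest
```

Each committee member's topology can then be learned on its own partition, which is how class-conditional (multinet) structures capture the asymmetric dependencies the abstract describes.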
{"title":"A multi-instance multi-label learning algorithm based on radial basis functions and multi-objective particle swarm optimization","authors":"Xiang Bao, Fei Han, Qing-Hua Ling, Yan-Qiong Ren","doi":"10.3233/ida-227042","DOIUrl":"https://doi.org/10.3233/ida-227042","url":null,"abstract":"Radial basis function (RBF) neural networks for Multi-Instance Multi-Label (MIML) learning can directly exploit the connections between instances and labels, so they preserve useful prior information, but they only adopt the Gaussian radial basis function as their RBF, whose parameters are difficult to determine. In this paper, parameters are obtained by multi-objective optimization with multiple performance measures treated as objectives; specifically, parameter estimation of different RBFs by an improved multi-objective particle swarm optimization (MOPSO) is proposed, where Recall rate and Precision rate are chosen to obtain the most desirable Pareto optimal solution set. Furthermore, a share-learning factor is proposed to modify the particle velocity in standard MOPSO to improve the global search ability and group cooperative ability. It is experimentally demonstrated that the proposed method can estimate reliable parameters of different RBFs, and it is also very competitive with state-of-the-art MIML methods.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135250827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
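For context on the abstract above: the particle velocity update that the proposed share-learning factor modifies is the textbook PSO rule, shown below. The share-learning term itself is the paper's contribution and its exact form is not given in the abstract, so it is deliberately omitted; default coefficient values and the function name are assumptions.

```python
import numpy as np

def pso_velocity(v, x, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Textbook PSO velocity update: inertia term plus cognitive pull
    toward the particle's personal best and social pull toward the
    swarm's global best, each scaled by a random factor."""
    if rng is None:
        rng = np.random.default_rng(0)
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```

In MOPSO the single global best is replaced by a leader drawn from the Pareto archive; the share-learning factor then adds a further cooperative term to this update.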