{"title":"An Improved K-means Algorithm Based on Multiple Clustering and Density","authors":"Yulong Ling, Xiao Zhang","doi":"10.1145/3457682.3457695","DOIUrl":"https://doi.org/10.1145/3457682.3457695","url":null,"abstract":"The initial cluster center set of the k-means algorithm is selected randomly, which leads to unstable clustering results. To address this shortcoming, many density-based improved k-means algorithms have been proposed, but their time complexity is too high. To improve clustering stability and reduce clustering time, this paper proposes an improved algorithm based on multiple clustering and density. The algorithm first runs the k-means algorithm multiple times and adaptively selects an excellent sample set according to the distance between each sample and its corresponding cluster center. The initial cluster center set is then selected according to the principles of furthest distance and high density. Experiments on UCI data sets show that, compared with the k-means and k-means++ algorithms, the proposed algorithm not only improves performance but also ensures the stability of the clustering results. Compared with density-based improved k-means algorithms, the proposed algorithm can greatly reduce clustering time.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116456638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Learning for Concept Prerequisite Learning in Wikipedia","authors":"Xinying Hu, Yu He, Guangzhong Sun","doi":"10.1145/3457682.3457771","DOIUrl":"https://doi.org/10.1145/3457682.3457771","url":null,"abstract":"Prerequisite relationships between concepts play an important role in education. Previously, prerequisites were specified by experts, which is very costly. With the development of the Internet, many new concepts have emerged, and a growing number of electronic materials are available. In this context, it is important to build an efficient and accessible prerequisite annotator that helps learners make efficient learning plans. This paper proposes a method that mines concept prerequisite relationships from Wikipedia using active learning, which requires fewer manual labels to obtain an accurate model. The proposed method extracts features from Wikipedia articles and designs a new active learning algorithm based on the characteristics of concept prerequisites. Experimental results show that the proposed model outperforms existing active learning methods for concept prerequisite learning.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128220378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithmic Generation of Positive Samples for Compound-Target Interaction Prediction","authors":"Ebenezer Nanor, Wei-Ping Wu, S. Bayitaa, V. K. Agbesi, Brighter Agyemang","doi":"10.1145/3457682.3457689","DOIUrl":"https://doi.org/10.1145/3457682.3457689","url":null,"abstract":"Machine Learning (ML) methods have become the preferred computational methods for Compound-Target Interaction (CTI) prediction in small-molecule drug development in Bioinformatics because they have proven to be very efficient. However, the extremely imbalanced nature of CTI datasets presents a major challenge when ML methods are used to predict CTIs. To a large extent, these methods inaccurately predict the class of the minority samples, i.e. positive samples, which are of most interest to those in the business of drug development. In this study, we aim to improve the performance of ML-based methods for CTI prediction, particularly on positive samples, by addressing the challenge of class imbalance. We applied deep generative modeling to oversample selected positive samples from the original dataset in order to construct balanced datasets. The oversampling process followed a General-based approach and a novel Domain Specific-based approach. In the experimental section, 3 Deep Learning (DL) methods and 6 classical ML methods were trained on the original imbalanced dataset and on two constructed balanced datasets to investigate their performance in predicting CTIs. To ensure the robustness of the ML-based predictive methods, a Grid Search with 5-fold Cross Validation (CV) was performed to estimate the best hyperparameters for training. A Convolutional Neural Network (CNN) produced the most competitive results in predicting positive samples when evaluated with the Recall metric.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128799153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Objective Optimal Design of Excitation Systems of Synchronous Condensers for HVDC Systems Based on MOEA/D","authors":"Fan Shi, Hong-hua Wang, Tianhang Lu, Chengliang Wang","doi":"10.1145/3457682.3457770","DOIUrl":"https://doi.org/10.1145/3457682.3457770","url":null,"abstract":"To optimize the reactive power characteristics of synchronous condensers and improve their capability to support the voltage of AC systems, this paper introduces outer-loop control of the condenser's reactive power and outer-loop control of the AC-system voltage into the design of the main excitation systems of condensers in high voltage direct current (HVDC) systems. Taking the integral values, peak values and steady-state values of the AC-system voltage deviations as objective functions, the proportional adjustment coefficients of these two outer loops are optimized using a multi-objective evolutionary algorithm based on decomposition (MOEA/D) combined with a fuzzy control method. The purpose is to alleviate the power-grid overvoltage problems caused by the feedback of the condenser's reactive power and the AC-system voltage. Finally, a simulation model of a ±100 kV HVDC system with a synchronous condenser is established. The simulation results show that the proposed optimal design method for the excitation systems of synchronous condensers can optimize the reactive power characteristics of the condenser, ensure rapid regulation of the AC-system voltage by the condenser, and solve the AC-system overvoltage problem caused both by the condenser's reactive power regulation, which cannot change abruptly, and by the feedback links of the condenser's reactive power and the AC-system voltage in the excitation system.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"68 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116282801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Practical Indoor and Outdoor Seamless Navigation System Based on Electronic Map and Geomagnetism","authors":"K. Qiu, Ruizhi Chen, He Huang","doi":"10.1145/3457682.3457772","DOIUrl":"https://doi.org/10.1145/3457682.3457772","url":null,"abstract":"To address the low positioning accuracy at indoor-outdoor transition points and the difficulty of converting coordinates into a unified system, this paper develops a seamless indoor and outdoor positioning and navigation system that combines Baidu Map app positioning (using GPS, base-station and Wi-Fi signals) with indoor geomagnetic fingerprint nodes. We propose a novel and rapid method for establishing coordinate uniformity to solve the key problem of seamless indoor and outdoor positioning: smooth coordinate conversion. By combining 3D laser scanning with GPS positioning, data from multiple viewing angles are organized into the same coordinate system according to a transformation matrix. Iterative closest point (ICP) registration is then used to obtain a high-precision three-dimensional model, in a local coordinate system, of the critical indoor and outdoor points.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114864866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biological Named Entity Recognition and Role Labeling via Deep Multi-task Learning","authors":"Fei Deng, Dongdong Zhang, Jing Peng","doi":"10.1145/3457682.3457751","DOIUrl":"https://doi.org/10.1145/3457682.3457751","url":null,"abstract":"Bioscience is an experimental science. The qualitative and quantitative findings of biological experiments are often available exclusively in the form of figures in published papers. In this paper, we introduce the SourceData model, which captures a key aspect of biological experimental design by categorizing each biological entity involved in an experiment into one of six roles. Our work aims to determine, using automatic natural language processing algorithms, whether a given entity is subjected to a perturbation or is the object of a measurement (entity role labeling). We use state-of-the-art transformer models (e.g., BERT and its variants) as a strong baseline and find that, when entity role labeling is jointly trained with biological named entity recognition via deep multi-task learning (MTL), the F1 score improves by 2% compared with the previous single-task architecture. For the named entity recognition task, the MTL method also achieves comparable performance on five public datasets. Further analysis reveals the importance of fusing entity information at the input layer of the entity role labeling task and of incorporating global context.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133927145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging CNN and Bi-LSTM in Indonesian G2P Using Transformer","authors":"A. Rachman, S. Suyanto, Ema Rachmawati","doi":"10.1145/3457682.3457706","DOIUrl":"https://doi.org/10.1145/3457682.3457706","url":null,"abstract":"We apply a Transformer, implemented with the TensorFlow-based tensor2tensor toolkit, to the Grapheme-to-Phoneme (G2P) conversion problem. This study performs conversions that produce pronunciation symbols for letter sequences, particularly in Indonesian. Because no G2P conversion system is currently available for Indonesian, this research builds one by applying the Transformer. The Transformer has a simple network architecture based solely on the attention mechanism, so it dispenses with the complex recurrent and convolutional neural network encoders and decoders that have served as the basis of sequence transduction models. The model obtains its strong performance from the attention mechanism connecting the encoder and decoder. Using this toolkit, we compare results on the KBBI and CMU dictionary datasets. We attained a word error rate (WER) of 6.7% on the KBBI dataset, corresponding to an accuracy of 93.3%, after training for three days on two-core CPUs, improving over the existing best result on the CMU dictionary dataset of 26% WER. In this study, we carried out a detailed experimental evaluation assessing processing time and word error rate and compared the results with the state of the art. The Transformer generalizes successfully and can be applied to Indonesian data with both limited and large amounts of training data. We conclude that the Transformer model is suitable for the G2P task at hand.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130889070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualization Analysis of Library Research in the Context of Big Data Based on Knowledge Map","authors":"Chen Ke","doi":"10.1145/3457682.3457775","DOIUrl":"https://doi.org/10.1145/3457682.3457775","url":null,"abstract":"The development of big data technology has brought a series of new content, opportunities and challenges to libraries, and scholars have conducted many studies on this topic. This study collected 98 related papers from the Web of Science Core Collection and used the knowledge map research method and the CiteSpace software to analyze annual publication counts, journals, authors, institutions, keywords and topic changes. The results show that scholarly attention to this field has gradually increased, with the number of papers growing year by year. China is the largest contributor to this research, ahead of all other countries. Big data, university libraries, data management and information services are the key research topics in this field. Finally, the paper outlines future research directions: scholars should further strengthen research on user behavior, user portraits and intellectual property risk.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133386306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InSAR Deformation Time-series Reconstruction for Rainfall-induced Landslides Based on Gaussian Process Regression","authors":"Zhiyong Li, Yunqi Wang, Jinghan Mu, Wei Liao, Kui Zhang","doi":"10.1145/3457682.3457700","DOIUrl":"https://doi.org/10.1145/3457682.3457700","url":null,"abstract":"Multi-baseline interferometric synthetic aperture radar (InSAR) techniques have been accepted as effective remote sensing tools for detecting and monitoring landslide movements. Using stacks of synthetic aperture radar (SAR) images, they can generate precise ground displacement time-series. To further suppress noise induced by atmospheric effects, most applications require a post-processing step, called temporal filtering, to be applied to the final displacement time-series. Because displacement signals are strongly correlated in time, the traditional window-based/least squares filter is widely adopted. Since the window-based filter trades off noise smoothing against signal smoothing, the resulting time-series may deviate strongly from the true values when ground displacements are highly nonlinear. In this paper, a new approach is proposed to reconstruct the InSAR deformation time-series for rainfall-induced landslides. The method establishes a nonparametric model based on Gaussian process regression (GPR) and introduces precipitation data as prior knowledge. This constructs a strong relationship between rainfall history and ground movements, which is extremely helpful in preventing the loss of high-frequency displacement signals. The proposed approach was applied to the InSAR landslide displacement time-series obtained from 108 European Space Agency (ESA) Sentinel-1A satellite SAR images. Experimental results demonstrate that, compared with the traditional window-based method, it effectively preserves the details of the temporal evolution of ground displacements, in particular on the surface of the sliding mass.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124146100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bird Songs Recognition Based on Ensemble Extreme Learning Machine","authors":"S. Xie, Haifeng Xu, Jiang Liu, Yan Zhang, Danjv Lv","doi":"10.1145/3457682.3457750","DOIUrl":"https://doi.org/10.1145/3457682.3457750","url":null,"abstract":"ELM (Extreme Learning Machine) is a randomized method for constructing single-hidden-layer feedforward neural networks, and MFCCs (Mel-frequency cepstral coefficients) are feature parameters commonly used in speech recognition. In this study of bird song recognition based on Ensemble ELM, the paper first preprocesses bird song data collected by a web crawler, then extracts MFCC feature parameters from the song data and obtains improved MFCC feature parameters through differential calculation. Finally, Ensemble ELM is used for bird song classification and recognition. The experimental results show that the Ensemble ELM method can achieve a recognition rate of 90.42% in classifying 10 bird species.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115084191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}