{"title":"Application of Next-generation Sequencing Method for Elucidating Evolutionary History of Chloroplast Genome in Plant Kingdom","authors":"Hoang Dang Nguyen, Hoang Dang Khoa Do","doi":"10.1109/KSE50997.2020.9287768","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287768","url":null,"abstract":"Next-generation sequencing (NGS) has produced a flood of genomic data (i.e., nuclear and organelle genomes), providing deeper insights into the evolution of living organisms (including plants, animals, and microorganisms). Additionally, NGS has enabled various applications in different fields, such as rapid diagnosis of genetic diseases, development of molecular markers for valuable plants, and detection of food-related microbiomes. In this review, we present an overview of the evolution of the chloroplast genome in the plant kingdom as inferred from NGS data. The rapidly growing body of chloroplast genome data has allowed us to explore different aspects of land plants, such as the evolution of chloroplast genomes, barcode mining, patterns of gene loss, and phylogenetic relationships. Specifically, protein-coding regions in chloroplast genomes contributed to reconstructing the phylogenetic relationships among plant species and to establishing a new classification system. Genomic events (i.e., deletion, inversion, and duplication) provided useful information for a better understanding of the differentiation of chloroplast genomes as well as the patterns of parasitism in plants. 
Also, the future perspective of chloroplast genome studies was discussed.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114203307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vietnamese Antonyms Detection Based on Specialized Word Embeddings using Semantic Knowledge and Distributional Information","authors":"Van-Tan Bui, Khac-Quy Dinh, Phuong-Thai Nguyen","doi":"10.1109/KSE50997.2020.9287542","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287542","url":null,"abstract":"Antonymy is one of the fundamental relations shaping the organization of the semantic lexicon. Therefore, automatic detection of antonymy can contribute to different NLP tasks, such as Machine Translation, Sentiment Analysis, and Information Retrieval. Most prior studies focus only on discriminating between antonyms and synonyms. However, not only synonymy but also other semantic relations, such as hypernymy and co-hyponymy, yield high similarity scores, which makes antonyms hard to discriminate. Therefore, it is necessary to investigate thoroughly how to identify antonyms among a wide variety of other semantic relations. In this paper, we aim to identify Vietnamese antonym pairs according to the vector semantics approach. Specifically, we build specialized word embedding models by incorporating lexical-semantic resources and distributional information. In addition, we propose specialized Vietnamese features and utilize mutual information between words in order to integrate them with word embedding vectors. This aims to generate more meaningful feature vectors for supervised classifiers solving antonym detection problems. Furthermore, we construct three reliable Vietnamese testing datasets for this task: AntSyn1000, AntHyp1000, and AntMix1000. 
Experimental results conducted on the datasets demonstrated that our model performs effectively.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131787588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing Vietnam’s Scientific Research Projects Based on Pre-trained Language Models and UMAP","authors":"Hien T. Nguyen, Duy V. Huynh, H. Duong, N. Thoai","doi":"10.1109/KSE50997.2020.9287782","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287782","url":null,"abstract":"This paper presents a method for vector representation and dimensionality reduction of documents using pretrained language models and Uniform Manifold Approximation and Projection (UMAP). The method aims at visualizing Vietnam’s scientific research projects in order to help search for, as well as explore, similar projects given a new proposal or research topic. First, documents are vectorized using a pretrained language model. Then, the obtained document vectors are projected onto a two-dimensional space using UMAP. A query is passed through the same two steps as a document. In the two-dimensional space, each document is represented as a circle, and the nearer two circles are, the more similar the corresponding documents are. We consider the abstract or title of a project as its representative and call each a document. We conduct experiments to compare the representation power of multilingual BERT-base and PhoBERT by training classifiers using softmax, support vector machines, and a multilayer perceptron, and by visualizing the representations using PCA, t-SNE, and UMAP, respectively. The experimental results show that the representation power of PhoBERT is better than that of multilingual BERT-base and that UMAP is superior to PCA and t-SNE. 
We also present a visualizing tool allowing human intervention in similarity search.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123411772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keyphrase generation for Vietnamese administrative documents: a collaborative approach","authors":"Thi-Thu-Trang Nguyen, Thi-Hai-Yen Vuong, Van-Lien Tran, Le-Minh Nguyen, X. Phan","doi":"10.1109/KSE50997.2020.9287477","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287477","url":null,"abstract":"The keyphrases of a given document can be considered its condensed summary. Unsupervised models focus on extracting keyphrases based only on the information contained in that document, without interacting with other documents. Meanwhile, a well-performing supervised learning model for keyphrase generation requires a massive effort to build training data and cannot generalize to new domains. Moreover, according to human perception, a user would comprehend the topic expressed in a document better if that user has already read other documents that express the same topic. Based on this idea, we propose a collaborative keyphrase generation system (CollabKG): a novel semi-supervised method that leverages limited labeled data. The amount of labeled data is enriched over time by the user. In our work, we conduct research on a large-scale dataset consisting of 500,000 Vietnamese administrative documents. In CollabKG, each document is represented as a feature vector, and a cluster pruning algorithm is employed to accelerate finding the most similar documents. The generated keyphrases were manually evaluated for relevance and accuracy. Finally, the results we achieved show high ratification. 
Therefore, we can conclude that CollabKG has good performance and fits a real-time system.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130025051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of CALIPSO satellite imagery for air pollution source identification in Hanoi, Vietnam","authors":"Vinh Tran Tuan, Pham Van Ha, N. T. Thuy, N. T. Thanh","doi":"10.1109/kse50997.2020.9287409","DOIUrl":"https://doi.org/10.1109/kse50997.2020.9287409","url":null,"abstract":"Identification of air pollution is a significant task for environmental control, management, and policy decisions. In the traditional approach, chemical composition analysis is too costly to be applied frequently and at scale, especially in developing countries. This paper proposes the use of CALIPSO satellite imagery to analyze aerosol sources, which are closely linked to particulate matter sources, in Hanoi in the period from 2016 to 2019. Other datasets, including the Hanoi land-cover map and the monthly average wind direction from the MERRA-2 reanalysis, were utilized to explain the spatial distribution of aerosol sources. The result shows that polluted continental/smoke accounted for the largest proportion with 40%, followed by polluted dust, smoke, dust and clean continental with a percentage of 35%, 14%, 6% and 5%, respectively. The monthly variation of the aerosol type showed a high frequency of elevated smoke in March, April and October, while polluted continental/smoke peaked in the dry season (November to March) and was lower in the rainy season (May to September). The aerosol types observed mostly at high altitude, including polluted dust, polluted continental and elevated smoke, could be related to long-range transport from other places to Hanoi. 
This study highlights the potentials of using CALIPSO products for identification of air pollution sources in Vietnam.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131119656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A BERT-based Hierarchical Model for Vietnamese Aspect Based Sentiment Analysis","authors":"Oanh T. K. Tran, Viet The Bui","doi":"10.1109/KSE50997.2020.9287650","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287650","url":null,"abstract":"Aspect based sentiment analysis (ABSA) is the task of identifying sentiment polarity towards specific entities and their aspects mentioned in customers’ reviews. This paper presents a new and effective hierarchical model using the pre-trained language model, Bidirectional Encoder Representations from Transformers (BERT). This model integrates the context information of the previous layer (i.e. entity type) into the prediction for the following layer (i.e. aspect type) and optimizes the global loss functions to capture the entire information from all layers. Experimental results on two public benchmark datasets in Vietnamese showed that the proposed model is superior to the existing ones. Specifically, the model achieved 84.23% and 82.06% in the F1_micro scores in detecting entities and their aspects on the domains of restaurants and hotels, respectively. In identifying aspect sentiment polarity, the model gained 71.3% and 74.69% in the F1_micro scores on the domains of restaurants and hotels, respectively. 
These results outperformed the best submission of the campaign by a large margin and gained a new state of the art.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132392935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting triples from Vietnamese text to create knowledge graph","authors":"Huong Duong To, P. Do","doi":"10.1109/KSE50997.2020.9287471","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287471","url":null,"abstract":"Knowledge graphs (KGs) play an increasingly important role in the current technology era. They are very useful in many fields, such as information search, question answering systems, and other AI applications. Besides private knowledge graphs like Google's \"Knowledge graph\", there are also open knowledge graphs such as DBpedia and YAGO. In general, however, these open knowledge graphs contain very little data in Vietnamese. To address this gap, our team proposes a way to create a Vietnamese knowledge graph by automatically scraping Vietnamese text from websites as input, then using named-entity recognition (NER) to recognize entities as nouns, combined with POS tagging to identify verbs, in order to extract triples from the simple sentences of each paragraph. The triples are then loaded into Neo4j to visualize the knowledge graph.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128115851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compression Artifacts Image Patch database for Perceptual Quality Assessment","authors":"Tung Pham Thanh, M. Chau, T. N. Manh, Linh Le Dinh, L. T. Ha","doi":"10.1109/KSE50997.2020.9287704","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287704","url":null,"abstract":"Ground truth is one of the most important components for training, testing, and benchmarking algorithms for objective quality assessment. In this paper, we propose an image patch quality database with compression artifacts. We create a new database of image patches with High Efficiency Video Coding (HEVC) compression artifacts. Then, a subjective test is conducted in a controlled environment to obtain the ground truth of image patch quality, where we collect differential mean opinion scores (DMOS) from a large number of observers. Finally, the rank order correlation factors between DMOS and a set of popular image quality metrics are calculated and presented. The proposed database is intended for learning patch-based IQA models for block sizes in video rate-distortion optimization.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128139028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A deep learning approach for solving Poisson’s equations","authors":"Thanh Nguyen, B. Pham, Trung T. Nguyen, B. Nguyen","doi":"10.1109/KSE50997.2020.9287419","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287419","url":null,"abstract":"Partial differential equations (PDEs) have found many applications in different fields of research over the last decades. In this paper, we study a mesh-free deep learning method for solving PDE systems, especially Poisson’s equations. Unlike traditional techniques based on finite volume or finite element methods, we design suitable neural networks that can approximate solutions of a PDE by formulating it as an optimization problem. To minimize a loss function, we use the gradient descent algorithm to obtain the neural networks’ optimal set of parameters. The experimental results show that the proposed methods can achieve promising results in solving three types of PDEs: Burgers’ equation, Laplace’s equation, and Poisson’s equation, where the mean square errors vary from 10^-7 to 10^-10.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127464154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time vehicle detection and counting based on YOLO and DeepSORT","authors":"Thanh-Nghi Doan, Minh-Tuyen Truong","doi":"10.1109/KSE50997.2020.9287483","DOIUrl":"https://doi.org/10.1109/KSE50997.2020.9287483","url":null,"abstract":"Intelligent vehicle detection and counting are becoming increasingly important in the field of highway and transport infrastructure management. Traditional methods based on image information have shown several limitations. In particular, under real-world conditions, real-time detection, classification, and counting of each vehicle type remain a major challenge. The main purpose of this study is to develop an adaptive model that combines YOLOv4 and DeepSORT. The new model can detect objects with high accuracy and short computation time by taking advantage of tracking with a focus on simple, effective algorithms. Experimental results show that our proposed approach outperforms the original one by at least 11% in AP and 12% in AP50 for most field scenarios in our dataset, at a real-time speed of ~32 FPS.","PeriodicalId":275683,"journal":{"name":"2020 12th International Conference on Knowledge and Systems Engineering (KSE)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124292310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}