{"title":"Big Data Framework for Scalable and Efficient Biomedical Literature Mining in the Cloud","authors":"Zhengru Shen, Xi Wang, M. Spruit","doi":"10.1145/3342827.3342843","DOIUrl":"https://doi.org/10.1145/3342827.3342843","url":null,"abstract":"The massive size of available biomedical literature requires researchers to utilize novel big data technologies in data storage and analysis. Among them is cloud computing which has become the most popular solution for big data applications in industry. However, many bioinformaticians still rely on expensive and inefficient in-house infrastructure to discover knowledge from biomedical literature. Although some cloud-based solutions were constructed recently, they failed to sufficiently address a few key issues including scalability, flexibility, and reusability. Moreover, no study has taken computational cost into consideration. To fill the gap, we proposed a cloud-based big data framework that enables researchers to perform reproducible and scalable large-scale biomedical literature mining in an efficient and cost-effective way. Additionally, a cloud agnostic platform was constructed and then evaluated on two open access corpora with millions of full-text biomedical articles. The results indicate that our framework supports scalable and efficient large-scale biomedical literature mining.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"144 5-6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129464721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Error Detection depending on Linguistic Units","authors":"Seiya Komatsu, M. Sasayama","doi":"10.1145/3342827.3342840","DOIUrl":"https://doi.org/10.1145/3342827.3342840","url":null,"abstract":"In this research, we aim at the construction of a system which detects, points out and corrects speech error (slip of the tongue) of a human speech that occurs in a dialogue system (example: Pepper, Amazon Echo, Google Home) and a human dialogue. In the present dialogue system, even if human makes a speech error, the system cannot recognize it, which could lead to broken communication. So far, we have created a system to detect speech error using deep learning. In this study, we propose a method to augmented training data used for deep learning. The training data is a corpus that collects examples of speech error. At present, the number of training data is insufficient to detect with high accuracy. Therefore, it is necessary to augment the training data. Specifically, the feature of the speech error is examined from an existing speech error corpus, and extended rules are created. The data augmentation of training data is performed by generating dialogue sentence which made the speech error based on the rule. As a result of evaluation experiment, detection accuracy was improved in LSTM model by data augmentation.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116298707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KELDEC","authors":"Kanika, S. Chakraverty, P. Chakraborty, Shikhar Agnihotri, S. Mohapatra, Prakriti Bansal","doi":"10.1145/3342827.3342849","DOIUrl":"https://doi.org/10.1145/3342827.3342849","url":null,"abstract":"We develop an innovative personalized recommendation system called KELDEC that links the notes that students take in class with their outdoor experiences captured with camera, to suggest websites that extend their knowledge. Despite the plethora of educational recommendation systems, there is a dearth of effective tools that make evident the practical application of theory in the real world. KELDEC extracts the core learning points from class notes and distinctive labels that describe objects in a picture. It then mines the web to first extract the technical context of the picture, and subsequently culls out websites that establish linkages between notes and the picture. Response to user surveys garnered from students studying Software Engineering in the undergraduate Computer Engineering course reveal that they gain new and practical extension of classroom knowledge.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127595522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Study of Learning Achievement of Learners Classified VARK Learning Style in Blended Learning","authors":"Beesuda Daoruang, Krich Sintanakul, A. Mingkhwan","doi":"10.1145/3342827.3342839","DOIUrl":"https://doi.org/10.1145/3342827.3342839","url":null,"abstract":"There are many learning methods presented. How could learners know which method is suitable for their learning style? In this paper, we have the objective to classify learning style base on the VARK model using Blended learning method on media creation, learning activities on Multimedia Design and Development subject. The research samples were 47 undergraduates from Information Technology Department who enrolled in the second semester of the academic year 2018 and selected by using the purposive random sampling method. We concluded that teaching/learning methods do not have equally achievement for the different group of learning style. In our case, the performance base plan combined with blended learning is better with the multimodal VAK.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121690708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying Deep Learning in Word Embedding for Making a Diagnosis Prediction Model from Orthopedic Clinical Note","authors":"Tanakorn Rattanajariya, K. Piromsopa","doi":"10.1145/3342827.3342848","DOIUrl":"https://doi.org/10.1145/3342827.3342848","url":null,"abstract":"We propose deep learning in word embedding for making a diagnostic prediction model. One factor that causes uncertainties in diagnostic is the inexperience of physicians. The diagnosis errors lead to incorrect and delay in treatment, waste of time and money. To solve the problem, a differential diagnosis is a critical tool. It is powerful and does not introduce additional work to physician. Our method applied a deep learning tool together with word embedding from existing diagnosis texts in medical system. The model takes the clinical notes from a physician. The note is then used to analyze the possibilities of diseases. The output is sorted by model confidence. We validate our model with True Positive Rate (Recall), False Positive Rate (Precision) and accuracy. Our model achieves a new record of accuracy at 99.95% The highest recall rate is at 86.64% in top first prediction.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121502655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automated new approach in fast text classification (fastText): A case study for Turkish text classification without pre-processing","authors":"Birol Kuyumcu, Cüneyt Aksakalli, Selman Delil","doi":"10.1145/3342827.3342828","DOIUrl":"https://doi.org/10.1145/3342827.3342828","url":null,"abstract":"Any Text Classification (TC) problem need pre-processing steps which may affect the classification accuracy. Especially pre-processing steps need substantial effort particularly in agglutinative languages such as Turkish. In this context, a traditional text categorization problem requires pre-processing steps such as tokenization, stop-word removal, lower-case conversion, stemming and feature dimension reduction. Before classification, one or more of these steps are applied to text and then a classifier is trained to evaluate the corresponding precision. Deep neural network classifiers combined with word embedding is one of the solutions to eliminate the pre-processing prerequisites. Another novel approach is fastText word embedding based classifier which was developed by Facebook. In this study, we evaluate a fastText classifier on TTC-3600 Turkish dataset without using any pre-processing steps and present the performance of the algorithm.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134526216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Construction of Hybrid Structural Thai Treebank","authors":"Pannathorn Naksung, Chayaphat Nicrothanon, Putthichot Chunjiree, Thodsaporn Chay-intr, T. Theeramunkong","doi":"10.1145/3342827.3342842","DOIUrl":"https://doi.org/10.1145/3342827.3342842","url":null,"abstract":"It is possible to include complicated structures into an individual syntactic tree, to enhance the usefulness of parsed text corpus. In this part, existing works on Thai treebank construction have been developed in order to address the lack of high-level syntactic resources. However, it has yet to be sufficient for Thai Natural Language Processing. Furthermore, Thai treebanks have either syntactic or dependency structure only. This paper presents a construction of hybrid structural Thai treebank which includes both syntactic/dependency structure, a tool for conversion between constituency and dependency parse tree, and a web-based GUI for parse tree visualization. Towards the hybrid treebank construction, hundreds of constituent tree are manually annotated with predicate header to each phrase. Once the set of annotated constituent trees are obtained, the conversion procedure will be performed by determining the annotated head and its dependents. As our experiments, features of hybrid treebank are extracted and illustrated. Finally, difficulties and issues in constructing the hybrid Thai treebank are discussed.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121995600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Evaluation of Thai Poem's Content Consistency using Siamese Network","authors":"Nuttachot Promrit, S. Waijanya, Kran Thaweesith","doi":"10.1145/3342827.3342855","DOIUrl":"https://doi.org/10.1145/3342827.3342855","url":null,"abstract":"Many research describes Textual Entailment model for compare pair of the sentence but two sentences in term of the poem content consistency are not the same. The content consistency is very important for storytelling in Thai poem composing. In this article, we propose the model and result of The evaluation of Thai poem's content consistency using The Siamese Network 3 models comprise 1) Merge Vector Model 2) Siamese Absolute Different Model and 3) Siamese Dot Vector Model compare with the Basic CNN model. The training data is Thai poem 14,173 pair (batt) and validation data is Thai poem 3,544 pair. All models learn by apply one shot learning technic. The accuracy of Siamese Absolute Different Model near 100%. The macro average of F1-score shows 99.27%. The Area Under Curve shows 0.997 near the perfect value.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131997286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-Shot Multilingual Sentiment Analysis using Hierarchical Attentive Network and BERT","authors":"A. Sarkar, Sujeeth Reddy, Raghu Sesha Iyengar","doi":"10.1145/3342827.3342850","DOIUrl":"https://doi.org/10.1145/3342827.3342850","url":null,"abstract":"Sentiment analysis is considered an important downstream task in language modelling. We propose Hierarchical Attentive Network using BERT for document sentiment classification. We further showed that importing representation from Multiplicative LSTM model in our architecture results in faster convergence. We then propose a method to build a sentiment classifier for a language in which we have no labelled sentiment data. We exploit the possible semantic invariance across languages in the context of sentiment to achieve this.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126346999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Centroid Keywords and Word Mover's Distance for Single Document Extractive Summarization","authors":"Dauken Seitkali, R. Mussabayev","doi":"10.1145/3342827.3342852","DOIUrl":"https://doi.org/10.1145/3342827.3342852","url":null,"abstract":"This paper presents unsupervised method of single document extractive summarization. The main idea behind the method is in selecting sentences based on Word Mover's Distance Similarity between each sentence and set of centroid keywords. This approach leverages both compositional property of word embeddings and advantages of recently discovered powerful text to text distance metric. ROUGE results on DUC 2002 data set showed that quality of produced summaries can compete with well-known state of the art systems. In this work we also discuss limitations of gold summaries in evaluating quality of summarization systems.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114631555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}