{"title":"A New Term Weight Scheme and Ensemble Technique for Authorship Identification","authors":"Hanan Alshaher, Jinsheng Xu","doi":"10.1145/3388142.3388159","DOIUrl":"https://doi.org/10.1145/3388142.3388159","url":null,"abstract":"A few of the previous studies on authorship identification have applied term weighting to features. The present study introduced a new term weight scheme, called 1/sigma, that rescales the values of a feature set to a mean of zero and a standard deviation of one. In other words, the 1/sigma scheme standardizes the values of a feature set. Three experiments showed the robustness of the proposed term weight scheme from different perspectives. These experiments showed that the proposed term weight scheme worked perfectly with different feature sets and classifiers in comparison to two popular term weight schemes: TF and TF-IDF. Furthermore, 1/sigma was shown to work successfully with the following different types of datasets: literary texts (fiction) and online messages (blogs, emails, and tweets). Although these experiments did not directly examine the effects of the numbers of documents and authors, the results indicated that these factors did not have any effects because the numbers of documents and authors vary from dataset to dataset.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125091169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exponential triplet loss","authors":"Ē. Urtāns, A. Ņikitenko, Valters Vecins","doi":"10.1145/3388142.3388163","DOIUrl":"https://doi.org/10.1145/3388142.3388163","url":null,"abstract":"This paper introduces a novel variant of the Triplet Loss function that converges faster and gives better results. This function can separate class instances homogeneously through the whole embedding space. With the Exponential Triplet Loss function we also introduce a novel type of embedding space regularization, Unit-Range and Unit-Bounce, that utilizes Euclidean space more efficiently and resembles features of the cosine distance. We also examined factors for choosing the best embedding vector size for specific embedding spaces. Finally, we demonstrate how the new function can train models for one-shot learning and re-identification tasks.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125424288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of Applications of Formal Specification in Safety-Critical System Development","authors":"Emanuel S. Grant, S. P. Nanda","doi":"10.1145/3388142.3388175","DOIUrl":"https://doi.org/10.1145/3388142.3388175","url":null,"abstract":"Since the advent of the computer and computer programming there have been many attempts to improve the quality of the software systems developed. At various stages in this evolution of development techniques, processes, and methodologies, a review of the current trend in software development is conducted. One such current trend is in the realm of safety-critical system development. Safety-critical systems are characterized by the potential for harm to or loss of life if such systems should fail during operation. A strategy applied in developing such systems is the use of formal specification techniques. Formal specification techniques are the application of rigorous techniques to assess the correctness of system design. The use of formal specification techniques in safety-critical system development has been in place for a number of decades and there have been multiple reviews and comparisons of the successful and failed application of formal specification techniques. This report reviews examples of the application of formal specification techniques in a number of application domains, with a focus on the types of error detection and correction associated with the particular technique. This work supports the assessment of the suitability of a specific formal specification technique for a particular problem domain.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125065364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Study of Subject-Dependent and Subject-Independent Strategies for EEG-Based Emotion Recognition using LSTM Network","authors":"Debarshi Nath, Anubhav, Mrigank Singh, Divyashikha Sethia, Diksha Kalra, S. Indu","doi":"10.1145/3388142.3388167","DOIUrl":"https://doi.org/10.1145/3388142.3388167","url":null,"abstract":"This paper addresses the problem of EEG-based emotion recognition and classification and investigates the performance of classifiers for subject-independent and subject-dependent models separately. The results are compared with other classifiers and with existing work in the domain. We perform the experiments on the publicly available DEAP dataset with band power as the feature, and classification accuracies are reported with respect to the widely accepted Valence-Arousal Model. The best results were reported by the LSTM model in the case of the subject-dependent model, with accuracies of 94.69% and 93.13% on the valence and arousal scales respectively. SVM performed the best for the subject-independent model, with accuracies of 72.19% on the valence scale and 71.25% on the arousal scale.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130533759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attack-tolerant Unequal Probability Sampling Methods over Sliding Window for Distributed Streams","authors":"Yann Busnel, Yves Tillé","doi":"10.1145/3388142.3388162","DOIUrl":"https://doi.org/10.1145/3388142.3388162","url":null,"abstract":"Distributed systems increasingly require the processing of large amounts of data, for metrology, safety or security purposes. The online processing of these large data streams requires the development of algorithms to efficiently calculate parameters. Although elegant solutions have been proposed recently, their approximation is commonly calculated from the inception of the data stream. In a distributed execution context, it would be preferable to collect information only on the recent past (for resource saving or relevancy of most recent information). We therefore consider here the sliding window model. In this article, we propose a family of new sampling techniques that take into account both the sliding window model and the presence of a malicious adversary. Wayne Fuller proposed in 1970 a very ingenious method of sampling with unequal inclusion probabilities. After doing justice to this precursor paper and proposing a fast and simple implementation of it, we completely generalize Fuller's method in order to enable the use of a tuning parameter of spreading. The analytical results of these techniques show the excellent performance of the generalized pivotal approach. This generalization makes the sampling method less predictable and appears suitable for protecting against malicious attacks when sampling from a stream.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121225281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Active Sites in Protein 3D Structures","authors":"Jimmy Li, S. Wang","doi":"10.1145/3388142.3388151","DOIUrl":"https://doi.org/10.1145/3388142.3388151","url":null,"abstract":"Active sites in proteins are three-dimensional structures that appear on the surface of proteins. Drug designers often look for certain active sites that can be used to inhibit some specific pathway. Detecting active sites of proteins has been a very popular research area. Previous research efforts in this area often use the one-dimensional sequence of the protein. Many approaches have been developed to identify a potential active site represented as a segment in the protein sequence. However, an active site can function only in its 3D structure when folded appropriately. In other words, a potential active site detected in the sequence still needs to be verified in the 3D structure. In this paper, we introduce an approach that takes the three-dimensional structure of a protein and discovers potential active sites from the 3D structure directly.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123994877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing Practical Management Support System for Regional Public Transportation Service Provided by Municipalities","authors":"Chinasa Sueyoshi, Hideya Takagi, K. Inenaga","doi":"10.1145/3388142.3388155","DOIUrl":"https://doi.org/10.1145/3388142.3388155","url":null,"abstract":"Public transportation is especially important in regions with decreasing population. In Japan, regional transportation suffers financially, because the transportation of local residents alone cannot support through taxes the costs of fixed-route public transportation. Although the situation varies between municipalities, human and financial constraints prevent these local communities from appropriately addressing this problem. Instead, municipalities hire external traffic consultants to conduct surveys, at a significant cost. As a result, the municipalities receive regular reports from the external traffic consultant on their public transportation situation. However, despite the large cost of this service borne by the municipality, the results are not fully available and merely indicative of a temporary situation both in terms of quality and quantity. These results cannot be used to actually improve bus service management through timetable revision and adjustment of the fixed route. This would instead require medium-to-long term data on the usage of their community buses. For this reason, our laboratory has been developing a practical service management support system for the regional public transportation provided by municipalities. Within this system, we developed two applications for tablets named ASHIYA and SHINGU. ASHIYA can be used for conducting simple questionnaire surveys for passengers inside community buses, whereas SHINGU records the number of passengers getting on or off.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127364658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Cloud Analysis of Customer Satisfaction in Cosmetic Products in Thailand","authors":"O. Thinnukool, Phasit Charoenkwan, P. Khuwuthyakorn, Pachara Tinamat","doi":"10.1145/3388142.3388152","DOIUrl":"https://doi.org/10.1145/3388142.3388152","url":null,"abstract":"This research aims to investigate customer satisfaction in cosmetic products by utilizing a low-cost tool, word cloud, to analyze online reviews, based on the research questions: how do customers feel about each type of cosmetic product? what is their feedback? and which words are positive and which ones are negative? The dataset for the investigation comprises reviews collected over a 4-year period (2015-2018) from popular social networking sites in Thailand, including Facebook and Pantip. A hierarchical clustering approach, the Linkage algorithm, was employed in the context of text mining. The result shows that factors that influence customer satisfaction are based on customer experience affecting positive or negative words in online reviews of each product.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124386148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Intelligent Navigation Using Latent Dirichlet Allocation on Reddit Posts About Opiates","authors":"Peter Akioyamen, Levi C Nicklas, R. Sanchez-Arias","doi":"10.1145/3388142.3388156","DOIUrl":"https://doi.org/10.1145/3388142.3388156","url":null,"abstract":"Many people look to the internet for support and assistance when faced with issues in life, particularly when these issues are related to behaviors or conditions that are stigmatized within society, generally making open discussion difficult. In this study, we utilize the unique characteristics of the news aggregation and discussion internet forum, reddit, to demonstrate the potential for text mining as an intelligent content filtering and navigation framework; we use online discussion surrounding opiates as a case study. Topic modeling is used as a text mining approach to organize and discover hidden semantic structures within reddit posts, developing a representation of a post through the topics and the words which comprise them. These characterizations may act as an intelligent navigation system of an online community, providing users the ability to actively navigate through similar posts and identify dissimilar ones based on their specific interests.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114203282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of the base isolation system with artificial neural network models","authors":"Samer M. Barakat","doi":"10.1145/3388142.3388169","DOIUrl":"https://doi.org/10.1145/3388142.3388169","url":null,"abstract":"This work presents the application of the artificial neural networks (ANN) for modeling and designing Seismic-Isolation (SI) systems consisting of Natural Rubber Bearings and Viscous Fluid Dampers subject to Near-Field (NF) earthquake ground motion. Four lumped-mass stick models representing realistic five-, ten-, fifteen-, and twenty-story base-isolated buildings are used. The key response parameters selected to represent the behavior of the SI system are the Damper Force (PDF), Total Maximum Displacement (DTM), the Peak Top Story Acceleration Ratio (TSAR) of the isolated structure compared to the fixed-base structure, and the maximum amplified drift ratio (δmax). Twenty-four NF earthquake records representing two seismic hazard levels are used. The commercial analysis program SAP2000 was used to perform the Time-History Analysis (THA) of the MDOF system (stick model representing a realistic N-story base-isolated building) subject to all 24 records. Different combinations of damping coefficients (c) and damping exponents (α) are investigated under the 24 earthquake records to develop the database of feasible combinations for the SI system. A total of 751,680 THA combinations were considered and used for training and testing the neural network. Mathematical models for the key response parameters are established via ANN and produced acceptable results with significantly less computation. The results of this study show that ANN models can be a powerful tool to be included in the design process of Seismic-Isolation (SI) systems, especially at the preliminary stages.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114805473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}