M. Nguyen, Takuma Nakajima, Masato Yoshimi, N. Thoai
{"title":"Analyzing and Predicting the Popularity of Online Contents","authors":"M. Nguyen, Takuma Nakajima, Masato Yoshimi, N. Thoai","doi":"10.1145/3366030.3366047","DOIUrl":"https://doi.org/10.1145/3366030.3366047","url":null,"abstract":"With the rapid growth of Internet technology and infrastructure, we have entered the era of data explosion. Following this is the emergence of social networks, which have brought an enormous and ever-growing amount of online content into our digital world. Knowing precisely the popularity of online contents is of great importance for developing advanced caching algorithms as well as content distribution strategies. In this study, we provide some crucial insights into the characteristics of online content popularity over time in different locations and propose a simple predictive model to estimate the popularity of online contents in particular periods. By experimenting with the real datasets of MovieLens and YouTube, our model not only achieves considerable accuracy but also shows an impressive reduction in computation time, running 80 to 250 times faster than some baseline methods. Finally, we discuss the potential and limitations of our model in practice.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126187813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Damir Nesic, Jad El-khoury, Jonas Westman, M. Nyberg
{"title":"Building a Web-Based Federated Toolchain: Lessons Learned From a Four-Year Industrial Project","authors":"Damir Nesic, Jad El-khoury, Jonas Westman, M. Nyberg","doi":"10.1145/3366030.3366043","DOIUrl":"https://doi.org/10.1145/3366030.3366043","url":null,"abstract":"Big companies use many tools, jointly referred to as the toolchain, to manage vast amounts of engineering data being generated across an application lifecycle. Individual tools are typically designed to perform specific engineering tasks, and rely on specific data formats. This leads to problems when attempting to automate engineering tasks that are not supported by a particular tool, and which require data from multiple tools. This paper presents the experiences and lessons learned from an industrial research project within the heavy vehicle manufacturer Scania, where the project goal was to identify and industrialize technologies and principles that solve the above problem. The presented lessons cover architectural, technological, and organizational aspects of a toolchain development process. In addition, as a consequence of the lessons learned, the toolchain architecture and tool-interface architecture are also presented.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128387999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. A. Schreiner, Ronan Knob, Denio Duarte, Patrícia Vilain, R. Mello
{"title":"NewSQL Through the Looking Glass","authors":"G. A. Schreiner, Ronan Knob, Denio Duarte, Patrícia Vilain, R. Mello","doi":"10.1145/3366030.3366080","DOIUrl":"https://doi.org/10.1145/3366030.3366080","url":null,"abstract":"Several applications must handle large and heterogeneous data volumes as well as thousands of OLTP transactions per second. Traditional relational databases are not suitable for these requirements. On the other hand, NoSQL databases are able to deal with Big Data, but lack support for the traditional ACID properties. NewSQL is a new class of databases that combines the OLTP transaction support of relational databases with the high availability and scalability of NoSQL databases. However, few works in the literature explore the differences among NewSQL solutions. In this paper, we discuss the main features of the most prominent NewSQL products and present benchmarking results for analyzing their performance. We believe that both analyses can be useful as a guide to a future choice of NewSQL technologies.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125699661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Characterizing the Ecosystem of Monetizing Video Spams on YouTube Platform","authors":"Alpika Tripathi, K. Bharti, Mohona Ghosh","doi":"10.1145/3366030.3366078","DOIUrl":"https://doi.org/10.1145/3366030.3366078","url":null,"abstract":"In this work, we specifically study YouTube videos promoting easy money gaining tricks. We observe a noticeable presence of such videos on YouTube that are being used to 1) mislead users into publicly sharing their personally identifiable information, i.e., phone numbers, or 2) direct users away from YouTube to sites causing a negative user experience. For our study, we call them phone spam and link spam based videos, respectively. We provide a detailed analysis of the behavioral characteristics of these two categories of videos based on metadata content and graph based features. For our study, a total of 80 phone spam videos and 2000 link spam videos were collected and analyzed. Our analysis shows that such videos are highly related to each other, and specific motifs discriminating such videos exist. To our knowledge, this is the first attempt towards characterizing such types of videos on YouTube.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"79 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130782124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Named Entity Recognition for Biomedical Patent Text using Bi-LSTM Variants","authors":"Farag Saad","doi":"10.1145/3366030.3366104","DOIUrl":"https://doi.org/10.1145/3366030.3366104","url":null,"abstract":"Recent years have shown a substantial increase in biomedical publications (patents or scientific articles) that are multiplying at a daily pace. This has led to an increased interest in the extraction of meaningful information (e.g., named entities) from these publications. Traditional NER approaches demand a considerable level of engineering skills and domain expertise in designing rules and features for better algorithm accuracy. In addition, due to the structure and linguistic complexity of the patent text, constructing such rules and features is often a challenging task. In this paper, we investigate the performance of several variants of the Bi-LSTM model on the NER task, based on features generated automatically from unlabelled gene and protein patent corpora. The proposed model is able to capture the context representation of an input sequence and globally assign the related labels for each token. The CHARS-Bi-LSTM-EMA variant yielded the best performance and significantly outperformed the state-of-the-art approach.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123250372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Synergy of Simulation and Time Series Forecasting for Live Performance Testing of Smart Buildings","authors":"Elena Markoska, S. Lazarova-Molnar","doi":"10.1145/3366030.3366093","DOIUrl":"https://doi.org/10.1145/3366030.3366093","url":null,"abstract":"Differences in reliability requirements for buildings imply different needs for calculating expected building behaviour. In this paper, we examine four techniques for calculating the expected behaviour of buildings. Two of them are simulation techniques, namely a white box EnergyPlus model and a static tool as per the requirements of the Danish government. The other two are machine learning techniques, namely an ARIMA model and a long short-term memory artificial recurrent neural network, used in deep learning. We compare and contrast these four techniques based on their forecast accuracy, as well as the execution time to forecast a new data point. Furthermore, we provide an algorithm for selecting a forecasting technique based on terms such as availability, accuracy, and execution time requirements, to facilitate real time threshold generation in light of building performance testing.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122432342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rule-Based Inquiry Service to Elderly at Home for Efficient Mind Sensing","authors":"H. Maeda, S. Saiki, Masahide Nakamura, K. Yasuda","doi":"10.1145/3366030.3366114","DOIUrl":"https://doi.org/10.1145/3366030.3366114","url":null,"abstract":"To support in-home long-term care, we are studying techniques of Mind Sensing, which externalizes internal states of elderly people as words through conversations with agents or robots. We have previously developed a prototype system of Mind Sensing, integrated with an activity recognition system and a LINE chatbot. However, because the system was tightly coupled with these fixed systems, it was difficult to add or change the questions sent from the chatbot to individual elderly people. In this paper, we propose the Mind Sensing Service, which allows a service operator to define and manage the questions flexibly, and to automate the delivery of the questions and the collection of the answers. The proposed service consists of two elements: actions and rules. An action defines the contents of specific questions, such as what message is sent to which elderly people. A rule defines the conditions (when, where, and by what event) under which the action is executed. The proposed service makes it possible to implement more systematic and flexible Mind Sensing.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115489379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building Classifier Models for on-off Javanese Character Recognition","authors":"Lucia D. Krisnawati, Aditya W. Mahastama","doi":"10.1145/3366030.3366050","DOIUrl":"https://doi.org/10.1145/3366030.3366050","url":null,"abstract":"In this paper, we demonstrate the building process of four classifier models as part of an on-off character recognition system for Javanese characters. As Javanese characters are no longer used in everyday writing and books, the dataset was collected by scanning historical manuscripts and a reading lesson book. The rough dataset comprises 15,414 annotated characters and 633 classes. However, only 162 classes have sufficient data samples for training and testing. Using this dataset, we measured the performance of four classifiers, namely k-NN, LDA, SVM, and Gaussian NB, on the accuracy, micro-averaged precision, micro-averaged sensitivity, and weighted-averaged precision and sensitivity metrics. The experiment shows that k-NN outperforms the other classifiers on most metrics, while SVM performs worst. A research byproduct worth mentioning is the identification of 633 classes of distinct Javanese characters, comprising both common and compound characters found in modern Javanese writing and archaic characters found only in literary works.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116412388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caio Libânio Melo Jerônimo, L. Marinho, C. E. Campelo, Adriano Veloso, A. S. C. Melo
{"title":"Fake News Classification Based on Subjective Language","authors":"Caio Libânio Melo Jerônimo, L. Marinho, C. E. Campelo, Adriano Veloso, A. S. C. Melo","doi":"10.1145/3366030.3366039","DOIUrl":"https://doi.org/10.1145/3366030.3366039","url":null,"abstract":"While many works investigate spread patterns of fake news in social networks, we focus on the textual content. Instead of relying on syntactic representations of documents (aka Bag of Words) as many works do, we seek more robust representations that may better differentiate fake from legitimate news. We propose to consider the subjectivity of news under the assumption that the subjectivity levels of legitimate and fake news are significantly different. For computing the subjectivity level of news, we rely on a set of subjectivity lexicons built by Brazilian linguists. We then build subjectivity feature vectors for each news article by calculating the Word Mover's Distance (WMD) between the news and these lexicons, considering the embedding space in which the news words lie, in order to classify the documents. The results demonstrate that our method is more robust than classical text classification approaches, especially in scenarios where training and test domains are different.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114566212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ToT for CSV: accessing open data CSV files through SQL","authors":"Yasushi Doi, Motomichi Toyama","doi":"10.1145/3366030.3366130","DOIUrl":"https://doi.org/10.1145/3366030.3366130","url":null,"abstract":"Recently, the push for open data has been very strong, and more and more sources, such as governments, are sharing data such as weather records or demographic statistics. The Remote Table Access (RTA) system allows the easy publication of data from a relational database, and its use through SQL-like queries by remote users. Still, the data currently being shared as open data comes in many formats, not always directly integrable with relational databases, and many sources publish data as raw CSV, XML, or even PDF files. These files then need to be downloaded, parsed, and integrated with the final user's data, often in a relational database. In this work, we present Table on Top (ToT) for CSV, an extension of the RTA system that allows the easy publication of, and access to, data contained in CSV files through RTA.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128433292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}