{"title":"Improving Ranking Using Hybrid Custom Embedding Models on Persian Web","authors":"Shekoofe Bostan;Ali Mohammad Zareh Bidoki;Mohammad-Reza Pajoohan","doi":"10.13052/jwe1540-9589.2253","DOIUrl":"10.13052/jwe1540-9589.2253","url":null,"abstract":"Ranking plays a crucial role in information retrieval systems, especially in the context of web search engines. This article presents a new ranking approach that utilizes semantic vectors and embedding models to enhance the accuracy of web document ranking, particularly in languages with complex structures like Persian. The article utilizes two real-world datasets, one obtained through web crawling to collect a large-scale Persian web corpus, and the other consisting of real user queries and web documents labeled with a relevancy score. The datasets are used to train embedding models using a combination of static Word2Vec and dynamic BERT algorithms. The proposed hybrid ranking formula incorporates these semantic vectors and presents a novel approach to document ranking called HybridMaxSim. Experiments conducted indicate that the HybridMaxSim formula is effective in enhancing the precision of web document ranking up to 0.87 according to the nDCG criterion.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 5","pages":"797-820"},"PeriodicalIF":0.8,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10374421","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138949356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of Web Content for Music Education Using AR Human Facial Recognition Technology","authors":"Eunee Park","doi":"10.13052/jwe1540-9589.2252","DOIUrl":"10.13052/jwe1540-9589.2252","url":null,"abstract":"As the media market changes rapidly, market demand is increasing for content that can be consumed on web platforms. It's required to produce differentiated web content that can attract viewers' interest. In order to increase the productivity and efficiency of content creation, cases of content production using AR engines are increasing. This study has a development environment in which parametrics and muscle-based model techniques are mixed. The faces of famous Western classical musicians, such as Mozart, Beethoven, Chopin and List are created as 3D characters and augmented on human's face based on facial recognition technology in this study. It analyzes and traces the changed of facial expression of each person, then apply to 3D character's facial expression in real-time. Each person who augmented musicians' faces can become those who lived in different times, deliver information and communicate with viewers of the present era based on the music educational scripts. This study presents a new direction for video production required in the media market.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 5","pages":"783-796"},"PeriodicalIF":0.8,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10374420","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138952371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Approaches for Fake Reviews Detection: A Systematic Literature Review","authors":"Mohammed Ennaouri;Ahmed Zellou","doi":"10.13052/jwe1540-9589.2254","DOIUrl":"10.13052/jwe1540-9589.2254","url":null,"abstract":"These days, most people refer to user reviews to purchase an online product. Unfortunately, spammers exploit this situation by posting deceptive reviews and misleading consumers either to promote a product with poor quality or to demote a brand and damage its reputation. Among the solutions to this problem is human verification. Unfortunately, the real-time nature of fake reviews makes the task more difficult, especially on e-commerce platforms. The purpose of this study is to conduct a systematic literature review to analyze solutions put out by researchers who have worked on setting up an automatic and efficient framework to identify fake reviews, unsolved problems in the domain, and the future research direction. Our findings emphasize the importance of the use of certain features and provide researchers and practitioners with insights on proposed solutions and their limitations. Thus, the findings of the study reveals that most approaches focus on sentiment analysis, opinion mining and, in particular, machine learning (ML), which contributes to the development of more powerful models that can significantly solve the problem and thus enhance further the accuracy and efficiency of detecting fake reviews.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 5","pages":"821-848"},"PeriodicalIF":0.8,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10374425","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138953298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HCNN-LSTM: Hybrid Convolutional Neural Network with Long Short-Term Memory Integrated for Legitimate Web Prediction","authors":"Candra Zonyfar;Jung-Been Lee;Jeong-Dong Kim","doi":"10.13052/jwe1540-9589.2251","DOIUrl":"10.13052/jwe1540-9589.2251","url":null,"abstract":"Phishing techniques are the most frequently used threat by attackers to deceive Internet users and obtain sensitive victim information, such as login credentials and credit card numbers. So, it is important for users to know the legitimate website to avoid the traps of fake websites. However, it is difficult for lay users to distinguish legitimate websites, considering that phishing techniques are always developing from time to time. Therefore, a legitimate website detection system is an easy way for users to avoid phishing websites. To address this problem, we present a hybrid deep learning model by combining a convolution neural network and long short-term memory (HCNN-LSTM). A one-dimensional CNN with a LSTM network shared estimation of all sublayers, then implements the proposed model in the benchmark dataset for phishing prediction, which consists of 11430 URLs with 87 attributes extracted of which 56 parameters are selected from URL structure and syntax. The HCNN-LSTM model was successful in binary classification with accuracy, precision, recall, and F1-score of 95.19%, 95.00%, 95.00%, 95.00%, successively outperforming the CNN and LSTM. Thus, the results show that our proposed model is a competitive new model for the legitimate web prediction tasks.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 5","pages":"757-782"},"PeriodicalIF":0.8,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10374423","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138950485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proposed Secure Hypertext Model in Web Engineering","authors":"Madhuri N. Gedam;Bandu B. Meshram","doi":"10.13052/jwe1540-9589.2241","DOIUrl":"https://doi.org/10.13052/jwe1540-9589.2241","url":null,"abstract":"Secure web application development is one of the prime challenges for the software industry. In the last decade, web applications have rapidly developed but web engineering methods have some limitations while designing web applications. The extensive literature survey explores various concepts like web engineering, hypertext modelling, web applications hypertext modelling methods, attacks on web applications, same origin policy (SOP) and cross origin resource sharing (CORS). The complexity of web pages is a major concern for security. The proposed secure hypertext model (SHM) provides hypertext modelling of web applications and helps in the identification of attacks on hypertext links. It provides security stereotypes and precisely specifies vulnerability defences in web application design. This standardized attack vector and defence mechanism will help developers to build more secure applications.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 4","pages":"575-596"},"PeriodicalIF":0.8,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71903044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metaheuristic Aided Improved LSTM for Multi-document Summarization: A Hybrid Optimization Model","authors":"Sunilkumar Ketineni;Sheela J","doi":"10.13052/jwe1540-9589.2246","DOIUrl":"https://doi.org/10.13052/jwe1540-9589.2246","url":null,"abstract":"Multi-document summarization (MDS) is an automated process designed to extract information from various texts that have been written regarding the same subject. Here, we present a generic, extractive, MDS approach that employs steps like preprocessing, feature extraction, score generation, and summarization. The input text goes preprocessing steps such as lemmatization, stemming, and tokenization in the first stage. After preprocessing, features are extracted, including improved semantic similarity-based features, term frequency-inverse document frequency (TF-IDF-based features), and thematic-based features. Finally, an improved LSTM model will be proposed to summarize the document based on the scores considered under the objectives such as content coverage and redundancy reduction. The Blue Monkey Integrated Coot Optimization (BMICO) algorithm is proposed in this paper for fine-tuning the optimal weight of the LSTM model that ensures precise summarization. Finally, the suggested BMICO's effectiveness is evaluated, and the outcome is successfully verified.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 4","pages":"701-730"},"PeriodicalIF":0.8,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71903507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web-based Non-contact Edge Computing Solution for Suspected COVID-19 Infection Classification Model","authors":"Tae-Ho Hwang;KangYoon Lee","doi":"10.13052/jwe1540-9589.2242","DOIUrl":"https://doi.org/10.13052/jwe1540-9589.2242","url":null,"abstract":"The recent outbreak of the COVID-19 coronavirus pandemic has necessitated the development of web-based, non-contact edge analytics solutions. Non-contact sensors serve as the interface between web servers and edge analytics through web engineering technology. The need for an edge device classification model that can identify COVID-19 patients based on early symptoms has become evident. In particular a non-contact implementation of such a classification model is required to efficiently prevent viral infection and minimize cross-infection. In this work, we investigate the use of diverse non-contact biosensors (e.g., remote photoplethysmography, radar, and infrared sensors) for reducing effective physical contact with patients and for measuring their biometric data and vital signs. We further explain a classification method for suspected COVID-19 infection based on the measured vital signs and symptoms. The results of this study can be applied in patient classification by mobile-based edge computing applications. The correlation between symptoms comprising cough, sore throat, fever, headache, myalgia, and arthralgia are analyzed in the model. We implement a machine learning classification model using vital signs for performance evaluation, and propose an ensemble model realized by fine-tuning the high-performing classification models. The proposed ensemble model successfully distinguishes suspected patients with an accuracy, area under curve, and F1 scores of 94.4%, 98.4%, and 94.4%, respectively.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 4","pages":"597-613"},"PeriodicalIF":0.8,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71903045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proposition of Rubustness Indicators for Immersive Content Filtering","authors":"Youngmo Kim;Seok-Yoon Kim;Chyapol Kamyod;Byeongchan Park","doi":"10.13052/jwe1540-9589.2247","DOIUrl":"https://doi.org/10.13052/jwe1540-9589.2247","url":null,"abstract":"With the full-fledged service of mobile carrier 5G networks, it is possible to use large-capacity, immersive content at high speed anytime, anywhere. It can be illegally distributed in web-hard and torrents through DRM dismantling and various transformation attacks; however, evaluation indicators that can objectively evaluate the filtering performance for copyright protection are required. Since applying existing 2D filtering techniques to immersive content directly is not possible, in this paper we propose a set of robustness indicators for immersive content. The proposed indicators modify and enlarge the existing 2D video robustness indicators to consider the projection and reproduction method, which are the characteristics of immersive content. A performance evaluation experiment has been carried out for a sample filtering system and it is verified that an excellent recognition rate of 95% or more is achieved in about 3 s of execution time.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 4","pages":"731-755"},"PeriodicalIF":0.8,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71903508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedding a Microblog Context in Ephemeral Queries for Document Retrieval","authors":"Shilpa Sethi","doi":"10.13052/jwe1540-9589.2245","DOIUrl":"https://doi.org/10.13052/jwe1540-9589.2245","url":null,"abstract":"With the proliferation of information globally, the search engine had become an indispensable tool that helps the user to search for information in a simple, easy and quick way. These search engines employ sophisticated document ranking algorithms based on query context, link structure and user behavior characterization. However, all these features keep changing in the real scenario. Ideally, ranking algorithms must be robust enough to time-sensitive queries. Microblog content is typically short-lived as it is often intended to provide quick updates or share brief information in a concise manner. The technique first determines if a query is currently in high demand, then it automatically appends a time-sensitive context to the query by mining those microblogs whose torrent matches with query-in-demand. The extracted contextual terms are further used in re-ranking the search results. The experimental results reveal the existence of a strong correlation between ephemeral search queries and microblog volumes. These volumes are analyzed to identify the temporal proximity of their torrents. It is observed that approximately 70% of search torrents occurred one day before or after blog torrents for lower threshold values. When the threshold is increased, the match ratio of torrent is raised to ~90%. In addition, the performance of the proposed model is analyzed for different combining principles namely, aggregate relevance (AR) and disjunctive relevance (DR). It is found that the DR variant of the proposed model outperforms the AR variant of the proposed model in terms of relevance and interest scores. Further, the proposed model's performance is compared with three categories of retrieval models: log-logistic model, sequential dependence model (SDM) and embedding based query expansion model (EQE1). The experimental results reveal the effectiveness of the proposed technique in terms of result relevancy and user satisfaction. There is a significant improvement of ~25% in the result relevance score and ~35% in the user satisfaction score compared to underlying retrieval models. The work can be expanded in many directions in the future as various researchers can combine these strategies to build a recommendation system, auto query reformulation system, Chatbot, and NLP professional toolkit.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 4","pages":"679-700"},"PeriodicalIF":0.8,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71903509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated Latent Dirichlet Allocation for User Preference Mining","authors":"Xing Wu;Yushun Fan;Jia Zhang;Zhenfeng Gao","doi":"10.13052/jwe1540-9589.2244","DOIUrl":"https://doi.org/10.13052/jwe1540-9589.2244","url":null,"abstract":"In the field of Web services computing, a recent demand trend is to mine user preferences based on user requirements when creating Web service compositions, in order to meet comprehensive and ever evolving user needs. Machine learning methods such as the latent Dirichlet allocation (LDA) have been applied for user preference mining. However, training a high-quality LDA model typically requires large amounts of data. With the prevalence of government regulations and laws and the enhancement of people's awareness of privacy protection, the traditional way of collecting user data on a central server is no longer applicable. Therefore, it is necessary to design a privacy-preserving method to train an LDA model without massive collecting or leaking data. In this paper, we present novel federated LDA techniques to learn user preferences in the Web service ecosystem. On the basis of a user-level distributed LDA algorithm, we establish two federated LDA models in charge of two-layer training scenarios: a centralized synchronous federated LDA (CSFed-LDA) for synchronous scenarios and a decentralized asynchronous federated LDA (DAFed-LDA) for asynchronous ones. In the former CSFed-LDA model, an importance-based partially homomorphic encryption (IPHE) technique is developed to protect privacy in an efficient manner. In the latter DAFed-LDA model, blockchain technology is incorporated and a multi-channel-based authority control scheme (MCACS) is designed to enhance data security. Extensive experiments over a real-world dataset ProgrammableWeb.com have demonstrated the model performance, security assurance and training speed of our approach.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"22 4","pages":"639-677"},"PeriodicalIF":0.8,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71903510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}