{"title":"Impact of Cooperative Innovation on the Technological Innovation Performance of High-Tech Firms: A Dual Moderating Effect Model of Big Data Capabilities and Policy Support.","authors":"Xianglong Li, Qingjin Wang, Renbo Shi, Xueling Wang, Kaiyun Zhang, Xiao Liu","doi":"10.1089/big.2022.0301","DOIUrl":"10.1089/big.2022.0301","url":null,"abstract":"<p><p>The mechanism of cooperative innovation (CI) for high-tech firms aims to improve their technological innovation performance. It is the effective integration of the internal and external innovation resources of these firms, along with the simultaneous reduction in the uncertainty of technological innovation and the maintenance of the comparative advantage of the firms in the competition. This study used 322 high-tech firms as our sample, which were located in 33 national innovation demonstration bases identified by the Chinese government. We implemented a multiple linear regression to test the impact of CI conducted by these high-tech firms at the level of their technological innovation performance. In addition, the study further examined the moderating effect of two boundary conditions-big data capabilities and policy support (PS)-on the main hypotheses. Our study found that high-tech firms carrying out CI can effectively improve their technological innovation performance, with big data capabilities and PS significantly enhancing the degree of this influence. The study reveals the intrinsic mechanism of the impact of CI on the technological innovation performance of high-tech firms, which, to a certain extent, expands the application context of CI and enriches the research perspective on the impact of CI on the innovation performance of firms. At the same time, the findings provide insight for how high-tech firms in the digital era can make reasonable use of data empowerment in the process of CI to achieve improved technological innovation performance.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"63-80"},"PeriodicalIF":2.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10243508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Automated Natural Language Processing-Based Supplier Discovery for Financial Services.
Authors: Mauro Papa, Ioannis Chatzigiannakis, Aris Anagnostopoulos
Big Data, pp. 30-48. Published 2024-02-01 (Epub 2023-07-07). DOI: 10.1089/big.2022.0215
Abstract: Public procurement is viewed as a major market force that can be used to promote innovation and drive the growth of small and medium-sized enterprises. In such cases, procurement system design relies on intermediaries that provide vertical linkages between suppliers and providers of innovative services and products. In this work we propose an innovative methodology for decision support in the process of supplier discovery, which precedes the final supplier selection. We focus on data gathered from community-based sources such as Reddit and Wikidata, and avoid any use of historical open procurement datasets, to identify small and medium-sized suppliers of innovative products and services that hold very small market shares. We examine a real-world procurement case study from the financial sector, focusing on the Financial and Market Data offering, and develop an interactive web-based support tool to address specific requirements of the Italian central bank. We demonstrate how a suitable selection of natural language processing models, such as a part-of-speech tagger and a word-embedding model, in combination with a novel named-entity-disambiguation algorithm, can efficiently analyze huge quantities of textual data, increasing the probability of full coverage of the market.
Title: Large-Scale Estimation and Analysis of Web Users' Mood from Web Search Query and Mobile Sensor Data.
Authors: Wataru Sasaki, Satoki Hamanaka, Satoko Miyahara, Kota Tsubouchi, Jin Nakazawa, Tadashi Okoshi
Big Data, pp. 191-209. Published 2024-01-01 (Epub 2023-06-02). DOI: 10.1089/big.2022.0211. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304759/pdf/
Abstract: The ability to estimate the current mood states of web users has considerable potential for realizing user-centric, opportune services in pervasive computing. However, it is difficult to decide which data type to use for such estimation and to collect the ground truth of such mood states. Therefore, we built a model that estimates mood states from search-query data in an easy-to-collect and non-invasive manner. We then built a second model that estimates mood states from mobile sensor data and used its output to supplement the ground-truth labels of the search-query model. This novel two-step model building boosted the performance of estimating the mood states of web users. Our system was also deployed in the commercial stack, and a large-scale data analysis with more than 11 million users was conducted. We propose a nationwide mood score, which aggregates the mood values of users across the country. It shows the daily and weekly rhythm of people's moods and explains the ups and downs of moods during the COVID-19 pandemic, which were inversely synchronized with the number of new COVID-19 cases. It also detects major news events that simultaneously affect the mood states of many users, even at fine-grained time resolutions on the order of hours. In addition, we identified a certain class of advertisements whose clicking users showed a clear tendency in mood.
Title: Computational Efficient Approximations of the Concordance Probability in a Big Data Setting.
Authors: Robin Van Oirbeek, Jolien Ponnet, Bart Baesens, Tim Verdonck
Big Data, pp. 243-268. Published 2024-01-01 (Epub 2023-06-07). DOI: 10.1089/big.2022.0107
Abstract: Performance measurement is an essential task once a statistical model has been created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC equals the concordance probability, a frequently used measure of the discriminatory power of a model. Unlike the AUC, the concordance probability can also be extended to the setting with a continuous response variable. Because of the staggering size of today's data sets, determining this discriminatory measure requires a tremendous number of costly computations and is hence immensely time consuming, certainly in the case of a continuous response variable. We therefore propose two estimation methods that calculate the concordance probability in a fast and accurate way and that can be applied in both the discrete and the continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the artificial simulations.
Title: Small Files Problem Resolution via Hierarchical Clustering Algorithm.
Authors: Oded Koren, Aviel Shamalov, Nir Perel
Big Data, pp. 229-242. Published 2024-01-01 (Epub 2023-05-16). DOI: 10.1089/big.2022.0181
Abstract: The Small Files Problem in the Hadoop Distributed File System (HDFS) is an ongoing challenge that has not yet been solved, although various approaches have been developed to tackle the obstacles it creates. Properly managing the size of blocks in a file system is essential, as it saves memory and computing time and may reduce bottlenecks. In this article, a new approach using a hierarchical clustering algorithm is suggested for dealing with small files. The proposed method identifies the files by their structure via a special dendrogram analysis and then recommends which files can be merged. As a simulation, the proposed algorithm was applied to 100 CSV files with different structures, containing 2-4 columns with different data types (integer, decimal, and text). In addition, 20 non-CSV files were created to demonstrate that the algorithm works only on CSV files. All data were analyzed via a machine learning hierarchical clustering method, and a dendrogram was created. Based on the merge process that was performed, seven files from the dendrogram analysis were chosen as appropriate files to be merged, which reduced the memory space used in the HDFS. Furthermore, the results showed that using the suggested algorithm leads to efficient file management.
Title: Predicting Sociodemographic Attributes from Mobile Usage Patterns: Applications and Privacy Implications.
Authors: Rouzbeh Razavi, Guisen Xue, Ikpe Justice Akpan
Big Data, pp. 213-228. Published 2024-01-01 (Epub 2023-08-14). DOI: 10.1089/big.2022.0182
Abstract: When users interact with their mobile devices, they leave behind unique digital footprints that can be viewed as predictive proxies that reveal an array of users' characteristics, including their demographics. Predicting users' demographics based on mobile usage can provide significant benefits for service providers and users, including improving customer targeting, service personalization, and market research efforts. This study uses machine learning algorithms and mobile usage data from 235 demographically diverse users to examine the accuracy of predicting their sociodemographic attributes (age, gender, income, and education) from mobile usage metadata, filling the gap in the current literature by quantifying the predictive power of each attribute and discussing the practical applications and privacy implications. According to the results, gender can be most accurately predicted (balanced accuracy = 0.862) from mobile usage footprints, whereas predicting users' education level is more challenging (balanced accuracy = 0.719). Moreover, the classification models were able to classify users based on whether their age or income was above or below a certain threshold with acceptable accuracy. The study also presents the practical applications of inferring demographic attributes from mobile usage data and discusses the implications of the findings, such as privacy and discrimination risks, from the perspectives of different stakeholders.
{"title":"An Improved Influence Maximization Method for Online Advertising in Social Internet of Things.","authors":"Reza Molaei, Kheirollah Rahsepar Fard, Asgarali Bouyer","doi":"10.1089/big.2023.0042","DOIUrl":"10.1089/big.2023.0042","url":null,"abstract":"<p><p>Recently, a new subject known as the Social Internet of Things (SIoT) has been presented based on the integration the Internet of Things and social network concepts. SIoT is increasingly popular in modern human living, including applications such as smart transportation, online health care systems, and viral marketing. In advertising based on SIoT, identifying the most effective diffuser nodes to maximize reach is a critical challenge. This article proposes an efficient heuristic algorithm named <i>Influence Maximization of advertisement for Social Internet of Things (IMSoT)</i>, inspired by real-world advertising. The IMSoT algorithm consists of two steps: selecting candidate objects and identifying the final seed set. In the first step, influential candidate objects are selected based on factors, such as degree, local importance value, and weak and sensitive neighbors set. In the second step, effective influence is calculated based on overlapping between candidate objects to identify the appropriate final seed set. The IMSoT algorithm ensures maximum influence and minimum overlap, reducing the spreading caused by the seed set. A unique feature of IMSoT is its focus on preventing duplicate advertising, which reduces extra costs, and considering weak objects to reach the maximum target audience. Experimental evaluations in both real-world and synthetic networks demonstrate that our algorithm outperforms other state-of-the-art algorithms in terms of paying attention to weak objects by 38%-193% and in terms of preventing duplicate advertising (reducing extra cost) by 26%-77%. Additionally, the running time of the IMSoT algorithm is shorter than other state-of-the-art algorithms.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"173-190"},"PeriodicalIF":2.6,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9922927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Secure Biomedical Document Protection Framework to Ensure Privacy Through Blockchain.
Authors: Ramkumar Jayaraman, Mohammed Alshehri, Manoj Kumar, Ahed Abugabah, Surender Singh Samant, Ahmed A Mohamed
Big Data, pp. 437-451. Published 2023-12-01 (Epub 2023-05-23). DOI: 10.1089/big.2022.0170
Abstract: In the current health care era, biomedical documents play a crucial role and contain much evidence-based documentation associated with many stakeholders' data. Protecting these confidential research documents is difficult and is a significant process in the medical research domain; such biomedical documentation related to health care, along with other community-valued data, is produced and processed by medical professionals. Traditional security mechanisms such as akteonline and the Health Insurance Portability and Accountability Act (HIPAA) are used to protect biomedical documents, as they address the problems of non-repudiation and data integrity in the retrieval and storage of documents. There is thus a need for a comprehensive framework that improves protection in terms of cost and response time for biomedical documents. In this research work, a blockchain-based biomedical document protection framework (BBDPF) is proposed, which includes blockchain-based biomedical data protection (BBDP) and blockchain-based biomedical data retrieval (BBDR) algorithms. The BBDP and BBDR algorithms enforce consistency of the data to prevent data modification and interception of confidential data, with proper data validation. Both algorithms use strong cryptographic mechanisms to withstand post-quantum security risks, ensuring the integrity of biomedical document retrieval and non-denial of data retrieval transactions. For the performance analysis, BBDPF and its smart contracts, written in the Solidity language, were deployed on Ethereum blockchain infrastructure, and request time and searching time were measured as the number of requests gradually increased, to assess data integrity, non-repudiation, and smart-contract support for the proposed hybrid model. A modified prototype with a web-based interface was built to prove the concept and evaluate the proposed framework. The experimental results revealed that the proposed framework provides data integrity, non-repudiation, and support for smart contracts in comparison with Query Notary Service, MedRec, MedShare, and Medlock.
Title: OzNet: A New Deep Learning Approach for Automated Classification of COVID-19 Computed Tomography Scans.
Authors: Oznur Ozaltin, Ozgur Yeniay, Abdulhamit Subasi
Big Data, pp. 420-436. Published 2023-12-01 (Epub 2023-03-16). DOI: 10.1089/big.2022.0042
Abstract: Coronavirus disease 2019 (COVID-19) is spreading rapidly around the world, so the automated classification of computed tomography (CT) scans can alleviate the workload of experts, which increased considerably during the pandemic. Convolutional neural network (CNN) architectures are successful at classifying medical images. In this study, we developed a new deep CNN architecture called OzNet and compared it with the pretrained architectures AlexNet, DenseNet201, GoogleNet, NASNetMobile, ResNet-50, SqueezeNet, and VGG-16. We also compared the classification success of three preprocessing methods against raw CT scans: we classified not only the raw CT scans but also data sets processed with discrete wavelet transform (DWT), intensity adjustment, and grayscale-to-RGB image conversion. The architecture's performance increases when the DWT preprocessing method is used rather than the raw data set. The results are extremely promising for the CNN algorithms using the COVID-19 CT scans processed with the DWT: the proposed DWT-OzNet achieved a classification performance of more than 98.8% on every calculated metric.