Frontiers of Computer Science: Latest Articles (Page 8)

A robust optimization method for label noisy datasets based on adaptive threshold: Adaptive-k

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-16 DOI: 10.1007/s11704-023-2430-4

Enes Dedeoglu, Himmet Toprak Kesgin, Mehmet Fatih Amasyali

Abstract: Using all samples in the optimization process does not produce robust results on datasets with label noise, because the gradients computed from the losses of noisy samples push the optimization in the wrong direction. In this paper, we recommend using only the samples whose loss is below a threshold determined during optimization, instead of using all samples in the mini-batch. Our proposed method, Adaptive-k, aims to exclude label-noise samples from the optimization process and make the process robust. On noisy datasets, we found that a threshold-based approach such as Adaptive-k produces better results than using all samples or a fixed number of low-loss samples in the mini-batch. On the basis of our theoretical analysis and experimental results, we show that the Adaptive-k method comes closest to the performance of the Oracle, in which noisy samples are entirely removed from the dataset. Adaptive-k is a simple but effective method. It does not require prior knowledge of the noise ratio of the dataset, does not require additional model training, and does not increase training time significantly. In the experiments, we also show that Adaptive-k is compatible with different optimizers such as SGD, SGDM, and Adam. The code for Adaptive-k is available on GitHub.
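The threshold-based selection described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the threshold is computed, so the running mean of all losses seen so far is used here purely as an assumption; the names `RunningMeanThreshold`, `select_below_threshold`, and `filtered_batch_loss` are hypothetical and are not taken from the authors' code.

```python
class RunningMeanThreshold:
    """Tracks the mean of every per-sample loss seen so far.

    ASSUMPTION: the running mean is an illustrative threshold rule only;
    the abstract does not specify how the threshold is determined.
    """

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, losses):
        """Fold a mini-batch of losses into the mean and return the new threshold."""
        self.total += sum(losses)
        self.count += len(losses)
        return self.total / self.count


def select_below_threshold(losses, threshold):
    """Return the indices of samples whose loss is strictly below the threshold."""
    return [i for i, loss in enumerate(losses) if loss < threshold]


def filtered_batch_loss(losses, threshold):
    """Average the loss over the kept samples only.

    Falls back to the whole mini-batch when no sample passes, so the
    optimizer always receives a defined loss.
    """
    kept = select_below_threshold(losses, threshold)
    if not kept:
        kept = list(range(len(losses)))
    mean_loss = sum(losses[i] for i in kept) / len(kept)
    return mean_loss, kept


if __name__ == "__main__":
    tracker = RunningMeanThreshold()
    batch = [0.2, 0.3, 4.0, 0.1]  # the 4.0 stands in for a mislabeled sample
    threshold = tracker.update(batch)
    print(filtered_batch_loss(batch, threshold))
```

In a real training loop the kept indices would be used to mask the per-sample losses before backpropagation, so that high-loss (likely mislabeled) samples contribute no gradient.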

Citations: 0

Gria: an efficient deterministic concurrency control protocol

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-16 DOI: 10.1007/s11704-023-2605-z

Xinyuan Wang, Yun Peng, Hejiao Huang

Abstract: Deterministic databases can reduce coordination costs in replication. This property has fostered significant interest in the design of efficient deterministic concurrency control protocols. However, the state-of-the-art deterministic concurrency control protocol Aria has three issues. First, it is impractical to configure a suitable batch size when the read-write set is unknown. Second, Aria running in low-concurrency scenarios, e.g., a single-thread scenario, suffers from the same conflicts as in high-concurrency scenarios. Third, the single-version schema introduces write-after-write conflicts. To address these issues, we propose Gria, an efficient deterministic concurrency control protocol. Gria has the following properties. First, the batch size of Gria is auto-scaling. Second, Gria’s conflict probability in low-concurrency scenarios is lower than that in high-concurrency scenarios. Third, Gria has no write-after-write conflicts because it adopts a multi-version structure. To further reduce conflicts, we propose two optimizations: a reordering mechanism and a rechecking strategy. Evaluation results on two popular benchmarks show that Gria outperforms Aria by 13x.

Citations: 0

Density estimation-based method to determine sample size for random sample partition of big data

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-16 DOI: 10.1007/s11704-023-2356-x

Abstract: Random sample partition (RSP) is a newly developed big data representation and management model for big data approximate computation problems. Academic research and practical applications have confirmed that RSP is an efficient solution for big data processing and analysis. However, a challenge in implementing RSP is determining an appropriate sample size for RSP data blocks. While a large sample size increases the burden of big data computation, a small one leads to insufficient distribution information in the RSP data blocks. To address this problem, this paper presents a novel density estimation-based method (DEM) to determine the optimal sample size for RSP data blocks. First, a theoretical sample size is calculated from the multivariate Dvoretzky-Kiefer-Wolfowitz (DKW) inequality by using the fixed-point iteration (FPI) method. Second, a practical sample size is determined by minimizing the validation error of a kernel density estimator (KDE) constructed on RSP data blocks for an increasing sample size. Finally, a series of experiments is conducted to validate the feasibility, rationality, and effectiveness of DEM. Experimental results show that (1) the iteration function of the FPI method is convergent for calculating the theoretical sample size from the multivariate DKW inequality; (2) the KDE constructed on RSP data blocks with the sample size determined by DEM yields a good approximation of the probability density function (p.d.f.); and (3) DEM provides more accurate sample sizes than the existing sample size determination methods from the perspective of p.d.f. estimation. This demonstrates that DEM is a viable approach to the sample size determination problem for big data RSP implementation.
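The first step of DEM, deriving a theoretical sample size from a DKW-type bound, can be sketched as follows. The univariate DKW inequality, P(sup|F_n - F| > eps) <= 2 exp(-2 n eps^2), gives a closed-form sample size. The abstract does not state the multivariate bound it uses, so the form d (n + 1) exp(-2 n eps^2) below is an assumption chosen only to show why fixed-point iteration is needed once n appears on both sides; the function names are hypothetical.

```python
import math


def dkw_sample_size(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta (univariate DKW)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def fixed_point_sample_size(eps, delta, dim, tol=1e-9, max_iter=1000):
    """Solve n = ln(dim * (n + 1) / delta) / (2 * eps**2) by fixed-point iteration.

    ASSUMPTION: the tail bound has the form dim * (n + 1) * exp(-2 * n * eps**2).
    This form is illustrative; the abstract does not give the exact inequality.
    """
    n = float(dkw_sample_size(eps, delta))  # start from the univariate answer
    for _ in range(max_iter):
        n_next = math.log(dim * (n + 1.0) / delta) / (2.0 * eps ** 2)
        if abs(n_next - n) < tol:
            n = n_next
            break
        n = n_next
    return math.ceil(n)


if __name__ == "__main__":
    print(dkw_sample_size(0.05, 0.05))
    print(fixed_point_sample_size(0.05, 0.05, dim=3))
```

The iteration converges quickly here because the right-hand side grows only logarithmically in n, so each update changes n by less than the previous one did.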

Citations: 0

Minimizing the cost of periodically replicated systems via model and quantitative analysis

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-16 DOI: 10.1007/s11704-023-2625-8

Chenhao Zhang, Liang Wang, Limin Xiao, Shixuan Jiang, Meng Han, Jinquan Wang, Bing Wei, Guangjun Qin

Abstract: Geographically replicating objects across multiple data centers improves the performance and reliability of cloud storage systems. Maintaining consistent replicas comes with high synchronization costs, because it incurs more expensive WAN transport prices and increased latency. Periodic replication is a widely used technique to reduce synchronization costs. However, the periodic replication strategies in existing cloud storage systems are too static to handle traffic changes: they are inflexible in the face of unforeseen loads, which results in additional synchronization cost. We propose quantitative analysis models to quantify consistency and synchronization cost for periodically replicated systems, and derive the optimal synchronization period that achieves the best tradeoff between consistency and synchronization cost. Based on this, we propose a dynamic periodic synchronization method, Sync-Opt, which allows systems to set the optimal synchronization period according to the variable load in clouds to minimize the synchronization cost. Simulation results demonstrate the effectiveness of our models. Compared with the policies widely used in modern cloud storage systems, the Sync-Opt strategy significantly reduces the synchronization cost.

Citations: 0

Index-free triangle-based graph local clustering

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-13 DOI: 10.1007/s11704-023-2768-7

Zhe Yuan, Zhewei Wei, Fangrui Lv, Ji-Rong Wen

Citations: 0

Constrained clustering with weak label prior

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-13 DOI: 10.1007/s11704-023-3355-7

Jing Zhang, Ruidong Fan, Hong Tao, Jiacheng Jiang, Chenping Hou

Abstract: Clustering is widely used in data mining. Embedding a weak label prior into clustering has been shown to improve its performance. Previous research mainly focuses on only one type of prior. However, in many real scenarios, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate them to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), which is an integrated framework. Within the unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding, and the label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace the cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. In addition to the theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP on several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of our proposed approach.

Citations: 0

Safeguarding text generation API’s intellectual property through meaning-preserving lexical watermarks

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-13 DOI: 10.1007/s11704-023-3252-0

Shiyu Zhu, Yun Li, Xiaoye Ouyang, Xiaocheng Hu, Jipeng Qiang

Citations: 0

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-13 DOI: 10.1007/s11704-023-2678-8

Qianwen Gou, Yunwei Dong, YuJiao Wu, Qiao Ke

Citations: 0

The governance technology for blockchain systems: a survey

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-06 DOI: 10.1007/s11704-023-3113-x

Guocheng Zhu, Debiao He, Haoyang An, Min Luo, Cong Peng

Citations: 0

MLDA: a multi-level k-degree anonymity scheme on directed social network graphs

IF 4.2 | Zone 3, Computer Science

Frontiers of Computer Science Pub Date : 2023-12-04 DOI: 10.1007/s11704-023-2759-8

Yuanjing Hao, Long Li, Liang Chang, Tianlong Gu

Abstract: With the emergence of network-centric data, publishing social network graphs helps data analysts mine the value of social networks, analyze the social behavior of individuals or groups, implement personalized recommendations, and so on. However, published social network graphs are often subject to re-identification attacks from adversaries, which leak users’ privacy. The k-anonymity technique is widely used in graph publishing and is quite effective at resisting re-identification attacks. However, current research still leaves several issues open: the protection of directed graphs has received less attention than that of undirected graphs; the protection of graph structure is often ignored while protecting nodes’ identities; and the same protection is applied to all users, which does not meet their different privacy requirements. To address these issues, this paper proposes a multi-level k-degree anonymity (MLDA) scheme on directed social network graphs. First, node sets of different importance are divided by the firefly algorithm and constrained connectedness upper approximation, and different levels of k-degree anonymity protection are applied to them to meet the different privacy requirements of users. Second, a new graph anonymity method is proposed, which adds and removes edges with the help of fake nodes. In addition, to improve the utility of the anonymized graph, a new edge cost criterion is proposed, which is used to select the most appropriate edge to remove. Third, to protect the community structure of the original graph as much as possible, fake nodes in the same community are merged before fake nodes in different communities. Experimental results on real datasets show that the proposed MLDA scheme effectively balances the privacy and utility of the anonymized graph.
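The k-degree anonymity property that MLDA builds on can be checked with a short sketch. It assumes one common definition for directed graphs: every (in-degree, out-degree) pair that occurs in the graph must be shared by at least k nodes. The paper's multi-level variant may define this differently, and the function names are hypothetical.

```python
from collections import Counter


def degree_pairs(nodes, edges):
    """Map each node to its (in-degree, out-degree) pair in a directed graph."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return {v: (indeg[v], outdeg[v]) for v in nodes}


def is_k_degree_anonymous(nodes, edges, k):
    """True if every occurring degree pair is shared by at least k nodes.

    ASSUMPTION: anonymity is judged on (in-degree, out-degree) pairs, a
    common convention for directed graphs that the abstract does not confirm.
    """
    counts = Counter(degree_pairs(nodes, edges).values())
    return all(c >= k for c in counts.values())


if __name__ == "__main__":
    # In a directed 4-cycle every node has the pair (1, 1).
    cycle_nodes = ["a", "b", "c", "d"]
    cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    print(is_k_degree_anonymous(cycle_nodes, cycle_edges, 4))

    # A directed path a -> b -> c has three distinct pairs, so k = 2 fails.
    path_edges = [("a", "b"), ("b", "c")]
    print(is_k_degree_anonymous(["a", "b", "c"], path_edges, 2))
```

An anonymization scheme would add or remove edges until this check passes for the required k, while trying to change the graph as little as possible.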

Citations: 0