An Empirical Study of Linear Dimensionality Reduction for Judicial Predictive Models
Zhenyu Liu, Huanhuan Chen
2018 Eighth International Conference on Information Science and Technology (ICIST), June 2018
DOI: 10.1109/ICIST.2018.8426121
Citations: 1
Abstract
Judicial cases can be modeled with textual frequency vectors under the Bag-of-Words assumption to predict the decision outcome. However, such models often have far more features than training samples, which usually leads to the overfitting problem. In this paper, we conduct an empirical investigation on linear dimensionality reduction of high-dimensional judicial predictive models via the widespread principal component analysis approach. The experimental results show that these high-dimensional models do not suffer from the overfitting problem, but rather the underfitting problem. Moreover, the higher-order dependency in the textual frequency data cannot be decorrelated by the linear dimensionality reduction approach, which restrains the performance of judicial classification models, given that the signal-to-noise ratio in the derived low-dimensional features remains unchanged.
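The setup the abstract describes, Bag-of-Words term-frequency vectors reduced with PCA before classification, can be sketched as follows. This is an illustrative example with toy documents, not the authors' actual pipeline or corpus; real judicial corpora would have a vocabulary far larger than the number of training samples.

```python
# Illustrative sketch (not the paper's exact pipeline): represent case
# texts as Bag-of-Words term-frequency vectors, then apply PCA as the
# linear dimensionality reduction step.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA

# Toy stand-ins for judicial case texts (hypothetical examples).
docs = [
    "defendant convicted of theft sentence two years",
    "contract dispute damages awarded to plaintiff",
    "defendant acquitted insufficient evidence of theft",
    "plaintiff breach of contract claim dismissed",
]

# Term-frequency vectors under the Bag-of-Words assumption.
X = CountVectorizer().fit_transform(docs).toarray()

# Linear dimensionality reduction: project onto k principal components.
k = 2
Z = PCA(n_components=k).fit_transform(X)

print(X.shape)  # (n_samples, vocabulary_size)
print(Z.shape)  # (4, 2)
```

Note that PCA decorrelates only second-order (linear) correlations among features; as the abstract points out, higher-order dependencies in the frequency data survive this projection, which is one way the low-dimensional features can fail to improve the classifier.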