We demonstrate the effectiveness of the proposed method through numerical experiments. We restrict our discussion to a two-stage data-driven linear predictive prescription model. We applied and compared the following alternative approaches: the sample average approximation (SAA), the point-prediction (PP) approach, the predictive prescription (PR) approach, and the proposed approach. First, we apply the proposed framework to a small-sized problem with two variables and two constraints in each of the first and second stages and show how the proposed method is applied. Second, we expand the experiments to larger-sized problems.
All experiments were run on an Intel(R) Core(TM) i7-8700 CPU (3.20 GHz) with 32.0 GB of memory. The program was coded in Julia, with the Gurobi optimizer called through Convex.jl.
5.1. Small-Size Instance
For ease of exposition, we consider a two-stage stochastic linear programming problem (28), where the recourse function is the optimal value of the second-stage problem (29). Note that the problem has complete recourse, i.e., for every first-stage solution x and every realization of the uncertainty, there exists a feasible second-stage solution y.
We assume that u follows a multivariate normal distribution with known mean and covariance matrix, as in (30); specific values of the mean and covariance were fixed for this experiment. We also assume that v follows a normal distribution with known mean and variance, as in (31), with the specific values again fixed for this experiment.
We generate M training samples and N test samples. The decision-maker does not know the true distribution or the test samples; only the training samples generated from the true distribution are available. We repeat this experiment 10,000 times. Each time, the decision-maker observes the M training samples and solves the optimization problem, and we record the resulting optimal decision, which is a random variable depending on the training samples. We then evaluate the objective achieved by each of these decisions on the N test samples to compute its out-of-sample performance. The out-of-sample performance is random because the training samples are random.
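As a minimal sketch of this evaluation protocol (assuming hypothetical helpers sample_true_distribution, solve_model, and evaluate, which stand in for the model-specific pieces and are not from the paper), the loop looks as follows:

```julia
using Statistics

# Hypothetical helpers (not from the paper): `sample_true_distribution(n)`
# draws n samples (u, v) from the true joint distribution, `solve_model`
# returns a first-stage decision from training samples, and `evaluate`
# computes the objective value of a decision on one test sample.
function average_out_of_sample(M, N; repetitions = 10_000)
    performances = Float64[]
    for _ in 1:repetitions
        training = sample_true_distribution(M)  # visible to the decision-maker
        test = sample_true_distribution(N)      # hidden from the decision-maker
        x = solve_model(training)               # decision is random via training data
        push!(performances, mean(evaluate(x, s) for s in test))
    end
    return mean(performances)                   # average out-of-sample performance
end
```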
Table 2 presents the average out-of-sample performance, where SAA, PP, PR, and RO denote the average out-of-sample performance of the sample average approximation, the point-prediction approach, the predictive prescription approach, and the proposed method, respectively. From Table 2, we see that PP yielded the worst out-of-sample performance of the four. This is because the PP approach does not account for robustness against the prediction error: since the joint distribution of u and v is multivariate normal, u still varies over a range even after v is observed. The point-prediction approach makes decisions without considering this conditional variability, and as a result its out-of-sample performance was poor. The SAA approach was the second worst. SAA does not use the information in v, so regardless of the observed value of v, all samples are used as training data. We attribute the resulting poor out-of-sample performance to overfitting, since the possible ranges of the training data and the validation data differ. PR and RO, both of which utilize the auxiliary data v and account for the prediction error, achieved much better out-of-sample performance than the other two. Furthermore, the proposed method obtained the best value of all. The PR approach makes decisions using only the samples within the k-nearest neighborhood as training data, whereas RO defines the uncertainty set by the minimum volume enclosing ellipsoid of those samples; because RO hedges against the worst case even for samples not yet observed, it obtained the better result.
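The text does not detail how the minimum volume enclosing ellipsoid is computed; one standard method is Khachiyan's iterative algorithm, sketched below in plain Julia as an illustration (not necessarily the implementation used in our experiments):

```julia
using LinearAlgebra

# Minimum volume enclosing ellipsoid {x : (x - c)' * E * (x - c) <= 1}
# of the columns of P, via Khachiyan's algorithm.
function mvee(P::AbstractMatrix; tol = 1e-6, maxiter = 10_000)
    d, m = size(P)
    Q = vcat(P, ones(1, m))          # lift points by a homogeneous coordinate
    u = fill(1.0 / m, m)             # uniform initial weights
    for _ in 1:maxiter
        X = Q * Diagonal(u) * Q'
        g = [dot(q, X \ q) for q in eachcol(Q)]
        j = argmax(g)                # most "outlying" point gets more weight
        step = (g[j] - d - 1) / ((d + 1) * (g[j] - 1))
        step < tol && break
        u .*= (1 - step)
        u[j] += step
    end
    c = P * u                                          # ellipsoid center
    E = inv(Symmetric(P * Diagonal(u) * P' - c * c')) / d
    return E, c
end
```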
Table 3 presents the average out-of-sample performance of the proposed algorithm for different numbers of training samples included in the k-nearest neighborhood. From Table 3, it can be seen that the quality of the proposed method changes greatly depending on the value of k; as k increases, the robust optimization approach stops working well. To examine the reason, Figure 1 shows the minimum volume ellipsoid for a small and a large value of k, where the axes represent the two coordinates of the samples, the blue dots indicate all training data, the black dots indicate validation data, the red dots indicate the samples within the k-nearest neighborhood, and the green line indicates the obtained minimum volume ellipsoid. From Figure 1, we see that the distributions of the training data and the validation data are significantly different. For the smaller value of k, the distribution of the samples in the k-nearest neighborhood is close to that of the validation data. On the other hand, for the larger value of k, the distribution of the validation data differs greatly from that of the k-nearest neighbors because the neighborhood is too large. These results indicate that, by setting k properly, the corresponding ellipsoid covers an uncertainty set of the proper size.
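To make the role of k concrete, the following sketch (reusing the mvee helper above together with its LinearAlgebra import, and assuming a hypothetical data layout in which covariates vs and outcomes us are stored column-wise) builds the ellipsoid from the k nearest neighbors of a new observation v:

```julia
# Select the k training samples whose covariates are closest to the
# observed v, and fit the ellipsoidal uncertainty set to their outcomes.
function knn_ellipsoid(us::Matrix, vs::Matrix, v::Vector, k::Int)
    dists = [norm(vs[:, i] - v) for i in 1:size(vs, 2)]
    idx = partialsortperm(dists, 1:k)   # indices of the k nearest neighbors
    return mvee(us[:, idx])             # small k: tight set; large k: loose set
end
```

A small k keeps the ellipsoid close to the conditional distribution of u given v, while a large k inflates it toward the unconditional distribution, matching the behavior observed in Figure 1.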
5.2. Large-Size Instances
We consider a two-stage stochastic linear programming problem of the same form as in Section 5.1 but of larger size, with first-stage cost and constraint parameters, where the recourse function is the optimal value of a second-stage problem with its own cost and constraint parameters. The problem again has complete recourse, i.e., for every first-stage solution x and every realization of the uncertainty, there exists a feasible second-stage solution y.
We assume that u follows a multivariate normal distribution as in (34). We also assume that v follows a normal distribution with known mean and covariance.
We varied the problem parameters, with sample sizes up to 10,000. Each element of the first-stage constraint matrix was randomly generated from a uniform distribution. Furthermore, b was set by generating a random solution and choosing b so that this solution is feasible. Each element of the cost vector was randomly drawn from a uniform distribution. Each element of the covariance matrix was randomly drawn and the matrix was then made symmetric. Each element of the remaining second-stage parameters was randomly drawn from a uniform distribution, and the second covariance matrix was randomly drawn and made symmetric by the same method.
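A sketch of this instance generator is given below; the uniform-distribution bounds and the symmetrization rule are placeholders for the values elided above, not the paper's exact settings:

```julia
using LinearAlgebra, Random

# Hypothetical instance generator following the recipe in the text.
# The rand(...) bounds are placeholders, not the paper's values.
function generate_instance(n1, m1, n2; rng = Random.default_rng())
    A = rand(rng, m1, n1)        # first-stage constraint matrix
    x0 = rand(rng, n1)           # random solution used to ...
    b = A * x0                   # ... define b so that x0 is feasible
    c = rand(rng, n1)            # first-stage cost vector
    S = rand(rng, n2, n2)
    Sigma = (S + S') / 2         # one way to symmetrize; a true covariance
                                 # would additionally need to be positive semidefinite
    return A, b, c, Sigma
end
```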
The out-of-sample performance results are summarized in Table 4. From Table 4, it can be seen that RO, the proposed method, has the best out-of-sample performance, as in the case of the small-size instance. We also find that the SAA and PP approaches have very poor out-of-sample performance. This result suggests that utilizing the auxiliary variable v and accounting for the prediction error improve the out-of-sample performance.
The CPU time required to solve randomly generated instances is summarized in Table 5. From Table 5, SAA takes a long time when the sample size M is large. This is because it must solve the second-stage linear programming problem once for each sample, i.e., M times. PP is the fastest regardless of the sample size M, because it solves the second-stage linear programming problem only once. PR was faster than SAA and slower than PP, because it must solve the second-stage linear programming problem once for each sample in the k-nearest neighborhood, i.e., k times; since 1 ≤ k ≤ M, this explains the relative CPU times of these three approaches. Finally, RO was faster than PR and slower than PP, because the proposed method solves only one SOCP regardless of the sample size M.
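For reference, a toy SOCP in the same Convex.jl-plus-Gurobi setup used in our experiments is shown below; the data are illustrative, and this is a generic second-order cone program, not the actual robust counterpart:

```julia
using Convex, Gurobi, LinearAlgebra

# Generic SOCP: minimize c'x subject to ||F*x + g|| <= h'x + 1.
c = [1.0, 2.0, 3.0]
h = [1.0, 1.0, 1.0]
g = [0.5, -0.5]
F = [1.0 0.0 1.0; 0.0 1.0 -1.0]
x = Variable(3)
problem = minimize(dot(c, x), [norm(F * x + g, 2) <= dot(h, x) + 1.0])
solve!(problem, Gurobi.Optimizer)   # a single conic solve, independent of M
println(problem.optval)
```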
A direct speed comparison with other papers is not possible because the assumptions of the proposed framework are more complex. The closest work is that of Bertsimas and Van Parys (2017), in which their algorithm was tested on a newsvendor problem with one decision variable and a portfolio allocation problem with six decision variables. In this study, the proposed method was tested on a two-stage problem with over 100 decision variables in each stage. These results indicate that the proposed method has better scalability than the existing alternative approaches.
Unfortunately, no result was obtained within an hour for even larger instances. This is mainly due to the performance of the commercial solver. The robust counterpart derived in the proposed method is an SOCP and can, in theory, be solved efficiently. However, it is still a nonlinear programming model and remains difficult in practice. Therefore, developing an algorithm that exploits the special structure of the model is a direction for future research.