1. Introduction
Various probability distributions have been adopted to model important wave parameters [1,2,3,4,5], such as significant wave heights and mean wave periods, in order to better understand the extremely complex marine environment, which is crucial for coastal engineering applications and the safety of offshore structures.
In practice, these wave parameters are correlated, so joint distributions are appropriate for statistical analysis. Univariate probabilistic models alone may produce a relatively large bias, so studying either parameter in isolation gives an incomplete picture. To better support offshore operations and the construction of drilling platforms, the bivariate probability distribution of significant wave heights and mean wave periods has received growing attention recently [6,7,8].
Several parametric approaches have been proposed to simulate the correlation between these two wave parameters, among which the bivariate function model has been widely employed [9,10,11]. Ochi [12] pointed out that a bivariate lognormal function can be used to simulate the joint probability of wave heights and periods. In addition, Kimura [13] proposed a two-dimensional Weibull model to describe the statistical characteristics of wave heights and periods and discussed the influence of shape factors and related parameters on the fit of the model. These bivariate methods are simple and easy to implement, but they place relatively high requirements on the dataset [14]. Moreover, the bivariate function models capture the joint behavior of the two wave parameters as a whole, whereas, in general, different wave parameters should be fitted with different distributions. The conditional model can select different probability distributions for the marginal distributions [15,16,17]. For datasets with low correlation coefficients, the conditional model can add weight to the parts that need attention. Its disadvantage, however, is that it is difficult to determine the optimal expressions of the joint functions [18].
The copula function, proposed by Sklar [19], was used in stock market and financial risk assessment and, with continued exploration, has emerged in the joint analysis of ocean and water resources variables [20,21]. As a flexible statistical approach, the copula allows any type of marginal distribution function and plays an indispensable role in the joint modeling of bivariate variables. The bivariate distribution of two ocean parameters can be decomposed into the marginal distributions of the two parameters and a copula function, with the copula acting as a bridge between the marginal distributions and the joint distribution. The Gaussian copula, a common copula function, is widely applied in hydrologic analysis [22,23,24,25]. Archimedean copulas are also applied to establish bivariate distributions because of their good mathematical properties. Dong et al. [26] found that, compared with conditional modeling, the Clayton copula showed good performance in the bivariate probability analysis of group height and length. Iturrizaga and Zavoni [27] proposed establishing the bivariate distribution of wave heights and periods through copulas for structural reliability research. Kim et al. [28] pointed out that the Frank and Gaussian copula functions are most suitable for frequency analysis of wave heights and periods on the Korean Peninsula. Copula functions also show outstanding performance in the modeling of multivariate variables [29,30]; notably, asymmetric copulas have certain advantages in multivariable modeling because they can more adequately simulate the asymmetric correlation between variables [31]. For extreme events, Mazas and Hamm [32] used three extreme copulas to model the bivariate distributions of wave heights and sea levels and discussed the return periods. Li et al. [33] showed that the Gumbel–Hougaard copula could fit the joint characteristics of extreme waves and surges well.
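The decomposition of a joint distribution into marginals and a copula can be written compactly. The following is a sketch of Sklar's theorem for two continuous variables; the symbols $H$ and $T$ (wave height and period), $F_H$, $F_T$, $C$, and $c$ are notation assumed here for illustration and may differ from the notation used in Section 2.

```latex
% Joint CDF expressed through the marginal CDFs and a copula C
F(h, t) = C\bigl(F_H(h),\, F_T(t)\bigr)

% Joint density: copula density times the two marginal densities
f(h, t) = c\bigl(F_H(h),\, F_T(t)\bigr)\, f_H(h)\, f_T(t),
\qquad c(u, v) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v}
```

Because the marginals enter only through $F_H$ and $F_T$, any marginal family can be paired with any copula family.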
In previous work, simple parametric probabilistic methods have often been used to fit marginal distributions, such as the Weibull, Forristall, lognormal, and Gamma distributions [17,26,34]. Owing to their simple forms, they have difficulty describing the marginal distributions adequately when the probability distributions show special features. Therefore, exploiting the fact that copulas allow any type of marginal distribution function, a mixed lognormal distribution is proposed to fit the marginal distributions, which are then combined with three common Archimedean copula functions to establish the bivariate distributions.
The rest of this paper is arranged as follows. Section 2 introduces the approaches to fitting univariate distributions, gives the construction of the mixed lognormal distribution in detail, and presents the methods of describing joint distributions. Section 3 describes the research area and provides a statistical analysis of the data. In Section 4, we discuss the fitting of the marginal distributions and analyze the joint probability of significant wave heights and wave periods. Finally, the findings are summarized in Section 5.
3. Study Area and Data Analysis
The wave data are from the National Marine Data Center (http://mds.nmdis.org.cn/, accessed on 1 June 2022). Two stations in the East China Sea, namely NanJi (NJ) station and BeiShuang (BS) station, are selected as the research objects, as shown in
Figure 1. Their specific coordinates are
and
, respectively. The East China Sea is an important transportation hub for China's maritime exchanges with countries in the Pacific region. Cold and warm currents converge here and the seawater exchanges smoothly, making it one of the important fishing grounds in China. Therefore, the analysis of wave characteristics and parameters, especially the joint distribution of significant wave heights and mean wave periods, is valuable for understanding the wave characteristics of the East China Sea and has important practical value for the design of offshore structures, the prevention of marine disasters, and navigation. In addition, the XiaoMaiDao (XMD) station located in the Yellow Sea is selected to further verify the applicability of the methods proposed in this study, with a specific coordinate of
. All data are from 2018 to 2020, with a sampling interval of one hour. A small amount of observation data is inevitably missing; all remaining records are used for the simulation experiments.
Figure 2 displays the scatter plots and histograms of significant wave heights and mean wave periods at the three stations. For the NJ and BS stations, the wave heights are mainly in the range of 0.5–2 m, and the wave periods are mainly in the range of 4–8 s. At the XMD station, most of the wave heights are in the interval of 0–1 m, and most of the wave periods are in the interval of 3–7 s. In addition, Table 2 shows the statistical information of the two parameters. The data of the XMD station are quite different from those of the other two stations, so it is reasonable to use the XMD station to further illustrate the applicability of the proposed method. The skewness shows that the probability distribution curves of the two wave parameters are skewed to the right, and the kurtosis indicates that the curves are steep. These characteristics should be taken into account when fitting the probability density distributions of significant wave heights and mean wave periods.
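As an illustration of the descriptive statistics discussed above, the following minimal sketch computes the mean, standard deviation, skewness, and excess kurtosis of a sample. The function name `sample_moments` and the `heights` values are hypothetical and are not the station records; the population (biased) moment estimators are an assumption, since the estimator behind Table 2 is not stated here.

```python
import math

def sample_moments(values):
    """Return mean, standard deviation, skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4 (population form, divisor n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5          # > 0 means a longer right tail
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # > 0 means sharper than a normal curve
    return mean, std, skewness, excess_kurtosis

# Hypothetical wave heights in metres, for illustration only.
heights = [0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 1.3, 1.8, 3.2]
mean, std, skew, kurt = sample_moments(heights)
```

A positive `skew` corresponds to the right-skewed curves described in the text, and a positive `kurt` to a steep peak.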
4. Results and Discussion
4.1. Fitting the Marginal Distributions
First, we conduct an experimental analysis on the NJ and BS stations. Before constructing the bivariate distribution of significant wave heights and mean wave periods, it is necessary to conduct a probability analysis of each variable to determine its marginal distribution. The Weibull distribution has been applied to fit wave heights [36], and the lognormal distribution is a better approach for the probability analysis of wave periods [14]. Therefore, to demonstrate the advantages of the mixed lognormal distribution in estimating the probability of wave heights and wave periods, its fitting results are compared with those of the Weibull and lognormal distributions, respectively.
The probability analysis of significant wave heights and mean wave periods from NJ and BS stations is carried out in this section. Before fitting the wave data, the parameter values of the distributions must be determined. For the Weibull and lognormal distributions, the parameters can be solved directly by MLE. For the mixed lognormal distribution, the number of mixture components is given by
(
Table 3), and the results of other parameters are obtained with the help of the EM algorithm.
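The EM step for a mixed lognormal distribution can be sketched as follows. This is a minimal illustration, not the implementation used in this study: it assumes the mixture is a weighted sum of lognormal components, so that EM can be run as a Gaussian mixture on the log-transformed data. The function names, the slice-based initialisation, the fixed iteration count, and the synthetic `sample` are all assumptions made for the example.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution, applied here to log-transformed data."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_lognormal_mixture(data, n_components, n_iter=100):
    """EM for a lognormal mixture, run as a Gaussian mixture on log(data)."""
    logs = sorted(math.log(x) for x in data)
    n = len(logs)
    size = n // n_components
    weights, mus, sigmas = [], [], []
    # Initialise each component from one contiguous slice of the sorted logs.
    for k in range(n_components):
        stop = (k + 1) * size if k < n_components - 1 else n
        chunk = logs[k * size:stop]
        mu = sum(chunk) / len(chunk)
        var = sum((y - mu) ** 2 for y in chunk) / len(chunk)
        weights.append(len(chunk) / n)
        mus.append(mu)
        sigmas.append(math.sqrt(var) + 1e-3)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for y in logs:
            dens = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) + 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weight, mean, and spread.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            mus[k] = sum(r[k] * y for r, y in zip(resp, logs)) / nk
            var = sum(r[k] * (y - mus[k]) ** 2 for r, y in zip(resp, logs)) / nk
            sigmas[k] = math.sqrt(var) + 1e-6
    return weights, mus, sigmas

def mixture_pdf(x, weights, mus, sigmas):
    """Mixed lognormal density at x > 0 (the 1/x factor is the log-transform Jacobian)."""
    y = math.log(x)
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas)) / x

# Synthetic two-component sample, for illustration only.
random.seed(0)
sample = ([random.lognormvariate(0.0, 0.2) for _ in range(600)]
          + [random.lognormvariate(1.0, 0.2) for _ in range(400)])
weights, mus, sigmas = fit_lognormal_mixture(sample, n_components=2)
```

In practice a convergence check on the log-likelihood would replace the fixed iteration count, and the number of components would be chosen beforehand by a model selection criterion.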
Figure 3 presents the probability density functions and frequency histograms of significant wave heights from NJ and BS stations. A visual comparison shows that the Weibull distribution does not adequately predict wave heights in the middle region, whereas the mixed lognormal distribution performs well overall. As shown in Figure 3b, the Weibull prediction for wave heights in the range of 1.2 to 1.8 m is higher than the empirical value. The empirical distributions and the marginal distributions from the two methods are also plotted in Figure 3. There are obvious differences between the Weibull distributions and the empirical distributions, especially in the middle region. In contrast, the curves of the mixed lognormal distributions are basically consistent with those of the empirical distributions. Therefore, the mixed lognormal distribution may be a new option for effectively fitting significant wave heights.
The lognormal and mixed lognormal distributions are used to describe the mean wave periods at NJ and BS stations, and their probability density functions are given in Figure 4. According to Figure 4a, the lognormal distribution underestimates the probability in the range of 5 to 6 s and overestimates the probability in the range of 7 to 8 s. By comparison, the mixed lognormal distribution adequately fits the distribution characteristics of the wave periods. The empirical distributions generated from the two datasets are shown in
Figure 4c,d, from which it can be concluded that the fitting accuracy of the mixed lognormal distribution is higher. To verify more specifically the performance of the mixed lognormal distribution in describing the marginal probability distribution, this paper uses the root mean square error (RMSE) as an evaluation index and defines it as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[F_{e}(x_i) - F_{t}(x_i)\right]^{2}}$$
where $F_{e}$ represents the empirical distribution, $F_{t}$ the theoretical distribution, and $n$ the number of observations.
Table 4 presents the RMSE values. In general, the smaller the RMSE, the better the performance of the method. The results show that the mixed lognormal distribution differs markedly from the other two methods and has a smaller RMSE value, indicating that it is more consistent with the field data.
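The root mean square error between an empirical and a theoretical distribution can be sketched as below. The plotting position `(i + 1) / n` for the empirical distribution, the Weibull parameters, and the `heights` values are assumptions chosen for illustration; they are not taken from Table 3 or Table 4.

```python
import math

def empirical_cdf(sample):
    """Sorted sample and its empirical CDF values, using (i + 1) / n (an assumption)."""
    xs = sorted(sample)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def weibull_cdf(x, shape, scale):
    """Two-parameter Weibull cumulative distribution function."""
    return 1.0 - math.exp(-((x / scale) ** shape))

def rmse(empirical, theoretical):
    """Root mean square error between two equally long sequences of CDF values."""
    n = len(empirical)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(empirical, theoretical)) / n)

# Hypothetical wave heights (m) and Weibull parameters, for illustration only.
heights = [0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 1.3, 1.8, 3.2]
xs, f_emp = empirical_cdf(heights)
f_weibull = [weibull_cdf(x, shape=1.8, scale=1.4) for x in xs]
score = rmse(f_emp, f_weibull)
```

The same `rmse` call can be repeated with the CDF values of any other candidate distribution, so that the candidates are compared on the same sample points.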
The mixed lognormal distribution, as a flexible statistical method, performs well in the univariate analysis of significant wave heights and mean wave periods. When the marginal distribution shows special features, such as a heavy tail or a saddle shape, the Weibull and lognormal distributions have difficulty capturing them because of their simple forms, and in such cases the mixed lognormal distribution shows a greater advantage. The mixed lognormal distribution may also be a new option in the extreme analysis of ocean parameters.
4.2. Fitting the Bivariate Distributions
It is not easy to construct the joint distributions of significant wave heights and mean wave periods in the mixed sea state [14]. The main purpose of this section is to apply the conditional model, bivariate function model, and copula model to establish the bivariate distributions of wave data from NJ and BS stations under the total sea state. To assess the fitting ability of these models, the squared Euclidean distance is introduced.
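A grid-based comparison of this kind can be sketched as follows. The exact definition of the squared Euclidean distance is given elsewhere in the paper; this example assumes it is the sum of squared differences between empirical and model cell probabilities on a common grid. The function names, grid edges, and all numerical values are hypothetical.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing value, or None."""
    i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None

def empirical_cell_probabilities(heights, periods, h_edges, t_edges):
    """Fraction of observations falling in each (height, period) grid cell."""
    n = len(heights)
    grid = [[0.0] * (len(t_edges) - 1) for _ in range(len(h_edges) - 1)]
    for h, t in zip(heights, periods):
        i, j = bin_index(h, h_edges), bin_index(t, t_edges)
        if i is not None and j is not None:
            grid[i][j] += 1.0 / n
    return grid

def squared_euclidean_distance(p, q):
    """Sum of squared cell-by-cell differences between two probability grids."""
    return sum((a - b) ** 2
               for row_p, row_q in zip(p, q)
               for a, b in zip(row_p, row_q))

# Hypothetical observations and model cell probabilities, for illustration only.
heights = [0.3, 0.6, 0.6, 1.2]          # metres
periods = [4.0, 5.0, 5.0, 7.0]          # seconds
h_edges = [0.0, 0.5, 1.0, 1.5]
t_edges = [3.0, 6.0, 9.0]
empirical = empirical_cell_probabilities(heights, periods, h_edges, t_edges)
model = [[0.20, 0.05], [0.45, 0.05], [0.05, 0.20]]
distance = squared_euclidean_distance(empirical, model)
```

Because the distance depends on the grid, values are comparable only between models evaluated on the same grid division.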
4.2.1. Bivariate Distributions with the Conditional Model and Bivariate Function Model
When using the conditional model to establish the bivariate distribution, it is important to note that the interval used to divide the significant wave heights may affect the final fitting performance [24]. In the present work, the wave height interval is selected as 0.25 m, and the parameter values of the conditional model are obtained by nonlinear fitting. For the bivariate function model, the bivariate lognormal model has been proven useful for describing the joint behavior of significant wave heights and mean wave periods. The parameters of the bivariate lognormal model can be obtained by MLE based on experimental data.
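For the bivariate lognormal model, the maximum likelihood estimates have a closed form: they are the means, standard deviations (with divisor n), and correlation of the log-transformed data. The sketch below illustrates this; the function names and the sample values are hypothetical and are not the station data.

```python
import math

def fit_bivariate_lognormal(heights, periods):
    """MLE of a bivariate lognormal: moments of (ln H, ln T) with divisor n."""
    x = [math.log(h) for h in heights]
    y = [math.log(t) for t in periods]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    rho = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)
    return mx, my, sx, sy, rho

def bivariate_lognormal_pdf(h, t, mx, my, sx, sy, rho):
    """Joint density at (h, t), with h > 0, t > 0, and |rho| < 1."""
    zx = (math.log(h) - mx) / sx
    zy = (math.log(t) - my) / sy
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * h * t * sx * sy * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

# Hypothetical wave heights (m) and periods (s), for illustration only.
heights = [0.6, 0.9, 1.1, 1.4, 2.0]
periods = [4.2, 5.0, 4.8, 6.1, 7.3]
params = fit_bivariate_lognormal(heights, periods)
density = bivariate_lognormal_pdf(1.0, 5.0, *params)
```

Since a single correlation parameter describes the whole dependence structure, this model cannot adapt the marginals and the dependence separately, which is the limitation the copula approach addresses.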
The contour plots obtained from the conditional model and the bivariate lognormal model are shown in Figure 5 and Figure 6. To assess the applicability of the two models visually, the empirical distributions obtained from the experimental data are also drawn in the contour plots. As can be seen from Figure 5a and Figure 6a, the conditional model overestimates the probability of lower wave heights and larger periods for NJ and BS stations. In addition, the conditional model fits the contours of the empirical distributions poorly. Figure 6b presents the contour plot of the bivariate lognormal model at BS station, from which we can see that this model underestimates the probability of larger wave heights. As shown in Table 5, the conditional model and the bivariate lognormal model have larger squared Euclidean distances, which further confirms that the two methods fit the joint samples poorly.
4.2.2. Bivariate Distributions with the Copula Model
In contrast to conditional modeling and the bivariate lognormal distribution, the copula method places no requirement on the distribution form of the two connected variables. Therefore, two types of copula models are introduced in this section. In the first, the Weibull distribution is chosen to fit the significant wave heights and the lognormal distribution is chosen to fit the mean wave periods, and the bivariate distribution is then constructed by a copula function. In the second, the mixed lognormal distributions proposed in Section 2 are combined with the copula function to obtain a new copula model for constructing the bivariate distribution of significant wave heights and mean wave periods. In view of the variety of copula functions, the Gumbel, Clayton, and Frank copulas are selected for analysis and comparison. The parameters of copulas are solved by the
function in the MATLAB toolbox.
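The three Archimedean copulas can be written down directly. The sketch below gives their distribution functions and, as a swapped-in technique that is not the MATLAB fitting routine used in this study, estimates the Clayton and Gumbel parameters by inverting Kendall's tau. The Frank parameter has no closed-form inverse and is simply passed in. All function names and sample values are hypothetical.

```python
import math

def kendall_tau(x, y):
    """Kendall's tau (tau-a): concordant minus discordant pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def clayton_theta(tau):
    """Clayton parameter from Kendall's tau, valid for 0 < tau < 1."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta(tau):
    """Gumbel parameter from Kendall's tau, valid for 0 <= tau < 1."""
    return 1.0 / (1.0 - tau)

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 and 0 < u, v <= 1."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1 and 0 < u, v <= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def frank_cdf(u, v, theta):
    """Frank copula C(u, v) for theta != 0."""
    num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
    return -math.log(1.0 + num / (math.exp(-theta) - 1.0)) / theta

# Hypothetical wave heights (m) and periods (s), for illustration only.
heights = [0.6, 0.9, 1.1, 1.4, 2.0]
periods = [4.2, 5.0, 4.8, 6.1, 7.3]
tau = kendall_tau(heights, periods)
theta_clayton = clayton_theta(tau)
theta_gumbel = gumbel_theta(tau)
```

Evaluating any of these functions at `u = F_H(h)` and `v = F_T(t)`, with the marginal CDFs of choice, gives the joint distribution function at `(h, t)`.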
Figure 7 and Figure 8 display the contour plots of the copula models connecting Weibull and lognormal distributions. For the NJ station, all three copula models overestimate the probability of larger wave heights and wave periods, which may be caused by the inadequate fit of the Weibull and lognormal distributions to the marginal distributions. For the BS station, the Gumbel and Clayton copula models fit the lower wave heights poorly, while the Frank copula model underestimates the probability of smaller wave heights. Table 5 shows that the copula models cannot achieve satisfactory results when the Weibull and lognormal distributions are connected to establish the bivariate distributions.
[Figure caption (fragment): contour plots of (a) Gumbel copula model with Weibull and lognormal; (b) Clayton copula model with Weibull and lognormal; (c) Frank copula model with Weibull and lognormal. Red lines are generated from raw data.]
The analysis in Section 4.1 shows that the mixed lognormal distributions are more suitable for fitting the probability distributions of significant wave heights and mean wave periods than the Weibull and lognormal distributions. Combining two mixed lognormal distributions with a copula function may therefore be a good option for simulating the bivariate distribution of the two parameters. The contour plots of copula models with mixed lognormal distributions as the marginal distributions are shown in Figure 9 and Figure 10. These figures show that the three copula functions based on the mixed lognormal distributions fit the contours of the empirical distributions better.
From the contour plots, it is difficult to determine which of the three copula functions performs best. Table 5 gives the values of the squared Euclidean distance for NJ and BS stations. For the NJ station, the squared Euclidean distances of the Gumbel, Clayton, and Frank copulas are 0.0436, 0.0407, and 0.0564, respectively. According to the data analysis in Section 3, wave heights in the range of 0.5–1.5 m account for the largest proportion of the data. The Clayton copula fits the contours of the empirical distribution best in this range, as shown in Figure 9, which may explain why it has the smallest distance. For the BS station, the squared Euclidean distances of the Gumbel, Clayton, and Frank copulas are 0.0582, 0.0600, and 0.0655, respectively, so the Gumbel copula is optimal for the BS station.
4.3. Verification
Because the data from the two stations in the East China Sea are similar, they are insufficient to demonstrate the wide applicability of the proposed method. Therefore, the three bivariate models involved in this study, namely the conditional model, the bivariate lognormal model, and the copula model based on the mixed lognormal distribution, are applied to the XMD station to further illustrate the wide applicability of the new copula method. Since the data of the XMD station differ from those of the other two stations, we have adjusted the grid division so that the contour plots are of an appropriate size. Although this affects the magnitude of the squared Euclidean distance, it does not affect the performance comparison among the methods. In addition, the parameters of all bivariate models are obtained using the solution methods mentioned earlier.
The contour plots obtained from the conditional model and the bivariate lognormal model are shown in Figure 11. The conditional model overestimates the probability of larger wave heights and periods, and the bivariate lognormal model also overestimates the probability of larger wave periods. The squared Euclidean distances of the conditional model and the bivariate lognormal model are 1.9359 and 0.8465, respectively, which further confirms that the two methods fit the joint samples poorly.
Figure 12 displays the contour plots of the copula models based on the mixed lognormal distribution. The three copula models fit the contours of the empirical distribution better. The squared Euclidean distances of the Gumbel, Clayton, and Frank copulas are 0.4314, 0.5111, and 0.4986, respectively. By comparison, the Gumbel copula, which has the smallest distance, is optimal for the XMD station.
The biggest problem of the conditional model and the bivariate lognormal model is clearly their poor flexibility. They may perform well in some special cases, but the fit is not satisfactory in most cases. After the analysis of univariate fitting, a copula method based on the mixed lognormal distribution is proposed to establish the joint distribution of significant wave heights and mean wave periods, and satisfactory results are obtained. Because the copula function places no restriction on the marginal distributions, the optimal distribution form can be selected for each of the two variables, which directly affects the performance of the constructed bivariate distribution. The choice of copula type, which reflects the coupling of the two ocean parameters, also matters; the best one can be chosen by comparing several copula models.
5. Conclusions
In this work, two mixed lognormal distributions are connected by a copula function to establish the bivariate distribution of significant wave heights and mean wave periods, and the result is compared with the conditional model and the bivariate function model. The squared Euclidean distance is used to verify the fitting performance of these models.
Since the copula function allows any type of marginal distribution, the probability distributions of significant wave heights and mean wave periods are analyzed to obtain the optimal form of the marginal distributions. The experimental results show that the Weibull distribution fits the probability distribution of wave heights poorly, and the lognormal distribution underestimates the probability of wave periods in the middle region. In contrast, the mixed lognormal distribution, as a flexible statistical method, provides satisfactory fitting results for both wave heights and wave periods. Although the mixed lognormal distribution has the drawback of requiring many parameters to be estimated, the EM algorithm can effectively solve this problem. In the analysis of bivariate probability, the conditional model and bivariate function model have poor fitting effect in the region with larger . The copula model that connects the Weibull and lognormal distributions performs poorly in predicting the probability of smaller wave heights. In comparison, the copula model based on the mixed lognormal distribution is better suited to simulating the joint distribution of significant wave heights and mean wave periods.
An accurate prediction of the joint distribution of significant wave heights and mean wave periods is of high practical value in the design of maritime structures and the mitigation of marine disasters. Connecting mixed lognormal distributions by a copula function may be a new option for effectively simulating the joint distribution.