An Effective Framework Using Spatial Correlation and Extreme Learning Machine for Moving Cast Shadow Detection

Yi, Yugen; Dai, Jiangyan; Wang, Chengduan; Hou, Jinkui; Zhang, Huihui; Liu, Yunlong; Gao, Jin

doi:10.3390/app9235042

Open AccessArticle

An Effective Framework Using Spatial Correlation and Extreme Learning Machine for Moving Cast Shadow Detection

by

Yugen Yi

¹

,

Jiangyan Dai

^2,*,

Chengduan Wang

²,

Jinkui Hou

²,

Huihui Zhang

²,

Yunlong Liu

²

and

Jin Gao

²

¹

School of Software, Jiangxi Normal University, Nanchang 330022, China

²

School of Computer Engineering, Weifang University, Weifang 261061, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2019, 9(23), 5042; https://doi.org/10.3390/app9235042

Submission received: 16 October 2019 / Revised: 11 November 2019 / Accepted: 20 November 2019 / Published: 22 November 2019

(This article belongs to the Section Optics and Lasers)

Download

Browse Figures

Versions Notes

Abstract

:

Moving cast shadows of moving objects significantly degrade the performance of many high-level computer vision applications such as object tracking, object classification, behavior recognition and scene interpretation. Because they possess similar motion characteristics with their objects, moving cast shadow detection is still challenging. In this paper, we present a novel moving cast shadow detection framework based on the extreme learning machine (ELM) to efficiently distinguish shadow points from the foreground object. First, according to the physical model of shadows, pixel-level features of different channels in different color spaces and region-level features derived from the spatial correlation of neighboring pixels are extracted from the foreground. Second, an ELM-based classification model is developed by labelled shadow and unlabelled shadow points, which is able to rapidly distinguish the points in the new input whether they belong to shadows or not. Finally, to guarantee the integrity of shadows and objects for further image processing, a simple post-processing procedure is designed to refine the results, which also drastically improves the accuracy of moving shadow detection. Extensive experiments on two publicly common datasets including 13 different scenes demonstrate that the performance of the proposed framework is superior to representative state-of-the-art methods.

Keywords:

moving cast shadow; feature extraction; extreme learning machine; spatial correlation; post processing

1. Introduction

As a fundamental procedure in many high-level computer vision and image-processing applications, moving cast shadow detection has drawn more attention in recent years. This is because that cast shadows have similar properties with their corresponding moving objects, which may cause the misclassification of object detection and further downgrades the performance of object classification [1], object tracking [2], behavior analysis [3], scene interpretation [4]. Therefore, it is urgent to develop an effective moving cast-shadow detection method to separate shadows from the foreground.

Over the past decades, numerous works have been studied and surveyed in the literatures (e.g., [5,6]). Prati et al. [5] first divided shadow detection methods into deterministic and statistical methods depending on whether the decision process utilized the uncertainty or not. Subsequently, Sanin et al. [6] further categorized shadow detection methods into four kinds, including chromaticity-based methods, physical-based methods, geometry-based methods and texture-based methods, according to the type of shadow features. Regardless of the above categories, the process of shadow detection mainly consists of two stages: feature extraction and classification. The commonly applied shadow features are divided into pixel-level features and region-level features based on shadow properties [4]. In particular, the pixel-level feature is the value obtained by different channels in different color spaces, while the region-level feature is extracted statistically in terms of spatial correlations of neighboring pixels.

A significant property of a shadow is that the shadow is darker than that of the corresponding background and maintains constant chromaticity. Based on the property, the widely applied pixel-level features are intensity and chromaticity. Cucchiara et al. [7] first proposed a chromaticity-based method to detect shadows by calculating the change rates of three components in HSV color space. Thereafter, many color space-based methods were developed to exploit this property for shadow detection, such as RGB [8,9],

c_{1} c_{2} c_{3}

[10], YCbCr [11], normalized RGB [12], YUV [13], HSI [14] or combination among of them [15,16]. However, just only intensity and chromaticity features may not be able to accurately separate shadows from foregrounds when moving shadows and moving objects both have darker colors. Moreover, pixel-based methods are sensitive to noise. To solve these problems, the other property of a shadow is exploited to perform shadow detection, which assumes that the texture of the shadow is similar to the background and it is different from the foreground. According to the texture consistency, the region-level feature is described by spatial correlations of neighboring pixels, which is also called the statistical feature. The local texture descriptors, which are robust to noise and illumination variation, have been adopted widely to detect shadows, such as Gabor function [17], scale-invariant local ternary patterns (SILTP) [18], discrete wavelet transform (DWT) [19], gradient information [20], non-linear tone-mapping (NTM) [21] or their combinations [22]. These kinds of methods will fail when the texture property is similar between the background and foreground.

Obviously, using a single property cannot detect shadows completely. Therefore, moving shadow detection methods with multiple features based on these two properties have attracted extensive attention [23,24,25,26,27]. In particular, Liu et al. [23] utilized the pixel-level information, local region-level information and global-level information to remove the shadow. Tang et al. [25] extracted the grey level, color composition, and gradient information of pixels in a video frame to construct a respective shadow mask corresponding to the same video frame, and then they discriminated shadow pixels by minimizing the likelihood of missing a shadow pixel in each of the shadow masks. Wang et al. [26] jointly used the color, texture and gradient by exploiting local neighboring information and designed an adaptive mechanism to estimate threshold parameters for detecting shadows. Gomes et al. [27] integrated the chromatic and gradient information with the image hypergraph segmentation and used a stochastic majority voting scheme to shadow regions. Without loss of generality, after extracting features, the above methods detect shadows according to the parameter assumptions and thresholds tuning in the classification stage. Apparently, it is difficult to acquire the appropriate thresholds of parameters for various environments such as indoors and outdoors.

Recently, learning-based methods have been popular in shadow detection [28,29,30,31,32,33,34], which apply a classifier constructed by various shadow features to discriminate shadows from the foreground. For example, Joshi et al. [30] extracted a set of features derived from characteristic differences in color and edges, then employed support vector machines (SVM) and the co-training algorithm for classification. Dai et al. [31] extracted various features based on illumination, color and texture, then employed partial least squares (PLS) and logistic discrimination (LD) to classify shadows from their moving objects. Russell et al. [33] introduced the binary patterns of local colour constancy (BPLCC), light-based gradient matching (LGM) and intensity-reduction histogram (IRH) to construct two over-complete dictionaries from image patches, then performed the sparse representation classifier to discriminate shadows and objects. Lin [34] designed a multi-layer pooling scheme (MLPS) to integrate the features in a local region and to reduce the dimension of extracting features, then used a random forest algorithm as the ensemble decision scheme. Although the above methods require the label information from a part of ground truth to train the classifier model, they can adapt to various environments very well without turning thresholds or parameters.

Instead of handcrafted feature extraction, machine learning techniques have been developed to automatically learn features using deep neural networks [35,36,37,38,39,40,41,42]. First, researchers mainly treated the convolutional neural network (CNN) as a powerful feature extractor and made significant performance improvement with the powerful deep features. For example, Shen et al. [35] first extracted shadow edges via a structured CNN, and then solved shadow recovery as an optimization problem. Khan et al. [36] first applied the CNN to shadow detection. They utilized a 7-layer CNN to extract features from superpixels and then feed the features to a conditional random field (CRF) model to smooth the detection results. Then, end-to-end CNN models were proposed due to the emergence of fully convolutional networks (FCN) [37]. For example, Vicente et al. [40] presented a semantic-aware stacked CNN model to extract the semantic shadow prior and then refined the output by a patch-based CNN. Hu et al. [41] formulated the direction-aware attention mechanism in a spatial recurrent neural network (RNN) and recovered direction-aware spatial context (DSC) for detecting shadows. The CNN-based methods can learn features from the image, while they need a sufficient data for training and require to turn some parameters, such as the number of convolutional layers, the detection window size of each convolution layer, learning rate and the number of iterations. However, most of the methods mentioned were presented to detect shadow in single image. Besides, deep learning methods generally require a large number of labeled samples for training. For video sequences, it is very difficult and expensive to obtain a large number of labeled samples manually.

As a special type of single-hidden layer feed-forward neural network, an extreme learning machine (ELM) has been extensively applied in several fields of machine learning [43,44,45,46,47,48], in which the parameters are randomly generated and the values of parameters do not need to be tuned. Ghimire and Lee [48] proposed an online sequential extreme learning machine-based semi-supervised technique for moving cast shadow detection, which provided better generalization performance. Motivated by its fast learning, good generalization and universal approximation capability, we propose an effective moving shadow detection based on the extreme learning machine. The main contributions of this work are: (1) based on the shadow properties, a set of features consisting of pixel-level features and region-level features is extracted by considering the characteristics of pixels and the spatial correlations of neighboring pixels simultaneously. (2) A generalized model based on the extreme learning machine is constructed for classification, which is simple and efficient, without turning thresholds or parameters. (3) Extensive qualitative and quantitative evaluations on 13 various scenes demonstrate that the performance of the proposed method compared with some well-known methods.

The rest of this paper is organized as follows. Section 2 describes the moving shadow detection based on ELM in detail, Section 3 presents the experimental results and analysis. The conclusions are given in Section 4.

2. Extreme Learning Machine (ELM)-Based Moving Cast Shadow Detection

In this section, we develop a novel ELM-based moving cast shadow detection approach, and the overall architecture of the proposal is illustrated in Figure 1, which mainly includes the following five steps. The first step is to acquire the labelled object pixels and shadow pixels from the ground truths. The next step is feature extraction. In this step, pixel-based features and region-based features are extracted to form an input data matrix for training. The third step is MCSD-ELM model learning. The proposed MCSD-ELM classifier is trained for the moving cast-shadow detection, and we can obtain the corresponding output connecting weights w. The fourth step is the classification. For a given foreground image, firstly, extract features by step 2; then, calculate the corresponding network output values using the weights w; then, utilize the highest network output value based on the weights w to determine the final class label. Finally, to obtain the complete objects and shadows for high-level computer vision applications, the last step of post-processing is carried out.

2.1. Feature Extraction

To efficiently describe the shadow properties with abundant information, the pixel-level and region-level illumination invariant distinguishing features are extracted in three different color spaces based on intensity, local color constancy, and local texture consistency. The set of feature descriptors is constructed to form an input data matrix for ELM.

Here, we assume that the background image B (without moving objects), and current foreground frame F (containing moving objects and moving shadows) of a sequence are generated using a standard background subtraction as described in [49]. Let

B^{c} (x, y)

be the intensity value located at

(x, y)

of component c in the background B. Similarly,

F^{c} (x, y)

is the intensity value located at

(x, y)

of component c in the current foreground frame F.

2.1.1. Pixel-Level Features

Shadows are illuminated by a skylight, while non-shadows are illuminated by both skylight and sunlight. Therefore, shadows are darker than that of the surface where they are cast. Based on the Phong model [50], one of the properties related to shadows is that the intensity of shadows must be lower than that of non-shadows in each component and the chromaticity changes in a small limit. According to the property, several pixel-level features derived from different color space are exploited to depict shadows as much as possible.

(A) Color ratio in RGB color space

Due to the intensity of shadow is lower than non-shadow in each component, the color ratio is used to represent the ratio of shadow and non-shadow in RGB color space [4]. To avoid the division by zero, the color ratio is defined as:

K^{c} (x, y) = \frac{F^{c} (x, y)}{B^{c} (x, y) + 1}

(1)

where

K^{c} (x, y)

is the color ratio at the location

(x, y)

in component c,

c \in {R, G, B}

.

B^{c} (x, y)

and

F^{c} (x, y)

are the intensity values at the location

(x, y)

of component c in the background B and the current foreground frame F.

(B) Lightness ratio in LRGB color space

The lightness-red-green-blue (LRGB) color model was proposed in the literature [51], in which the lightness and color components can be scaled separately. Hence, the lightness ratio calculated in LRGB color space can better describe the darkness characteristic of shadow.

The LRGB components

{(L, T_{1}, T_{2}, T_{3})}^{T}

are generated by transforming RGB components

{(A_{1}, A_{2}, A_{3})}^{T}

, as follows:

\begin{array}{l} L = 0.299 \times A_{1} + 0.587 \times A_{2} + 0.114 \times A_{3} \\ T_{i} = A_{i} - L, i = 1, 2, 3 \end{array}

(2)

Then, the lightness ratio L^LRGB is given by:

L^{L R G B} (x, y) = \frac{F^{L} (x, y)}{B^{L} (x, y)}

(3)

where

F^{L} (x, y)

and

B^{L} (x, y)

are the lightness values at the location

(x, y)

in lightness components L of the foreground frame F and background image B in LRGB color space, respectively.

(C) Color constancy in the HSV color space

The shadow maintains the color constancy compared with the surface it is cast. Generally, the hue and saturation components in HSV color space were utilized to describe this property [7]. Meanwhile, Tsai [52] assumed that shadows have higher hue in HSV color space. Therefore, the color constancy can be depicted adequately by the following three features:

\begin{array}{l} H (x, y) = | F^{h} (x, y) - B^{h} (x, y) | \\ S (x, y) = | F^{s} (x, y) - B^{s} (x, y) | \\ R (x, y) = \frac{F^{s} (x, y)}{F^{v} (x, y) + 1} \end{array}

(4)

where

F^{h} (x, y)

,

F^{s} (x, y)

and

F^{v} (x, y)

are values at location (x, y) in the hue, saturation and value components of foreground frame F in HSV color space, respectively. Likewise,

B^{h} (x, y)

and

B^{s} (x, y)

are values at location (x, y) in the hue and saturation components of background image B in the HSV color space.

H (x, y)

and

S (x, y)

denote the hue and saturation differences between F and B, respectively.

R (x, y)

reflects that the higher hue of shadows appears in HSV color space, which is calculated in the foreground image F. In addition,

F^{h}, F^{s}, F^{v}, B^{h}, B^{s} \in [0, 1]

.

2.1.2. Region-Level Features

The other property of shadow is that the texture of the shadow is similar to the surface where it is cast (called background) and is different from the foreground. It is noticed that pixel-level features are sensitive to noises. To overcome the drawback for shadow detection, region-level features are developed to describe the texture consistency of shadow, which are explored by the spatial correlations of neighboring pixels, such as normalized cross-correlation (NCC), Gabor features and modified local binary patterns (MLBP).

(A) Normalized cross-correlation (NCC) in the LRGB color space

In fact, the shadow is the approximate scale version with respect to the background because of its darkness [50]. It is proved that the normalized cross-correlation (NCC) [53] is useful to adequately reflect the similarity between shadow and corresponding background, which is obtained in a neighboring region and is robust to noise. As noticed by the literature [51], the lightness can be scaled very well in LRGB color space. Given one pixel p at the location

(x, y)

, its neighboring pixel q at the location

(i, j)

, and the set of its neighboring pixels is denoted as

Ω_{p}

, in which

(i, j) \in Ω_{p}

. Hence, the NCC is formulated as:

N C C (x, y) = \frac{M^{F B} (x, y)}{M^{B} (x, y) \cdot M^{F} (x, y)}

(5)

where

\begin{array}{l} M^{F B} (x, y) = \sum_{(i, j) \in Ω_{p}} F^{L} (i, j) \cdot B^{L} (i, j) \\ M^{B} (x, y) = \sum_{(i, j) \in Ω_{p}} B^{L} (i, j) \cdot B^{L} (i, j) \\ M^{F} (x, y) = \sum_{(i, j) \in Ω_{p}} F^{L} (i, j) \cdot F^{L} (i, j) \end{array}

(6)

where

F^{L} (i, j)

and

B^{L} (i, j)

are the foreground lightness value and the background lightness value of a neighboring pixel at location

(i, j)

in the lightness component L of LRGB color model, respectively.

(B) Illumination invariant Gabor features within the current frame of the RGB color space

The 2D Gabor filter [54] depicts the intensity variation with a range of scales and orientations for one pixel in its neighborhood. The generated Gabor texture descriptor is illumination invariant which is powerful to further express the texture information of shadow and non-shadow regions. Given one pixel at location

(x, y)

, and its neighborhood

D (x, y)

centered at

(x, y)

. Its Gabor transform is defined over P scales and Q orientations by the convolution:

G_{p q}^{c} (x, y) = \sum_{i}^{I - 1} \sum_{j}^{J - 1} D (x - i, y - j) \cdot g_{p q} (i, j)

(7)

where I and J represent the dimensions of Gabor kernel

g_{p q}

, and

G_{p q}^{c} (x, y)

is the Gabor coefficient at the location

(x, y)

in component c,

c \in {R, G, B}

.

Here, the Gabor kernel

g_{p q}

is:

g_{p q} (x, y, f, θ) = a^{- p} (\frac{1}{2 π σ_{x} σ_{y}}) \exp (- \frac{1}{2} [\frac{x_{θ}}{σ_{x}^{2}} + \frac{y_{θ}}{σ_{y}^{2}}]) \times \cos (2 π f x_{θ})

(8)

where

\begin{array}{l} x_{θ} = a^{- p} (x \cos θ + y \sin θ) \\ y_{θ} = a^{- p} (- x \sin θ + y \cos θ) \end{array}

(9)

where

σ_{x}

and

σ_{y}

denote the sizes of the Gaussian envelope in the x and y directions, respectively. f is the base frequency of the sinusoid, p is the scale factor (

p = 0, 1, \dots, P - 1

for a > 1) and q is the orientation factor (

q = 0, 1, \dots, Q - 1

). Therefore, the filter orientation

θ = \frac{q π}{Q}

.

In our work, we extract the Gabor features with

P = 0

and

Q = 4

. Particularly, the texture information is described in the current foreground frame F with

θ \in {0^{\circ}, 45^{\circ}, 90^{\circ}, 135^{\circ}}

for R, G, B three components in RGB color space.

(C) Modified local binary patterns (MLBP) in the RGB color space

The modified local binary pattern (MLBP) [55] is significant to represent the texture information of shadow. This is because the MLBP is not only illumination invariant but also is robust to flat regions. Besides, it is fast for computation. Given one pixel at location

(x, y)

and its intensity value is denoted as V_m. Then, the MLBP descriptor is calculated as follows:

M L B P_{N, r} (x, y) = \sum_{(i, j) \in Ω (x, y)} s (V_{n} - V_{m} - Δ) \times 2^{n}, s (z) = {\begin{matrix} 1, & z \geq 0 \\ 0, & o t h e r w i s e \end{matrix}

(10)

where N and r are the numbers of pixels in the neighborhood

Ω (x, y)

centered at

(x, y)

and the radius of a circle, respectively. V_n is the intensity value of the neighbor pixel at location

(i, j)

in

Ω (x, y)

and

Δ

is a threshold to maintain the robustness for flat regions.

n = 0, 1, \dots, N - 1

. Hence, we can obtain a N-bits binary pattern for one given pixel according to the Equation (10). Afterwards, a histogram with 2^N-bits is produced to express the texture information.

To obtain the texture similarity between shadow and non-shadow, the simple histogram intersection operation is adopted for fast computation. Therefore, the texture similarity is given by:

S i m^{c} (x, y) = \sum_{i = 0}^{2^{N} - 1} \min (h_{F}^{c}, h_{B}^{c})

(11)

where

h_{F}^{c}

and

h_{B}^{c}

are the histograms of pixels at (x, y) in component c (

c \in {R, G, B}

) of the foreground frame F and corresponding background B, respectively.

S i m^{c} (x, y)

represents the common parts of two histograms for one pixel at location (x, y) in component c.

2.1.3. Feature Descriptor

The pixel-level features and region-level features extracted above have varying dynamic ranges. Therefore, the features need to be normalized. After that, all of the features are combined to form the final descriptor for the foreground frame with a dimension of d = 23.

2.2. Classification Using Extreme Learning Machine

After feature extraction, the objective of the next step is to assign each pixel to one of two categories: shadow and object. The ELM [43,44,45,46,47,48] is adopted for classification, which has been proven to be effective and efficient in addressing a variety of classification problems. Next, the classification procedure using the ELM algorithm is described in detail.

Suppose that a training set is denoted as

{(x_{i}, y_{i})}_{i = 1}^{N}

with N samples and C classes, where

x_{i} = [x_{i 1}, x_{i 2}, \dots, x_{i d}] \in ℜ^{d}

is the ith training sample with d dimensions.

y_{i} = [y_{i 1}, y_{i 2}, \dots, y_{i C}] \in ℜ^{C}

is the C-dimensional label vector of

x_{i}

. Note that if

x_{i}

belongs to the jth class, then

y_{i j} = 1

. Otherwise,

y_{i k} = 0 (k \neq j)

. The training set

{(x_{i}, y_{i})}_{i = 1}^{N}

is utilized to train the single hidden layer feedforward neural network (SLFN). Here, the network contains d inputs, L hidden-layer neurons and C outputs.

The output function of the ELM [43] is expressed as:

f_{L} (x) = \sum_{i = 1}^{L} w_{i} h_{i} (x) = h (x) w

(12)

where

w = {[w_{1}, w_{2}, \dots, w_{L}]}^{T}

is the output weight vector connecting the hidden layer nodes and the output nodes

h (x) = [h_{1} (x), h_{2} (x), \dots, h_{L} (x)]

is the output vector of the hidden layer for the input

x_{i}

which is also called ELM non-linear feature mapping [43].

To obtain the weights

W \in ℜ^{L \times C}

connecting the hidden layer nodes and the output layer nodes, the sum of squared loss for the prediction errors is minimized by:

\min_{W \in ℜ^{L \times C}} {‖ H W - Y ‖}^{2}

(13)

where

‖ \cdot ‖

represents the Frobenius norm and

H = [h (x_{1}); h (x_{2}); \dots; h (x_{N})] \in R^{N \times L}

is the output matrix of the hidden layer and

Y = {[y_{1}, y_{2}, \dots, y_{N}]}^{T} \in R^{N \times C}

is the target matrix of the training set (i.e., labels of the training samples) [43]. Consequently, the output weights W can be rewritten as follows:

W = H^{†} Y

(14)

where

H^{†} = {(H H^{T})}^{- 1} H

is the Moore-Penrose generalized inverse of matrix H.

Given a new sample

x_{N + 1} \in ℜ^{d}

, its network response

o_{N + 1}

is generated by the obtained network output weights W.

o_{N + 1} = W^{T} φ (x_{N + 1})

(15)

where

φ (x_{N + 1})

is the network hidden layer output for

x_{N + 1}

.

For the binary classification applications, the decision function of ELM is:

y_{N + 1}^{*} = \underset{x_{N + 1} \in {1, 2, \dots C}}{\arg} \max (o_{N + 1}), C = 2 .

(16)

2.3. Post-Processing

In the process of classification, misclassification may commonly occur. Specifically, the shadows may be detected as objects incorrectly, and the objects may be misclassified as shadows. Figure 2 shows the shadow detection results of some frames in different scenes.

From Figure 2c, we can see that the detected shadows and objects are comprised of some isolated regions compared with the groundtruths shown in Figure 2b. After the observation, we summarize that there are two situations for the misclassification. (1) There are some dark regions (such as windshields in the second column of Figure 2a) in moving objects, which are often mistakenly detected as shadows. (2) The color similarity or the texture similarity between moving objects and the corresponding background (such as the shirt of the person in the fourth column of Figure 2a) is consistent with the similarity between their moving shadows and the background. It will lead to the incorrect classification of moving objects as moving shadows. In many cases, the situations mentioned above are inevitable. To solve the problem, the post-processing is performed to ensure the integrity of moving objects and moving shadows for further applications in computer vision. This is designed by the spatial correlation and geometric properties of shadows and objects. There are two operations in post-processing: size discrimination of candidate moving objects and moving shadows and border discrimination of candidate moving shadows.

(1) Size discrimination of candidate moving shadows and moving objects

Generally, the candidate moving shadow consists of shadow regions classified correctly and some small object blobs classified incorrectly, as well as the candidate moving objects shown in Figure 2c. To remove the misclassified blobs, we first utilize a connected component labelling algorithm to mark candidate moving shadows and moving objects respectively, in order to generate different labelled sub-regions. Then, a size filter is performed to redress small misclassified blobs. Take the candidate moving shadows as an example, we expound the operation in detail.

For a candidate moving shadow mask M^S, a connected component algorithm is performed, then a series of connected regions are generated:

M^{S} = {R_{1}, R_{2}, \dots, R_{n}}, i = 1, 2, \dots, n

(17)

where R_i is the ith connected sub-region and n is the number of sub-regions.

Next, the sub-regions in the set

M^{S}

are sorted in terms of their sizes and the sub-regions have small sizes will be filtered and recognized as object regions.

R_{i} \in {\begin{array}{l} s h a d o w & N u m (R_{i}) < α \cdot n u m \\ o b j e c t & o t h e r w i s e \end{array}

(18)

where

N u m (R_{i})

denote the number of pixels in the sub-region R_i. num is the number of pixels in the maximum sub-region, and

α

is an empirical threshold,

α \in [0, 0.2]

. Likewise, the same operations are performed on the candidate moving object M^o.

(2) Border discrimination of candidate moving shadows

In practice, the true shadow pixels will occur at the edges of blobs. In other words, if a part of a moving object is misclassified as a shadow, most of the boundary of this region will be located inside the candidate moving object, as displayed in Figure 2c. Similarly, if a shadow candidate is a true shadow, then more than half of the boundary should be adjacent to the boundary of moving objects. Therefore, the boundary information of the candidate shadow region can be helpfully exploited to determine whether the region is a shadow or not. First, the candidate moving objects and moving shadows are segmented by the sobel edge algorithm. Then, a connected component labelling algorithm is also performed to mark each region and compute the edge of the region. For one candidate shadow region, the number of all boundary shadow pixels N_s and the number of boundary shadow pixels N_o which are adjacent to the boundary of a candidate moving object region are obtained, respectively. Consequently, we can determine the candidate region to a shadow according to the following rule:

R_{i} \in {\begin{array}{l} s h a d o w & \frac{N_{o}}{N_{s}} > 0.5 \\ o b j e c t & o t h e r w i s e \end{array}

(19)

The results after post-processing are given in Figure 2d. Obviously, the post processing can refine shadow detection results and play a very important role in correcting misclassification results.

2.4. Moving Cast-Shadow Detection Algorithm

Based on the above discussions, the proposed moving cast shadow detection algorithm based on the extreme learning machine (MCSD-ELM) is summarized in Algorithm1.

Algorithm1 MCSD-ELM algorithm

Input:t original frames F^t, t groundtruths G^t, background image B and the t+1 foreground frame F^t⁺¹.
Output: The moving cast shadows and moving objects in the foreground frame F^t⁺¹.
Step 1. Randomly select N labelled pixels from t groundtruths G^t and extract pixel-level and region-level features according to Equations (1)–(11).
Step 2. Generate the feature descriptor

x_{i} = [x_{i 1}, x_{i 2}, \dots, x_{i d}] \in ℜ^{d}, i = 1, 2, \dots, N

, and the label vector

y_{i} = [y_{i 1}, y_{i 2}, \dots, y_{i C}] \in ℜ^{C}

with C-dimensions of x_i to obtain the training set

{(x_{i}, y_{i})}_{i = 1}^{N}

.
Step 3. Initiate an ELM with L hidden neurons, random input weights and bias values, and compute the output vector h_i(x) of the hidden layer. Then the output matrix

H \in ℜ^{N \times L}

is formed.
Step 4. Compute the output weights W using Equation (14).
Step 5. For one new testing foreground frame F^t⁺¹, extract pixel-level and region-level features according to Equations (1)–(11) and form the feature descriptor

x_{m}^{t + 1}

for each foreground pixel (

m = 1, 2, \dots, f p

, fp is the number of moving pixels in F^t⁺¹).
Step 6. Calculate the network response of each foreground pixel according to the obtained output weights W, denoted by

o_{m}^{t + 1} = W^{T} φ (x_{m}^{t + 1})

in which

φ (x_{m}^{t + 1})

is the network hidden layer output for

x_{m}^{t + 1}

of the mth pixel in F^t⁺¹.
Step 7. Determine the class label

y_{m}^{t + 1}

according to Equation (16) and generate the candidate moving shadow mask M^S and candidate moving objects mask M^O.
Step 8. Perform the post-processing on the candidate M^S and M^O to obtain the refined classification results.

3. Experiments and Result Analysis

In this section, we first introduce two common datasets which are used to validate the superiority of the proposed method. Then, the quantitative evaluation metrics are illustrated. Finally, we compare the proposed method with several representative well-known methods quantitatively and qualitatively.

3.1. Datasets

In our experiments, we assume that the foreground detection masks (including the moving cast shadows and moving objects) are available and select two datasets consisting of 13 scenes (indoor and outdoor) to test the performance of our proposed method.

The first dataset [56,57] has six popular and widely tested scenes, which is summarized as follows. Moreover, the details of the first popular dataset are listed in Table 1.

(1) Campus: It is an outdoor scene, in which the shadow is relatively large and weak because of the presence of multi-light effects.

(2) Highway: It is an outdoor scene of the traffic with different lighting conditions, in which the shadow is relatively large and strong.

(3) Intelligent Room: It is an indoor scene with different perspectives and lighting conditions, in which the shadow is medium and weak.

(4) Laboratory: It is an indoor scene with various lighting conditions, in which the shadow is medium and weak.

(5) Hallway: It is an indoor scene with a textured background, in which the shadow is variable and weak.

(6) CAVIAR: It is an indoor scene with different lighting conditions, in which the shadow is variable and weak. Besides, it has an obvious shadow color blending because of a strong background reflection.

The second dataset is selected from the CDnet dataset [58] including a large number of benchmarks, which is illustrated in Table 2 in detail.

(1) Cubicle: It is a typical indoor scene with strong light in the front of the view and has strong camouflages between the walking staffs and the background.

(2) Bungalows: It is an outdoor scene of the traffic with various vehicles, in which the shadow is relatively large and strong.

(3) BusStation: It is an outdoor scene, in which the shadow is static, medium and strong casting by buildings in front of the bus station and peoples coming out from the station or passing in front of the station.

(4) PeopleInShade: It is a challenging outdoor scene, in which people walk under a large shaded area. It has foreground-background camouflages and non-textured-dark regions except strong shadows.

(5) Seam, Senoon, Sepm: They are outdoor scenes acquired from the same camera with different time. They have strong shadows with various sizes.

3.2. Evaluation Metrics

To quantitively evaluate the performance of shadow detection methods, two metrics [5], shadow detection rate (

η

) and shadow discrimination rate (

ξ

) are defined as follows:

η = \frac{T P_{S}}{T P_{S} + F N_{S}} \times 100 %, ξ = \frac{T P_{O}}{T P_{O} + F N_{O}} \times 100 %

(20)

where TP_S and TP_O are the numbers of moving shadow pixels and moving object pixels correctly detected, respectively. FN_S is the number of moving shadow pixels classified as moving object pixels incorrectly. Likewise, FN_O is the number of moving object pixels classified as moving shadow pixels incorrectly.

To consider the performance of shadow detection and shadow discrimination simultaneously, the third measure is given by:

A v g = \frac{η + ξ}{2}

(21)

where Avg is the mean value of the shadow detection rate

η

and shadow discrimination rate

ξ

.

3.3. Comparisons on a Popular Dataset

The quantitative results obtained by the proposed method and nine comparative methods are summarized as Table 3 and the best result Avg of each scene is marked with bold. Some quantitative results of comparative methods refer to original literature [9,22] and some are provided by the authors of the literature [26,31]. It can be observed that the proposed method has the highest shadow detection rate

η

on CAVIAR, Hallway, Laboratory and Intelligent Room and has the best shadow discrimination rate

ξ

on Campus. From the aspect of the mean value Avg, the proposed method performs well on Campus, CAVIAR, Hallway, Highway and Intelligent Room compared with the existing state-of-the-art methods. In particular, the mean value Avg of the proposed method is higher than the literature [22] more about 3.0% on Campus and CAVIAR. Moreover, the proposed method is slightly lower than the literature [27] about 0.14% on Laboratory. To evaluate the performance comprehensively, the averages of six different scenes are calculated which demonstrate that the proposed method is superior to all comparative methods. Significantly, the average of the proposed method is higher more than the second-highest method [22] about 1.97%.

The qualitative evaluation results of five representative state-of-the-art methods and the proposed method on six different scenes are shown in Figure 3, in which the moving shadow pixels are marked with green and the moving object pixels are marked with red. From Figure 3, we can see that the proposed method has an excellent capability of discriminating moving shadow pixels from foreground frames in both outdoor and indoor. Although pixel-based methods are sensitive to noise, the proposed method can improve the robustness by post-processing. Obviously, the integrities of moving shadows and moving objects have been maintained well.

3.4. Comparisons on a Large Dataset

Table 4 illustrates the shadow detection comparison results of the proposed method and eight representative advanced methods intuitively. Most of the quantitative results of comparative methods refer to original literature [22] and some are provided by other authors [24,26]. In terms of shadow detection rate

η

, the proposed method achieves the highest accuracy on PeopleInShade, Cubicle, Senoon and Sepm, while it is worse than the method proposed by Wang et al. [24] on BusStation, Bungalows and Seam about 5.74%, 1.40% and 1.82%, respectively. From the aspect of shadow discrimination rate

ξ

, the proposed method has the best capability of separating moving shadows from moving objects on most scenes except Bungalows. For the sake of comprehensively evaluating the performance, the mean metric Avg is calculated for comparison. It can be seen that the proposed method has the best performance on PeopleInShade, BusStation, Cubicle, Seam, Senoon and Sepm compared with the existing methods, while it has lower accuracy than the second-highest method proposed by Wang et al. [24] of about 4.52% on Bungalows. Moreover, we can see that the proposed method is superior to the comparative methods according to the averages of seven scenes. In particular, the average of our method is higher than the second-highest method [24] of about 6.35%. In addition, for the three scenes, such as Seam, Senoon and Sepm, from the same camera with different time, our method exhibits the best performance. As summarized above, it demonstrates that our method is effective and robust in various indoor and outdoor scenes.

To validate the effectiveness and superiority of the proposed method visually, the qualitative comparison results obtained by our method and the recent well-known method presented in [22] on seven different scenes are displayed in Figure 4. Apparently, it can be seen that the proposed method can separate moving shadows accurately from foreground frames in different outdoor scenes with strong shadows as well as the indoor scene with weak shadows. Similarly, it is also robust to noises and guarantees the integrity of moving shadows and moving objects.

3.5. Comparisons with Some Representative Deep-Learning Methods

Furthermore, the proposed method is also compared with two state-of-the-art methods based on the convolutional neural network and the quantitative comparative results listed in Table 5 which are obtained from the literature [34]. Considering the metrics of shadow detection rate

η

, shadow discrimination rate

ξ

and the average among of them, the proposed method is superior to the method suggested by Long et al. [37] on the most scenes except on Campus, and it performs better than the method designed by Lee et al. [38] on all test scenes. The reasons are summarized as follows. (1) our method is implemented with the learning features obtained by ELM based on the discriminative hand-crafted features, while shallow features derived from original frames are learnt by convolutional neural network methods. (2) The deep-learning methods have a large number of parameters to estimate. When the sample size is small, the deep-learning methods cannot be able to optimize parameters well, which will lead to the under-learning phenomenon. While our method based on ELM requires fewer optimization parameters, it can achieve better performance with fewer labelled samples.

3.6. Parameter Sensitivity Analysis

In this section, to analyze the influence of different hidden layers L on the proposed method, Figure 5 illustrates the variation curves of classification rates on the two datasets. Obviously, the classification rate is not sensitive to the value of the parameter L. Specifically, with an increase of the value of L, the performance of the proposed method is being improved. After achieving the peak of classification rate, the performance of the proposed method is decreasing with an increase of the value of L. Therefore, we can see that neither too small nor too large L is appropriate for our method from Figure 5.

To analyze the performance of the proposed MCSD-ELM with varying the number of training samples, Figure 6 displays classification rates of our method with different numbers of training samples on two datasets. It can be seen that with the increase of training samples, the classification rates of the proposed method are increasing. Moreover, when the number of training samples increases to a certain extent, the accuracy tends to be stable. It is well known that the time cost will be higher with increasing the number of training samples. Therefore, neither too small nor too large the number of training samples is appropriate for our method.

4. Conclusions

In this study, we have proposed a novel moving cast-shadow detection method using the supervised ELM. In contrast to the conventional methods, the proposed method not only incorporates pixel-level features but also explores region-level features according to the correlations among neighboring pixels to form input data for constructing the MCSD-ELM model. On the one hand, the proposed model only needs to turn one parameter which has little effect on the accuracy and can automatically determine one pixel whether it is a shadow or not. On the other hand, the post-processing operation can further improve the classification performance and guarantee the integrity of moving cast shadows and moving objects. We have evaluated the performance of the proposed method on two publicly available datasets. Compared with some representative state-of-the-art methods, the extensive experimental results indicate the effectiveness and robustness to noises of our method.

Considering that the process of labelling data could be time-consuming and infeasible, we will research the semi-supervised methods for moving cast-shadow detection using labelled and unlabeled data in the future work. Moreover, we have also discovered that some researchers have proposed active annotation methods [42] for the single image or video sequences. Therefore, we also will study moving shadow detection based on deep learning on video sequences.

Author Contributions

Data curation, J.D., J.H., H.Z., Y.L. and J.G.; Formal analysis, Y.Y., J.D., C.W., J.H., H.Z., and J.G.; Funding acquisition, Y.Y. and J.D.; Methodology, J.D.; Supervision, Y.Y.; Writing–original draft, Y.Y.; Writing–review and editing, Y.Y., J.D., C.W., J.H., Y.L. and J.G.

Funding

This work is funded by the National Natural Science Foundation of China, grant number [61602221, 61602222, 61967010, 31872847, 61907007, 41661083], the Natural Science Foundation of Shandong Province, grant number [ZR2017QF011], the Weifang Science and Technology Development Plan Project, grant numbers [2018GX009, 2018GX004, 2019GX003], the Project of Doctoral Foundation of Weifang University, grant numbers [2015BS10, 2018BS11], the Provincial Key Research and Development Program of Jiangxi, grant number [20181ACE50030].

Conflicts of Interest

The authors declare no conflict of interest.

References

Chan, Y.-M.; Huang, S.-S.; Fu, L.-C.; Hsiao, P.-Y.; Lo, M.-F. Vehicle detection and tracking under various lighting conditions using a particle filter. IET Intell. Transp. Syst. 2012, 6, 1–8. [Google Scholar] [CrossRef]
Asaidi, H.; Aarab, A.; Bellouki, M. Shadow elimination and vehicles classification approaches in traffic video surveillance context. J. Vis. Lang. Comput. 2014, 25, 333–345. [Google Scholar] [CrossRef]
Candamo, J.; Shreve, M.; Goldgof, D.; Sapper, D.; Kasturi, R. Understanding Transit Scenes: A Survey on Human Behavior-Recognition Algorithms. IEEE Trans. Intell. Transp. Syst. 2009, 11, 206–224. [Google Scholar] [CrossRef]
Qu, L.; Tian, J.; Fan, H.; Li, W.; Tang, Y. Evaluation of shadow features. IET Comput. Vis. 2018, 12, 95–103. [Google Scholar] [CrossRef]
Prati, A.; Mikic, I.; Trivedi, M.; Cucchiara, R. Detecting moving shadows: Algorithms and evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 918–923. [Google Scholar] [CrossRef]
Sanin, A.; Sanderson, C.; Lovell, B.C. Shadow detection: A survey and comparative evaluation of recent methods. Pattern Recognit. 2012, 45, 1684–1695. [Google Scholar] [CrossRef]
Cucchiara, R.; Grana, C.; Piccardi, M.; Prati, A. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1337–1342. [Google Scholar] [CrossRef]
Choi, J.; Yoo, Y.J.; Choi, J.Y. Adaptive shadow estimator for removing shadow of moving object. Comput. Vis. Image Underst. 2010, 114, 1017–1029. [Google Scholar] [CrossRef]
Varghese, A.; Sreelekha, G. Sample-based integrated background subtraction and shadow detection. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 25. [Google Scholar] [CrossRef]
Salvador, E.; Cavallaro, A.; Ebrahimi, T. Cast shadow segmentation using invariant color features. Comput. Vis. Image Underst. 2004, 95, 238–259. [Google Scholar] [CrossRef]
Melli, R.; Prati, A.; Cucchiara, R.; de Cock, L.; Traficon, N. Predictive and Probabilistic Tracking to Detect Stopped Vehicles. In Proceedings of the WACV/MOTION, Breckenridge, CO, USA, 5–7 January 2005; pp. 388–393. [Google Scholar]
Cavallaro, A.; Salvador, E.; Ebrahimi, T. Shadow-aware object-based video processing. IEEE Proc. Vis. Image Signal Process. 2005, 152, 398–406. [Google Scholar] [CrossRef]
Ishida, S.; Fukui, S.; Iwahori, Y.; Bhuyan, M.; Woodham, R.J. Shadow Detection by Three Shadow Models with Features Robust to Illumination Changes. Procedia Comput. Sci. 2014, 35, 1219–1228. [Google Scholar] [CrossRef]
Wang, J.; Wang, Y.; Jiang, M.; Yan, X.; Song, M. Moving cast shadow detection using online sub-scene shadow modeling and object inner-edges analysis. J. Vis. Commun. Image Represent. 2014, 25, 978–993. [Google Scholar] [CrossRef]
Sun, B.; Li, S. Moving cast shadow detection of vehicle using combined color models. In Proceedings of the IEEE 2010 Chinese Conference on Pattern Recognition (CCPR), Chongqing, China, 21–23 October 2010; pp. 1–5. [Google Scholar]
Huang, J.B.; Chen, C.S. Moving cast shadow detection using physics-based features. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 2310–2317. [Google Scholar]
Leone, A.; Distante, C. Shadow detection for moving objects based on texture analysis. Pattern Recognit. 2007, 40, 1222–1233. [Google Scholar] [CrossRef]
Qin, R.; Liao, S.; Lei, Z.; Li, S.Z. Moving cast shadow removal based on local descriptors. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 1377–1380. [Google Scholar]
Khare, M.; Srivastava, R.K.; Khare, A. Moving shadow detection and removal–a wavelet transform based approach. IET Comput. Vis. 2014, 8, 701–717. [Google Scholar] [CrossRef]
Sanin, A.; Sanderson, C.; Lovell, B.C. Improved shadow removal for robust person tracking in surveillance scenarios. In Proceedings of the IEEE 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 141–144. [Google Scholar]
Bullkich, E.; Ilan, I.; Moshe, Y.; Hel-Or, Y.; Hel-Or, H. Moving shadow detection by nonlinear tone-mapping. In Proceedings of the IEEE 2012 19th International Conference on Systems, Signals and Image Processing (IWSSIP), Vienna, Austria, 11–13 April 2012; pp. 146–149. [Google Scholar]
Wang, B.; Chen, C.L.P. Optical reflection invariant-based method for moving shadows removal. Opt. Eng. 2018, 57, 093102. [Google Scholar] [CrossRef]
Liu, Z.; Huang, K.; Tan, T.; Wang, L. Cast shadow removal combining local and global features. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 18–23 June 2007; pp. 1–8. [Google Scholar]
Wang, B.; Chen, C.P.; Li, Y.; Zhao, Y. Hard shadows removal using an approximate illumination invariant. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1628–1632. [Google Scholar]
Tang, C.; Ahmad, M.O.; Wang, C. An efficient method of cast shadow removal using multiple features. Signal Image Video Process. 2013, 7, 695–703. [Google Scholar] [CrossRef]
Wang, B.; Yuan, Y.; Zhao, Y.; Zou, W. Adaptive moving shadows detection using local neighboring information. In Asian Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 521–535. [Google Scholar]
Gomes, V.; Barcellos, P.; Scharcanski, J. Stochastic shadow detection using a hypergraph partitioning approach. Pattern Recognit. 2017, 63, 30–44. [Google Scholar] [CrossRef]
Hsieh, J.-W.; Hu, W.-F.; Chang, C.-J.; Chen, Y.-S. Shadow elimination for effective moving object detection by Gaussian shadow modeling. Image Vis. Comput. 2003, 21, 505–516. [Google Scholar] [CrossRef]
Guo, R.; Dai, Q.; Hoiem, D. Single-image shadow detection and removal using paired regions. In Proceedings of the IEEE CVPR, Colorado Springs, CO, USA, 20–25 June 2011; pp. 2033–2040. [Google Scholar]
Joshi, A.; Papanikolopoulos, N. Learning to Detect Moving Shadows in Dynamic Environments. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 2055–2063. [Google Scholar] [CrossRef]
Dai, J.; Han, D.; Zhao, X. Effective moving shadow detection using statistical discriminant model. Optik 2015, 126, 5398–5406. [Google Scholar] [CrossRef]
Lalonde, J.F.; Efros, A.A.; Narasimhan, S.G. Detecting ground shadows in outdoor consumer photographs. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; pp. 322–335. [Google Scholar]
Russell, M.; Zou, J.J.; Fang, G.; Cai, W. Feature-based image patch classification for moving shadow detection. IEEE Trans. Circuits Syst. Video Technol. 2017, 29, 2652–2666. [Google Scholar] [CrossRef]
Lin, C.-W. Moving cast shadow detection using scale-relation multi-layer pooling features. J. Vis. Commun. Image Represent. 2018, 55, 504–517. [Google Scholar] [CrossRef]
Shen, L.; Wee Chua, T.; Leman, K. Shadow optimization from structured deep edge detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2067–2074. [Google Scholar]
Khan, S.H.; Bennamoun, M.; Sohel, F.; Togneri, R. Automatic Shadow Detection and Removal from a Single Image. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 431–446. [Google Scholar] [CrossRef]
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
Lee, J.T.; Lim, K.T.; Chung, Y. Moving shadow detection from background image and deep learning. In Image and Video Technology; Springer: Cham, Switzerland, 2015; pp. 299–306. [Google Scholar]
Kim, D.S.; Arsalan, M.; Park, K.R. Convolutional Neural Network-Based Shadow Detection in Images Using Visible Light Camera Sensor. Sensors 2018, 18, 960. [Google Scholar] [CrossRef]
Vicente, T.F.Y.; Hou, L.; Yu, C.P.; Hoai, M.; Samaras, D. Large-scale training of shadow detectors with noisily-annotated shadow examples. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 816–832. [Google Scholar]
Hu, X.; Zhu, L.; Fu, C.W.; Qin, J.; Heng, P.A. Direction-aware spatial context features for shadow detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7454–7462. [Google Scholar]
Hou, L.; Vicente, T.Y.; Hoai, M.; Samaras, D. Large scale shadow annotation and detection using lazy annotation and stacked CNNs. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 1. [Google Scholar] [CrossRef]
Huang, G.; Huang, G.-B.; Song, S.; You, K. Trends in extreme learning machines: A review. Neural Netw. 2015, 61, 32–48. [Google Scholar] [CrossRef]
Liu, X.; Gao, C.; Li, P. A comparative analysis of support vector machines and extreme learning machines. Neural Netw. 2012, 33, 58–66. [Google Scholar] [CrossRef]
Akusok, A.; Björk, K.-M.; Miche, Y.; Lendasse, A. High-Performance Extreme Learning Machines: A Complete Toolbox for Big Data Applications. IEEE Access 2015, 3, 1011–1025. [Google Scholar] [CrossRef]
Cao, J.; Lin, Z. Extreme Learning Machines on High Dimensional and Large Data Applications: A Survey. Math. Probl. Eng. 2015, 2015, 103796. [Google Scholar] [CrossRef]
Yi, Y.; Qiao, S.; Zhou, W.; Zheng, C.; Liu, Q.; Wang, J. Adaptive multiple graph regularized semi-supervised extreme learning machine. Soft Comput. 2018, 22, 3545–3562. [Google Scholar] [CrossRef]
Ghimire, D.; Lee, J. Online sequential extreme learning machine-based co-training for dynamic moving cast shadow detection. Multimed. Tools Appl. 2016, 75, 11181–11197. [Google Scholar] [CrossRef]
Amato, A.; Mozerov, M.; Huerta, I.; Gonzalez, J.; Villanueva, J.J. Background subtraction technique based on chromaticity and intensity patterns. In Proceedings of the IEEE 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; pp. 1–4. [Google Scholar]
Phong, B.T. Illumination for computer generated pictures. Commun. ACM 1975, 18, 311–317. [Google Scholar] [CrossRef] [Green Version]
Romero, J.D.; Lado, M.J.; Méndez, A.J. A background modeling and foreground detection algorithm using scaling coefficients defined with a color model called lightness-red-green-blue. IEEE Trans. Image Process. 2017, 27, 1243–1258. [Google Scholar] [CrossRef]
Tsai, V. A comparative study on shadow compensation of color aerial images in invariant color models. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1661–1671. [Google Scholar] [CrossRef]
Jacques, J.C.S.; Jung, C.R.; Musse, S.R. Background subtraction and shadow detection in grayscale video sequences. In Proceedings of the XVIII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI′05), Natal, Brazil, 9–12 October 2005; pp. 189–196. [Google Scholar]
Daugman, J.G. Two-dimensional spectral analysis of cortical receptive field profiles. Vis. Res. 1980, 20, 847–856. [Google Scholar] [CrossRef]
Heikkila, M.; Pietikainen, M. A texture-based method for modeling the background and detecting moving objects. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 657–662. [Google Scholar] [CrossRef] [Green Version]
Cast Shadow Detection dataset. Available online: http://vision.gel.ulaval.ca/~CastShadows/ (accessed on 30 September 2005).
CAVIAR dataset. Available online: http://homepages.inf.ed.ac.uk/rbf/CAVIAR/ (accessed on 1 July, 2007).
Wang, Y.; Jodoin, P.M.; Porikli, F.; Konrad, J.; Benezeth, Y.; Ishwar, P. CDnet 2014: An expanded change detection benchmark dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 387–394. [Google Scholar]

Figure 1. Overview of the proposed moving cast-shadow detection approach.

Figure 2. Shadow detection results of some frames in different scenes. (a) original frames; (b) groundtruths; (c) classification results. The shadows are marked with red and objects are marked with green; (d) final results after post-processing.

Figure 3. Visual shadow detection results of six scenes generated by the proposed method and comparative methods on the popular dataset. (a) input frames; (b) groundtruths; (c) results obtained by [21]; (d) results obtained by [20]; (e) results obtained by [31]; (f) results obtained by [26]; (g) results obtained by [22]; (h) results obtained by our method.

Figure 4. Visual comparisons of shadow-detection results on the large dataset. (a) input frame; (b) groundtruths; (c) results obtained by [22]; (d) results obtained by our method.

Figure 5. Classification rates of our method with a different number of hidden-layers on two datasets. (a) variation tendency on the popular dataset; (b) variation tendency on the large dataset.

Figure 6. Classification rates of our method with different numbers of training samples on two datasets. (a) variation tendency on the popular dataset; (b) variation tendency on the large dataset.

Table 1. The detailed information of a popular dataset.

Scene	Highway	Hallway	Laboratory	CAVIAR	Campus	Intelligent Room
Video frame
Groundtruth
Scene Type	Outdoor	Indoor	Indoor	Indoor	Outdoor	Indoor
Labeled frames	10	13	14	164	53	100
Total frames	440	1800	887	2725	1181	300
Frame size	320 × 240	320 × 240	320 × 240	384 × 288	352 × 288	320 × 240
Shadow strength	Strong	Weak	Weak	Weak	Weak	Weak
Object type	Vehicle	People	People	People	Vehicle/People	People
Shadow size	Large	Large/Medium	Medium	Variable	Large/Medium	Medium

Table 2. The detail information of a large dataset.

Scene	Cubicle	BusStation	Bungalows	Seam	Senoon	Sepm	PeopleInShade
Video frame
Groundtruth
Scene Type	Indoor	Outdoor	Outdoor	Outdoor	Outdoor	Outdoor	Outdoor
Labeled frames	3210	873	636	151	54	37	666
Total frames	7400	1250	1700	460	1501	251	1199
Frame size	352 × 240	360 × 240	360 × 240	320 × 240	320 × 240	320 × 240	380 × 244
Shadow strength	Strong/Weak	Strong	Strong	Strong	Strong	Strong	Strong
Object type	People	People	Vehicle	People	People	People	People
Shadow size	Large/Medium	Medium	Large	Small/Medium	Small/Medium	Small/Medium	Large

Table 3. Shadow detection results of the proposed method and compared methods on the popular dataset.

Scene	Metric	Bulkich [21]	Sanin [20]	Lalonde [32]	Guo [29]	Dai [31]	Wang [26]	Gomes [27]	Varghese [9]	Wang [22]	Joshi [30]	Proposed
Campus	$η$	74.59	77.74	13.90	89.39	74.30	78.16	84.36	82.20	77.64	—	79.35
	$ξ$	77.60	83.39	86.58	20.67	84.90	80.11	78.95	75.10	91.83	—	96.22
	Avg	76.10	80.56	50.24	55.03	79.60	79.14	81.66	78.65	84.73	—	87.79
CAVIAR	$η$	91.57	93.77	30.80	69.62	92.49	93.69	94.76	—	94.91	—	97.28
	$ξ$	86.76	82.07	78.75	23.45	99.26	87.72	81.15	—	94.55	—	98.34
	Avg	89.16	87.92	54.78	46.54	95.87	90.71	87.96	—	94.73	—	97.81
Hallway	$η$	90.43	95.18	91.05	91.51	83.45	95.20	95.16	91.30	94.85	—	95.83
	$ξ$	77.06	95.73	19.76	19.09	99.25	97.97	95.94	91.40	98.79	—	98.01
	Avg	83.74	95.46	55.41	55.30	91.35	96.59	95.55	91.35	96.82	—	96.92
Highway	$η$	72.00	81.84	60.54	67.69	73.75	65.69	87.89	89.50	84.62	83.44	88.24
	$ξ$	95.08	94.26	75.56	75.06	97.55	91.88	94.34	70.10	93.63	89.76	94.66
	Avg	83.54	88.05	68.05	71.38	85.65	78.79	91.11	79.80	89.13	86.60	91.45
Laboratory	$η$	67.94	85.99	17.18	86.43	89.17	85.83	92.50	82.90	87.46	—	92.72
	$ξ$	73.69	96.65	81.73	50.07	92.66	94.96	93.23	88.50	95.28	—	92.73
	Avg	70.82	91.32	49.45	68.25	90.91	90.39	92.86	85.70	91.37	—	92.72
Intelligent Room	$η$	72.86	91.36	16.14	72.38	84.37	93.99	92.47	79.50	91.13	81.23	96.65
	$ξ$	97.48	94.50	88.17	80.88	94.27	92.16	93.47	88.70	96.92	85.12	95.26
	Avg	85.17	92.93	52.16	76.63	89.32	93.07	92.97	84.10	94.02	83.18	95.95
Average	$η$	68.78	87.65	38.27	79.50	82.92	85.43	91.19	85.08	88.44	82.33	91.68
	$ξ$	84.61	91.10	71.76	44.87	94.65	90.80	89.51	82.76	95.17	86.47	95.87
	Avg	81.42	89.37	55.02	62.19	88.78	88.12	90.35	83.92	91.80	84.89	93.77

Table 4. Shadow detection results of the proposed method and compared methods on the large dataset.

Scene	Metric	Prati et al. [5]	Hsieh et al. [28]	Huang et al. [16]	Leone et al. [17]	Sanin et al. [20]	Wang et al. [26]	Wang et al. [24]	Wang et al. [22]	Proposed
PeopleInShade	$η$	9.83	57.34	8.96	51.88	32.99	35.56	—	75.90	99.65
	$ξ$	97.10	52.50	97.31	79.64	97.37	90.32	—	94.40	97.88
	Avg	53.47	54.92	53.14	65.76	65.18	62.94	—	85.15	98.76
BusStation	$η$	50.72	20.52	60.15	51.62	36.00	63.49	93.80	65.18	88.06
	$ξ$	83.05	64.45	94.30	84.95	94.33	89.72	88.90	95.00	95.17
	Avg	66.89	42.49	77.22	68.28	65.17	76.60	91.35	80.09	91.62
Cubicle	$η$	92.37	59.27	67.11	26.95	89.40	69.64	—	89.81	99.96
	$ξ$	91.56	68.66	91.10	89.64	94.66	84.39	—	87.19	97.36
	Avg	91.96	63.96	79.10	58.30	92.03	77.01	—	88.50	98.66
Bungalows	$η$	3.81	59.91	15.79	8.13	70.23	3.57	90.90	85.02	89.50
	$ξ$	74.75	55.43	89.79	78.26	64.16	94.57	92.10	78.08	84.46
	Avg	39.28	57.67	52.79	43.20	67.20	49.07	91.50	81.55	86.98
Seam	$η$	19.22	57.39	56.04	69.06	22.66	73.44	99.60	79.10	97.78
	$ξ$	59.92	64.61	87.11	67.75	79.08	77.65	72.80	85.70	99.35
	Avg	39.57	61.00	71.57	68.40	50.87	75.54	86.20	82.40	98.57
Senoon	$η$	—	—	—	—	—	—	95.60	52.69	99.19
	$ξ$	—	—	—	—	—	—	85.30	68.54	98.69
	Avg	—	—	—	—	—	—	90.45	60.62	98.94
Sepm	$η$	—	—	—	—	—	—	95.80	44.95	94.96
	$ξ$	—	—	—	—	—	—	78.90	83.73	98.09
	Avg	—	—	—	—	—	—	87.35	64.34	96.52
Average	$η$	54.21	41.78	53.23	62.32	60.11	68.45	95.14	70.38	95.59
	$ξ$	68.30	85.11	81.81	72.06	92.35	89.00	83.60	84.66	95.86
	Avg	61.25	63.45	67.52	67.19	76.23	78.73	89.37	77.52	95.72

Table 5. Shadow detection comparison results with some representative deep-learning methods.

Scene	Metric	Campus	Highway	Hallway	Laboratory	Intelligent Room	CAVIAR
Long et al. [37]	$η$	86.32	81.85	81.22	83.34	91.45	—
	$ξ$	96.97	90.42	83.27	76.54	85.31	—
	Avg	91.65	86.14	82.25	79.94	88.38	—
Lee et al. [38]	$η$	74.30	—	89.60	86.20	80.90	84.00
	$ξ$	84.90	—	92.20	84.30	93.70	93.70
	Avg	79.60	—	90.90	85.25	87.30	88.85
Proposed	$η$	79.35	86.24	94.02	92.72	96.65	97.28
	$ξ$	96.22	94.66	98.01	92.73	95.26	98.34
	Avg	87.79	90.45	96.02	92.72	95.95	97.81

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yi, Y.; Dai, J.; Wang, C.; Hou, J.; Zhang, H.; Liu, Y.; Gao, J. An Effective Framework Using Spatial Correlation and Extreme Learning Machine for Moving Cast Shadow Detection. Appl. Sci. 2019, 9, 5042. https://doi.org/10.3390/app9235042

AMA Style

Yi Y, Dai J, Wang C, Hou J, Zhang H, Liu Y, Gao J. An Effective Framework Using Spatial Correlation and Extreme Learning Machine for Moving Cast Shadow Detection. Applied Sciences. 2019; 9(23):5042. https://doi.org/10.3390/app9235042

Chicago/Turabian Style

Yi, Yugen, Jiangyan Dai, Chengduan Wang, Jinkui Hou, Huihui Zhang, Yunlong Liu, and Jin Gao. 2019. "An Effective Framework Using Spatial Correlation and Extreme Learning Machine for Moving Cast Shadow Detection" Applied Sciences 9, no. 23: 5042. https://doi.org/10.3390/app9235042

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Effective Framework Using Spatial Correlation and Extreme Learning Machine for Moving Cast Shadow Detection

Abstract

1. Introduction

2. Extreme Learning Machine (ELM)-Based Moving Cast Shadow Detection

2.1. Feature Extraction

2.1.1. Pixel-Level Features

2.1.2. Region-Level Features

2.1.3. Feature Descriptor

2.2. Classification Using Extreme Learning Machine

2.3. Post-Processing

2.4. Moving Cast-Shadow Detection Algorithm

3. Experiments and Result Analysis

3.1. Datasets

3.2. Evaluation Metrics

3.3. Comparisons on a Popular Dataset

3.4. Comparisons on a Large Dataset

3.5. Comparisons with Some Representative Deep-Learning Methods

3.6. Parameter Sensitivity Analysis

4. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI