In the following sections, we present a general approach to remove the geometric constraints on the detected partitions in the previously described 2D empirical wavelets. First, we give a mathematical construction of wavelet filters based on arbitrarily shaped supports and show that these filters permit us to define a general empirical wavelet transform. We prove the existence of dual filters, allowing the definition of the inverse transform. Second, we present a new algorithm to detect 2D partitions of arbitrary geometry, inspired by the 1D scale-space algorithm discussed in
Section 2.3.
3.1. 2D Empirical Wavelets of Arbitrary Supports
Let us assume we are given a discrete image
$f\in {L}^{2}\left(\mathsf{\Lambda}\right)$ where
$\mathsf{\Lambda}=[1,\dots ,\mathcal{N}]\times [1,\dots ,\mathcal{N}]$ is the image domain (pixel coordinates will be denoted
$(i,j)\in \mathsf{\Lambda}$) and an arbitrary partition,
${\left\{{\Omega}_{n}\right\}}_{n=1}^{N}$, of the Fourier domain (denoted
$\Omega =[1,\dots ,\mathcal{N}]\times [1,\dots ,\mathcal{N}]$, frequency coordinates will be denoted
$(k,l)\in \Omega $), whose detection will be discussed in
Section 3.2. We assume the partition
${\left\{{\Omega}_{n}\right\}}_{n=1}^{N}$ has the following properties:
${\bigcup}_{n=1}^{N}{\Omega}_{n}=\Omega $ and
${\Omega}_{n}\cap {\Omega}_{m}=\varnothing $ if
$n\ne m$, whose boundaries delineate the expected supports. In order to define a transition region similar to the one shown in
Figure 1 for the 1D EWT, we must define the distance from any point in the image to these boundaries. As such, for each region
${\Omega}_{n}$, we define a distance transform of the region:
where
$\partial {\Omega}_{n}$ is the boundary of the region
${\Omega}_{n}$ and
$d(.,.,.,.)$ is the quasi-Euclidean distance, defined by
In this way, every point of the image spectrum is described, for each region, by its distance to that region’s boundary. From there, we can define a 2D empirical wavelet as
where
$\tau $ is the desired transition area width and
$\beta \left(x\right)$ is an arbitrary
${\mathcal{C}}^{k}$ function such that
A usual choice for
$\beta $ is
Unlike in the 1D case or the existing 2D cases, there are no theoretical results for the appropriate choice of $\tau $, and therefore it must be chosen experimentally. Nevertheless, we can show that the set $\left\{{\phi}_{n}\right\}$ forms a frame.
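The construction above can be illustrated with a minimal sketch. Several choices below are assumptions rather than the definitions of this section: the distance $D_n$ is taken to be positive inside ${\Omega}_{n}$ and negative outside, the filter uses a cosine ramp over $|D_n|\le \tau$, the boundary is sampled at midpoints between neighboring pixels of different regions, and $\beta$ is the polynomial commonly used for Meyer-type wavelets. All function names are hypothetical.

```python
import math

SQRT2_MINUS_1 = math.sqrt(2.0) - 1.0


def quasi_euclidean(k, l, p, q):
    """Quasi-Euclidean distance between points (k, l) and (p, q)."""
    dk, dl = abs(k - p), abs(l - q)
    if dk > dl:
        return dk + SQRT2_MINUS_1 * dl
    return SQRT2_MINUS_1 * dk + dl


def beta(x):
    """Transition profile: 0 for x <= 0, 1 for x >= 1, beta(x) + beta(1 - x) = 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x**4 * (35.0 - 84.0 * x + 70.0 * x**2 - 20.0 * x**3)


def region_boundary(labels, n):
    """ASSUMPTION: the boundary of region n is sampled at the midpoints between
    each pixel of region n and its 4-neighbors lying in another region."""
    rows, cols = len(labels), len(labels[0])
    points = []
    for k in range(rows):
        for l in range(cols):
            if labels[k][l] != n:
                continue
            for dk, dl in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                kk, ll = k + dk, l + dl
                if 0 <= kk < rows and 0 <= ll < cols and labels[kk][ll] != n:
                    points.append((k + dk / 2.0, l + dl / 2.0))
    return points


def signed_distance(labels, n):
    """ASSUMPTION: D_n is positive inside region n and negative outside."""
    boundary = region_boundary(labels, n)  # non-empty if there are >= 2 regions
    dist = []
    for k, row in enumerate(labels):
        dist.append([
            (1.0 if lab == n else -1.0)
            * min(quasi_euclidean(k, l, p, q) for p, q in boundary)
            for l, lab in enumerate(row)
        ])
    return dist


def wavelet_filter(labels, n, tau):
    """Fourier-domain filter of region n: 1 inside, 0 outside, and an
    assumed cosine ramp where |D_n| <= tau."""
    out = []
    for row in signed_distance(labels, n):
        out_row = []
        for d in row:
            if d > tau:
                out_row.append(1.0)
            elif d < -tau:
                out_row.append(0.0)
            else:
                ramp = beta((tau - d) / (2.0 * tau))
                out_row.append(math.cos(0.5 * math.pi * ramp))
        out.append(out_row)
    return out


# Two side-by-side regions on a 6 x 8 frequency grid.
labels = [[1] * 4 + [2] * 4 for _ in range(6)]
phi1 = wavelet_filter(labels, 1, tau=1.5)
phi2 = wavelet_filter(labels, 2, tau=1.5)
```

In this two-region example the boundary is shared by both regions, so the two distances are opposite and the squared filters sum to one at every frequency. The brute-force distance computation is only meant for small grids.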
Proposition 1. Denoting ${\phi}_{n,i,j}={\phi}_{n}(\cdot -i,\cdot -j)$, the set $\left\{{\phi}_{n,i,j}\right\}$ forms a frame.
Proof. Assuming for now (to be shown later) that
$\exists A,B\in \mathbb{R},0<A\le B<\infty $, such that
$\forall (k,l)\in \Omega ,A\le {\sum}_{n=1}^{N}{|{\widehat{\phi}}_{n,k,l}|}^{2}\le B$,
then, using Parseval’s theorem, we get
which implies that
$\left\{{\phi}_{n,i,j}\right\}$ forms a frame of bounds
A and
B.
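The Parseval step can be spelled out as follows. This is a sketch under two assumptions that the text does not fix: the forward discrete Fourier transform is unnormalized (so that Parseval's identity carries a factor $1/{\mathcal{N}}^{2}$), and the translations ${\phi}_{n}(\cdot -i,\cdot -j)$ are taken periodically on $\mathsf{\Lambda}$.

```latex
\begin{aligned}
\sum_{n=1}^{N}\sum_{(i,j)\in\Lambda}\left|\langle f,\phi_{n,i,j}\rangle\right|^{2}
 &= \frac{1}{\mathcal{N}^{2}}\sum_{n=1}^{N}\sum_{(k,l)\in\Omega}
    |\widehat{f}(k,l)|^{2}\,|\widehat{\phi}_{n,k,l}|^{2} \\
 &= \frac{1}{\mathcal{N}^{2}}\sum_{(k,l)\in\Omega}|\widehat{f}(k,l)|^{2}
    \sum_{n=1}^{N}|\widehat{\phi}_{n,k,l}|^{2},
\end{aligned}
\qquad\text{so that}\qquad
A\,\|f\|^{2}\;\le\;\sum_{n=1}^{N}\sum_{(i,j)\in\Lambda}
\left|\langle f,\phi_{n,i,j}\rangle\right|^{2}\;\le\;B\,\|f\|^{2}.
```

The last inequality uses $\|f\|^{2}=\frac{1}{{\mathcal{N}}^{2}}{\sum}_{(k,l)\in \Omega}{|\widehat{f}(k,l)|}^{2}$ together with the pointwise bounds on ${\sum}_{n=1}^{N}{|{\widehat{\phi}}_{n,k,l}|}^{2}$.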
It remains to show that such bounds
A and
B exist such that
$\forall (k,l)\in \Omega ,A\le {\sum}_{n=1}^{N}{|{\widehat{\phi}}_{n,k,l}|}^{2}\le B$. We can observe that three situations, as shown in
Figure 3, can occur:
- $(k,l)$ lies outside of a transition area
- $(k,l)$ lies in a transition area between only two regions
- $(k,l)$ lies in a transition area between three or more regions
It is easy to see that, if $(k,l)$ is not within a transition area, ${\widehat{\phi}}_{n,k,l}$ is nonzero for only one index n, and, by construction, we have ${\sum}_{n=1}^{N}{|{\widehat{\phi}}_{n,k,l}|}^{2}=1$. It remains to show the bounds if $(k,l)$ lies within a transition area.
The lower bound is determined by the case where the transition area lies between two regions. In this situation, we have
However, from the definition of the distance transform, we have
${D}_{2}(k,l)=-{D}_{1}(k,l)$, hence
Since
$\frac{\tau +{D}_{1}(k,l)}{2\tau}+\frac{\tau -{D}_{1}(k,l)}{2\tau}=1$ and
$\beta \left(x\right)+\beta (1-x)=1$,
and since
$\mathrm{cos}\left(x\right)=\mathrm{sin}(x+\frac{\pi}{2})$, this is equivalent to
Therefore, our lower bound is $A=1$.
In the situation where the transition area is between r regions, we have $|{\widehat{\phi}}_{1,k,l}{|}^{2}+|{\widehat{\phi}}_{2,k,l}{|}^{2}+|{\widehat{\phi}}_{3,k,l}{|}^{2}+\cdots +|{\widehat{\phi}}_{r,k,l}{|}^{2}=1+|{\widehat{\phi}}_{3,k,l}{|}^{2}+\cdots +{|{\widehat{\phi}}_{r,k,l}|}^{2}$. Since $\forall n=1,2,\cdots ,N,\forall (k,l)\in \Omega ,|{\widehat{\phi}}_{n,k,l}|\le 1$, we therefore get an upper bound $B=r-1$ which completes the proof. □
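For reference, the two-region computation used in the proof can be written out in full. The sketch assumes that, inside a transition area, the filters take the cosine form ${\widehat{\phi}}_{n,k,l}=\mathrm{cos}\left(\frac{\pi}{2}\beta \left(\frac{\tau -{D}_{n}(k,l)}{2\tau}\right)\right)$. This form is an assumption chosen to be consistent with the identities quoted in the proof, and the shorthand $x$ is introduced here only for readability.

```latex
\begin{aligned}
|\widehat{\phi}_{1,k,l}|^{2}+|\widehat{\phi}_{2,k,l}|^{2}
 &= \cos^{2}\!\Big(\tfrac{\pi}{2}\,\beta\big(\tfrac{\tau-D_{1}(k,l)}{2\tau}\big)\Big)
  + \cos^{2}\!\Big(\tfrac{\pi}{2}\,\beta\big(\tfrac{\tau-D_{2}(k,l)}{2\tau}\big)\Big) \\
 &= \cos^{2}\!\Big(\tfrac{\pi}{2}\,\beta(x)\Big)
  + \cos^{2}\!\Big(\tfrac{\pi}{2}\,\beta(1-x)\Big),
  \qquad x=\tfrac{\tau-D_{1}(k,l)}{2\tau},\quad D_{2}=-D_{1} \\
 &= \cos^{2}\!\Big(\tfrac{\pi}{2}\,\beta(x)\Big)
  + \cos^{2}\!\Big(\tfrac{\pi}{2}-\tfrac{\pi}{2}\,\beta(x)\Big)
  \qquad\text{since } \beta(x)+\beta(1-x)=1 \\
 &= \cos^{2}\!\Big(\tfrac{\pi}{2}\,\beta(x)\Big)
  + \sin^{2}\!\Big(\tfrac{\pi}{2}\,\beta(x)\Big) = 1 .
\end{aligned}
```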
With our set of empirical wavelets in hand, the corresponding transform is then defined by
Since
$\forall (k,l)\in \Omega ,{\sum}_{n=1}^{N}{|{\widehat{\phi}}_{n,k,l}|}^{2}>0$, we can define the corresponding dual frame
$\left\{{\tilde{\phi}}_{n,i,j}\right\}$ via
The existence of such a dual frame allows us to reconstruct the original signal
f from its empirical wavelet transform:
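The role of the dual frame can be checked numerically with a small frequency-domain sketch. It assumes the usual convention in which each coefficient spectrum is $\widehat{f}$ multiplied by the conjugate of ${\widehat{\phi}}_{n}$ and each dual filter is ${\widehat{\phi}}_{n}$ divided by ${\sum}_{m}{|{\widehat{\phi}}_{m}|}^{2}$. These conventions and all function names are assumptions made for illustration, and the frequency bins are flattened into a 1D list for brevity.

```python
def dual_filters(filters):
    """Dual filters: each filter divided by the total energy sum_m |phi_m|^2.

    `filters` is a list of N lists holding the Fourier-domain filter values,
    one entry per frequency bin. The energy must be positive in every bin.
    """
    n_bins = len(filters[0])
    energy = [sum(abs(phi[b]) ** 2 for phi in filters) for b in range(n_bins)]
    return [[phi[b] / energy[b] for b in range(n_bins)] for phi in filters]


def analyze(f_hat, filters):
    """ASSUMED convention: coefficient spectra are f_hat * conj(phi_n)."""
    return [
        [f_hat[b] * complex(phi[b]).conjugate() for b in range(len(f_hat))]
        for phi in filters
    ]


def synthesize(coeff_hats, duals):
    """Sum over n of (coefficient spectrum) * (dual filter), bin by bin."""
    n_bins = len(coeff_hats[0])
    return [sum(c[b] * d[b] for c, d in zip(coeff_hats, duals)) for b in range(n_bins)]


# Toy spectrum with three bins and three overlapping filters.
f_hat = [1 + 2j, -0.5j, 3 + 0j]
filters = [[1.0, 0.6, 0.0], [0.0, 0.8, 0.7], [0.0, 0.0, 0.9]]
recon = synthesize(analyze(f_hat, filters), dual_filters(filters))
```

Here `recon` matches `f_hat` up to rounding error, even in the last bin where the squared filters sum to 1.3 rather than 1. An inverse Fourier transform of `recon` would then return the signal itself.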
3.2. On the Detection of Partitions of Arbitrary Geometry
In this section, we address the question of how to detect partitions of the Fourier domain corresponding to meaningful harmonic modes of arbitrary geometry. The proposed method uses a 2D scale-space representation to find the “center” of harmonic modes, followed by a watershed transform to find the arbitrary boundaries delimiting the different regions ${\Omega}_{n}$. These two steps are described in detail in the next two sections. From now on, we assume our image f is discrete of size $\mathcal{N}\times \mathcal{N}$.
3.2.1. Scale-Space Localization of Harmonic Modes
In
Section 2.3, we showed that the expected 1D boundaries correspond to meaningful local minima, which can be found using scale-space representations. Unfortunately, 2D boundaries separating the different regions correspond to arbitrary curves, and it becomes very hard to characterize what a meaningful curve is. Since our goal is to separate harmonic modes, and assuming that these modes are sufficiently well-separated lobes in the spectrum, we propose first to detect potential candidates by selecting meaningful local maxima. To do so, we follow an approach similar to that of
Section 2.3, but look for persistent maxima instead of minima. To summarize the process, we build a 2D scale-space representation of the image magnitude spectrum, i.e.,
$g=|\widehat{f}|$, detect all local maxima at each scale
s, and measure their lifespans (i.e., the largest
s before they disappear). From there, we create a histogram of the persistence of the maxima and use Otsu’s method to define a threshold to classify each maximum as persistent or not. The locations of all persistent maxima, denoted
$\left\{{\mu}_{n}\right\}$, are then extracted; these locations represent “centers” of the expected harmonic modes. Because we are working in a discrete space, the scale step-size
${s}_{0}$ is a parameter and we set the maximum scale proportional to
${s}_{0}$ and the image size,
${s}_{max}=4\frac{{s}_{0}\mathcal{N}}{K}$, where
K is the size of the kernel used to create the scale-space representation.
Figure 4a illustrates the tracking of maxima through the scale-space, while
Figure 4b shows the remaining persistent maxima after thresholding.
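The thresholding step can be sketched as follows, assuming the lifespans of the tracked maxima have already been measured. The function name, the number of histogram bins, and the sample lifespans are hypothetical. The threshold is the histogram split that maximizes the between-class variance, which is the criterion of Otsu's method.

```python
def otsu_threshold(values, bins=32):
    """Otsu threshold of a list of numbers, computed on a `bins`-bin histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_idx = -1.0, 0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                  # samples in the lower class
        sum0 += centers[i] * hist[i]
        w1 = total - w0                # samples in the upper class
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_idx = var_between, i
    return lo + (best_idx + 1) * width  # upper edge of the last lower-class bin


# Hypothetical lifespans: most maxima vanish quickly, three persist.
lifespans = [1, 1, 2, 2, 3, 2, 1, 40, 45, 50]
threshold = otsu_threshold(lifespans)
persistent = [s for s in lifespans if s > threshold]
```

With these sample values, only the three long-lived maxima are classified as persistent.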
3.2.2. Watershed Transforms
Given a set of mode centers,
$\left\{{\mu}_{n}\right\}$, we wish to find a set of boundaries which will define the mode supports. Taking again the point of view that modes correspond to lobes, a natural way to find such boundaries is to select the bottom of the valleys between the modes. From a mathematical perspective, such a process corresponds to finding the path of lowest separation (this idea is a direct extension of the principle used in [
15], for finding 1D boundaries, where the lowest minimum between meaningful peaks was chosen).
The watershed transform, first proposed by Beucher et al. in [
24], is an image segmentation technique that defines a contour based on the path of highest separation between minima. Based on the geographic definition of watersheds and catchment basins, the watershed transform treats an image as a topographic landscape with pixel intensity representing the height at that pixel. Then, the transform separates the image into its catchment basins, with a watershed contour separating them. This process can be described as follows.
Given an image g and a set of its minima, $\left\{{x}_{i}\right\}$, we uniformly “flood” the topographic landscape produced from g, with the water collecting at the minima. When one body of water flows into another, we construct a barrier. Once complete, we define the contour $\Gamma $ along the barrier.
Later, Meyer et al. in [
25,
26] proposed a method in which one morphologically reconstructs the image so as to impose minima at selected markers
$\left\{{M}_{n}\right\}$. This is done by first forcing each marker to be a minimum, by setting
$g\left({M}_{n}\right)=-\infty $ for all
n, and then by filling the catchment basins of each minimum
${x}_{i}\notin \left\{{M}_{n}\right\}$ in such a way that the region contains no extrema. This approach reconstructs the image so that the only minima are at the markers
$\left\{{M}_{n}\right\}$ and thus reformulates the watershed transform such that the generated contour
$\Gamma $ lies along the path of highest separation between selected markers, rather than all minima.
To find the set of boundaries that will define the expected supports, we first invert the magnitude spectrum, ${f}_{-}=-|\widehat{f}|$, so that the watershed transform will define the path of lowest separation, rather than highest separation. Then, we impose minima at the locations of the persistent maxima, $\left\{{\mu}_{n}\right\}$. Finally, we apply the watershed transform, which defines a boundary $\Gamma $ on the magnitude spectrum $|\widehat{f}|$ that lies along the paths of lowest separation between the locations of the persistent maxima $\left\{{\mu}_{n}\right\}$. The boundary $\Gamma $ defines a partition with regions $\left\{{\Omega}_{n}\right\}$.
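The marker-based flooding described above can be sketched with a priority queue. This is a simplified variant, not necessarily the implementation used by the authors: pixels are flooded in order of increasing height starting from the markers, and every pixel is assigned directly to a region, so no separate boundary pixels are produced. The function name, the 4-neighborhood, and the toy spectrum are assumptions.

```python
import heapq


def marker_watershed(height, markers):
    """Flood `height` (2D list) from `markers` ({label: (row, col)}).

    Returns a 2D list of labels. Markers are flooded first, which mimics
    imposing a minimum of minus infinity at each marker.
    """
    rows, cols = len(height), len(height[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet flooded"
    heap, order = [], 0
    for lab, (r, c) in markers.items():
        labels[r][c] = lab
        heapq.heappush(heap, (float("-inf"), order, r, c))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)        # lowest flooded pixel so far
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == 0:
                labels[rr][cc] = labels[r][c]   # water spreads to the neighbor
                heapq.heappush(heap, (height[rr][cc], order, rr, cc))
                order += 1                      # tie-breaker keeps pops stable
    return labels


# Toy magnitude spectrum with two lobes (peaks at columns 1 and 5).
spectrum = [[1, 5, 2, 0.5, 3, 6, 2]]
inverted = [[-v for v in row] for row in spectrum]  # lobes become basins
regions = marker_watershed(inverted, {1: (0, 1), 2: (0, 5)})
```

In this toy example the two regions meet at the valley in column 3, which is the point of lowest separation between the two lobes.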
Since the watershed transform defines some pixels as part of the boundary, we must assign these pixels to a region. This assignment is not critical, since they belong to the path of lowest separation, but it should be symmetric if
f is a real-valued image. Furthermore, if
f is real-valued, we must pair regions symmetrically with respect to the origin. This can be achieved using Algorithm 1.
Algorithm 1 Boundary pixel assignment and regions symmetrization.
Input: A set of unpaired maxima locations $\left\{{\mu}_{n}\right\}$ and corresponding partition $\left\{{\Omega}_{n}\right\}$, where $n=1,2,\cdots ,N$
${m}_{P}\leftarrow \begin{cases}\frac{N+1}{2} & \text{if } (0,0)\in \left\{{\mu}_{n}\right\}\\ \frac{N}{2} & \text{otherwise}\end{cases}$
for $n=1,2,\cdots ,N$ do
    ${\Theta}_{n}=\{m \mid \pm {\mu}_{n}\in {\Omega}_{m}\}$
    ${\tilde{\Omega}}_{n}={\displaystyle \bigcup _{m\in {\Theta}_{n}}}{\Omega}_{m}$
    $\left\{{\mu}_{n}\right\}\leftarrow \left\{{\mu}_{n}\right\}-\left\{{\mu}_{{\Theta}_{n}}\right\}$
end for
Output: Set of paired regions $\left\{{\Omega}_{n}\right\}\leftarrow \left\{{\tilde{\Omega}}_{n}\right\}$, where $n=1,2,\cdots ,N$
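The pairing loop of Algorithm 1 can be sketched as follows. The sketch covers only the symmetrization of regions, not the assignment of boundary pixels. It assumes that frequencies are stored as coordinates centered on the origin, that every region contains exactly one maximum, and that the maxima are symmetric with respect to the origin. The function name and the dictionary-based data layout are hypothetical.

```python
def pair_regions(labels, maxima):
    """Merge each region with the region containing the mirrored maximum.

    labels: dict mapping centered frequency coordinates (k, l) -> region id
    maxima: list of maxima locations (k, l), one per region
    Returns a dict mapping (k, l) -> paired region id.
    """
    remaining = list(maxima)
    new_id_of = {}   # original region id -> paired region id
    next_id = 0
    while remaining:
        k, l = remaining[0]
        # Theta: ids of the regions containing +mu and -mu.
        theta = {labels[(k, l)], labels[(-k, -l)]}
        next_id += 1
        for m in theta:
            new_id_of[m] = next_id
        # Remove the maxima that belong to the regions just merged.
        remaining = [mu for mu in remaining if labels[mu] not in theta]
    return {point: new_id_of[m] for point, m in labels.items()}


# Toy partition on a 1D slice of the centered spectrum (l = 0).
labels = {(-3, 0): 1, (-2, 0): 1, (-1, 0): 2, (0, 0): 2,
          (1, 0): 2, (2, 0): 3, (3, 0): 3}
maxima = [(-2, 0), (0, 0), (2, 0)]
paired = pair_regions(labels, maxima)
```

In this example $N=3$ and the origin is one of the maxima, so the sketch returns $(N+1)/2=2$ paired regions, in agreement with the value of ${m}_{P}$ in Algorithm 1.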