Article

MLTSP: New 3D Framework, Based on the Multilayer Tensor Spectrum Pyramid

by Roumiana A. Kountcheva 1, Rumen P. Mironov 2 and Roumen K. Kountchev 2,*
1 TK Engineering, 1582 Sofia, Bulgaria
2 Department of Radio Communications and Video Technologies, Faculty of Telecommunications, Technical University of Sofia, 1000 Sofia, Bulgaria
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(9), 1909; https://doi.org/10.3390/sym14091909
Submission received: 29 July 2022 / Revised: 1 September 2022 / Accepted: 7 September 2022 / Published: 12 September 2022
(This article belongs to the Section Computer)

Abstract: A tensor representation structure based on the multilayer tensor spectrum pyramid (MLTSP) is introduced in this work. The structure is "truncated", i.e., part of the high-frequency spectrum coefficients is cut off, and a hierarchical tensor SVD (HTSVD) is applied to the retained low-frequency coefficients obtained at the output of each pyramid layer. This ensures a high concentration of the input tensor energy in a small number of decomposition components of the tensors obtained at the coder output. The implementation of this idea is based on a symmetric coder/decoder. An example structure for a cubical tensor of size 8 × 8 × 8, represented as a two-layer tensor spectrum pyramid in which the 3D frequency-ordered fast Walsh–Hadamard transform and HTSVD are used, is given in this paper. The analysis of the needed mathematical operations proves the low computational complexity of the new approach, due to the lack of iterative calculations. The high flexibility of the structure with respect to the number of pyramid layers, the kind of orthogonal transforms used, and the number of retained spectrum coefficients and HTSVD components permits us to achieve the accuracy of the restored output tensor imposed by the application. Furthermore, this paper presents one possible application: 3D object search in a tensor database. In this case, to obtain an invariant representation of the 3D objects, the 3D modified Mellin–Fourier transform is embedded in the spectrum pyramid, and the corresponding algorithm is shown.

1. Introduction

Processing, storage, transfer, and analysis of multidimensional (MD) images, mathematically represented as MD tensors, have recently become the focus of numerous research works. In recent years, various tensor decomposition methods have been developed [1,2,3,4], among which are the multilinear extensions of the matrix SVD and its generalizations to higher-order tensors, called higher-order SVD (HOSVD). This includes several well-known methods: the canonical polyadic decomposition (CPD), also known as parallel factor analysis (PARAFAC), where the tensor is represented as a sum of rank-one tensors; the hierarchical Tucker decomposition (H-Tucker), which is a higher-order form of principal component analysis (PCA); the multilinear PCA (MPCA) [5]; the non-negative tensor factorization (NTF) [6]; the tensor-train decomposition (TTD) [7]; etc. The tensor decomposition components are calculated by using different iterative methods: the alternating least squares (ALS) [8]; the tensor power iteration; the higher-order eigenvalue decomposition; the Jacobi algorithm; the QR factorization [9]; etc.
Additionally, for object detection in tensor multispectral images, mathematical models of stochastic random fields have been suggested for the background description [10]. In that work, contemporary approaches for object detection based on convolutional neural network (CNN) deep learning, with structures such as region-based CNN (R-CNN), fast R-CNN, YOLO (You Only Look Once), EfficientDet (efficient object detection), SSD (single-shot detector), etc., are compared with the newly introduced approach; it is indicated that neural networks ensure high object detection efficiency but, unlike the approach based on stochastic random field models, demand significant computational resources and a huge amount of a priori visual information about the objects.
Tensors nowadays play a leading part in modern deep learning architectures. As is known, deep neural networks often operate in an extremely over-parameterized regime, manipulating tens of millions of parameters, many more than the number of training samples. In contrast, the use of tensor decompositions in deep learning architectures results in a significant reduction in the number of unknown parameters [11].
To achieve a reduction in the computational complexity, a new non-iterative approach for MD tensor image representation based on the multilayer tensor spectrum pyramid (MLTSP) [12] with embedded 3D orthogonal transforms (3D OTs) and hierarchical tensor SVD (HTSVD) [13] is proposed in this work. The approach is illustrated by an example representation of a tensor of size 8 × 8 × 8 through a two-layer tensor spectrum pyramid (2LTSP), with an embedded 3D frequency-ordered fast Walsh–Hadamard transform (3D FO-FWHT) [14] and HTSVD for a tensor of size 2 × 2 × 2 (HTSVD2×2×2) [13]. Furthermore, one application of the presented pyramidal structure, aimed at the accelerated search for 3D objects represented through the MLTSP and the 3D modified Mellin–Fourier transform (3D MMFT), which is an extension of the authors' previous research [15], is given in this work.
More details about the new approach are given in the next sections of the paper: Section 2 contains the basic relations which represent the two-layer tensor spectrum pyramid implementation for a tensor of size 8 × 8 × 8; Section 3 presents the HTSVD decomposition algorithm for a spectrum tensor of size 2 × 2 × 2, which is the building element of the introduced pyramidal structure; Section 4 explains the fast algorithm for the 3D FO-FWHT calculation; Section 5 evaluates the computational complexity of the 3D-FWHT; Section 6 gives the general algorithm for a 3D object search in a database of 3D objects represented through the MLTSP; Section 7 describes the invariant 3D object representation based on the 3D MMFT; and Section 8 contains the conclusions.

2. Basic Relations, Which Represent the Two-Layer Tensor Spectrum Pyramid with 3D OT and HTSVD

To explain the structure of the multilayer TSP, we use, as an example, the 2LTSP, which comprises a coder and decoder. The structure of the decoder is mirror-symmetrical to that of the coder. The corresponding block diagrams are shown in Figure 1a,b.
The following symbols are used: X is a cubical tensor of size N × N × N for N = 2^n (here N = 8); r = 0, 1 is the 2LTSP layer number; S is a cubical spectrum tensor of the same size; F(.) is the operator for the direct 3D orthogonal transform (3D OT) (for example, one of the following: 3D FFT [16], 3D FO-AHKLT [17], 3D FO-FWHT [14], etc.); $F_T(.)$ is the operator for the direct truncated 3D orthogonal transform (3D TOT); $F^{-1}(.)$ is the operator for the 3D inverse orthogonal transform (3D IOT); HTSVD2 is the hierarchical tensor singular value decomposition for a 2 × 2 × 2 elementary tensor [13].

2.1. Description of the 2LTSP Coder Performance

The performance of the coder (Figure 1a) is presented through the following operations:
In the first layer, for r = 0:
$$\hat{S}_0 = F_T(X) \quad (1)$$
$$\hat{S}_{0,2\times2\times2} = \hat{S}_{0,2\times2\times2}^{(1)} + \hat{S}_{0,2\times2\times2}^{(2)} + \hat{S}_{0,2\times2\times2}^{(3)} + \hat{S}_{0,2\times2\times2}^{(4)} = \sum_{t=1}^{4} \hat{S}_{0,2\times2\times2}^{(t)} \quad (2)$$
$$\hat{X} = F^{-1}(\hat{S}_0), \quad E_0 = X - \hat{X} \quad (3)$$
Here, $\hat{S}_0$ denotes the spectrum tensor, which is the approximation of X in the layer r = 0; $E_0$ is the difference tensor in the layer r = 0; $\hat{S}_{0,2\times2\times2}$ is the spectrum sub-tensor of size 2 × 2 × 2 in the layer r = 0, which comprises the eight low-frequency spectrum coefficients of the selected 3D truncated orthogonal transform (3D TOT); $\hat{X}$ is the approximation of the tensor X; $\hat{S}_{0,2\times2\times2}^{(t)}$ is the SVD component t of the sub-tensor $\hat{S}_{0,2\times2\times2}$.
In the second layer, for r = 1:
$$\hat{S}_1^{(u)} = F_T\{E_0^{(u)}\} \quad \text{for } u = 1, 2, \ldots, 8 \quad (4)$$
$$\hat{S}_{1,2\times2\times2}^{(u)} = \hat{S}_{1,2\times2\times2}^{(u,1)} + \hat{S}_{1,2\times2\times2}^{(u,2)} + \hat{S}_{1,2\times2\times2}^{(u,3)} + \hat{S}_{1,2\times2\times2}^{(u,4)} = \sum_{t=1}^{4} \hat{S}_{1,2\times2\times2}^{(u,t)} \quad \text{for } u = 1, 2, \ldots, 8 \quad (5)$$
where $E_0^{(u)}$ denotes the uth difference sub-tensor of size 4 × 4 × 4; $\hat{S}_1^{(u)}$ is the uth spectrum sub-tensor of size 4 × 4 × 4, which is the approximation of the sub-tensor $S_1^{(u)}$ in the layer r = 1; $\hat{S}_{1,2\times2\times2}^{(u)}$ is the uth spectrum sub-tensor of size 2 × 2 × 2 in the layer r = 1, which comprises the eight low-frequency spectrum coefficients of the selected 3D TOT; $\hat{S}_{1,2\times2\times2}^{(u,t)}$ is the SVD component t of the sub-tensor $\hat{S}_{1,2\times2\times2}^{(u)}$. At the coder output in the layer r = 0, in correspondence with Equation (2), four HTSVD2 components are obtained, arranged in decreasing order of their dispersions. At the coder output in the layer r = 1, 32 HTSVD2 components are obtained, calculated in correspondence with Equation (5) for the sub-tensors u = 1, 2, …, 8, and likewise arranged in decreasing order of their dispersions.
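As an illustration, the truncated-transform layer operations above (spectrum, approximation, difference tensor) can be sketched in a few lines of NumPy. This is a simplified stand-in, not the exact coder: a normalized Hadamard matrix replaces the 3D FO-FWHT, the retained 2 × 2 × 2 corner is assumed to collect the low-frequency coefficients (which the frequency ordering guarantees in the paper), and the HTSVD step is omitted.

```python
import numpy as np

def hadamard(n):
    """Sylvester-ordered Hadamard matrix of size 2**n, scaled to be orthogonal."""
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(2.0 ** n)

def ot3d(X, H):
    """Separable 3D orthogonal transform: apply H along every tensor mode."""
    return np.einsum('ai,bj,ck,ijk->abc', H, H, H, X)

def coder_layer(X, H, keep=2):
    """One truncated-transform layer: transform, keep the keep^3 corner of the
    spectrum, restore the approximation, and form the difference tensor."""
    S = ot3d(X, H)
    S_hat = np.zeros_like(S)
    S_hat[:keep, :keep, :keep] = S[:keep, :keep, :keep]   # truncated spectrum
    X_hat = ot3d(S_hat, H.T)                              # inverse 3D OT
    E = X - X_hat                                         # difference tensor
    return S_hat[:keep, :keep, :keep], X_hat, E

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8, 8))
S0, X_hat, E0 = coder_layer(X, hadamard(3))
# The next layer would split E0 into eight 4x4x4 sub-tensors and repeat with hadamard(2).
```

In the full coder, each retained 2 × 2 × 2 spectrum tensor is then decomposed further by HTSVD2.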

2.2. Description of the 2LTSP Decoder Performance

Because of the mirror symmetry of the coder/decoder structures (Figure 1), the performance of the decoder is presented through operations inverse to those in the coder.
In the first layer, for r = 0:
$$\hat{S}_{0,2\times2\times2} = \sum_{t=1}^{4} \hat{S}_{0,2\times2\times2}^{(t)} \quad (6)$$
$$\hat{X} = F^{-1}(\hat{S}_0) \quad (7)$$
In accordance with Equation (6), the tensor $\hat{S}_0$ is restored by uniting the sub-tensors $\hat{S}_{0,2\times2\times2}^{(t)}$ in correspondence with Figure 1b, and the values of the remaining spectrum coefficients (voxels of $\hat{S}_0$) are set to zero.
In the second layer, for r = 1:
$$\hat{S}_{1,2\times2\times2}^{(u)} = \sum_{t=1}^{4} \hat{S}_{1,2\times2\times2}^{(u,t)} \quad \text{for } u = 1, 2, \ldots, 8 \quad (8)$$
$$\hat{S}_1 = \bigcup_{u=1}^{8} \sum_{t=1}^{4} \hat{S}_{1,2\times2\times2}^{(u,t)}, \quad \hat{E}_0 = F^{-1}(\hat{S}_1) \quad (9)$$
$$X' = \hat{E}_0 + \hat{X} \quad (10)$$
In accordance with Equation (9), the tensor $\hat{S}_1$ is restored by uniting the sub-tensors $\hat{S}_{1,2\times2\times2}^{(u,t)}$ in correspondence with Figure 1b, and the values of the remaining spectrum coefficients (voxels of $\hat{S}_1$) are set to zero. If the full 3D OT is used in the 2LTSP coder instead of the truncated 3D TOT, the tensor restored at the output of the 2LTSP decoder coincides with the input tensor X, i.e., the transform of the input tensor X through the 2LTSP is reversible.
The difference between the original and restored tensors defines the error tensor, ΔX:
$$\Delta X = X - X' = X - (\hat{E}_0 + \hat{X}) \quad (11)$$
The restoration accuracy of the tensor X depends on:
(1) The number of retained low-frequency spectrum coefficients, which compose the cubical spectrum tensors $\hat{S}_{0,2\times2\times2}$ and $\hat{S}_{1,2\times2\times2}^{(u)}$ for u = 1, 2, …, 8 (in the latter case, each spectrum tensor comprises eight coefficients);
(2) The number of retained components of each HTSVD2×2×2, applied on the spectrum tensors $\hat{S}_{0,2\times2\times2}$ and $\hat{S}_{1,2\times2\times2}^{(u)}$.
The restoration accuracy of the tensor X increases with the number of retained spectrum coefficients and HTSVD components, and with the use of the 3D FO-AHKLT as a basis for the 2LTSP. The presented approach can easily be generalized to an MLTSP of more than two layers. Increasing the number of layers results in a lower power of the error tensor ΔX.

3. HTSVD Algorithm for Spectrum Tensor Decomposition

In this case, HTSVD2 is applied to each 2 × 2 × 2 tensor obtained at outputs 0 and 1 of the 2LTSP coder shown in Figure 1a. As a result, a sum of four components is obtained, in which the energy of each tensor is concentrated mainly in the first and second components. This permits us to reduce (cut off) the number of low-energy components without noticeably lessening the quality of the restored tensor, in correspondence with Figure 1b.
The block diagram of the computational graph of the two-level HTSVD2×2×2 algorithm for decomposition of the elementary tensor S2×2×2 is shown in Figure 2. The decomposition is based on SVD for the matrix X of size 2 × 2, denoted as SVD2×2, and described by the relation below [13]:
$$X = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{2A}\left(\sigma_1\begin{bmatrix} rp & sp \\ rq & sq \end{bmatrix} + \sigma_2\begin{bmatrix} sq & -rq \\ -sp & rp \end{bmatrix}\right) = \sigma_1 U_1 V_1^t + \sigma_2 U_2 V_2^t = C_1 + C_2 \quad (12)$$
where a, b, c, and d are the elements of the matrix X, and $A = \sqrt{\nu^2 + 4\eta^2} \ge 0$, $\sigma_1 = \sqrt{(\omega + A)/2}$, $\sigma_2 = \sqrt{(\omega - A)/2}$, $U_1 = \frac{1}{\sqrt{2A}}[p, q]^t$, $U_2 = \frac{1}{\sqrt{2A}}[q, -p]^t$, $V_1 = \frac{1}{\sqrt{2A}}[r, s]^t$, $V_2 = \frac{1}{\sqrt{2A}}[s, -r]^t$, $r = \sqrt{A + \nu}$, $p = \sqrt{A + \mu}$, $s = \sqrt{A - \nu}$, $q = \sqrt{A - \mu}$, $\nu = a^2 + c^2 - b^2 - d^2$, $\eta = ab + cd$, $\mu = a^2 + b^2 - c^2 - d^2$, $\omega = a^2 + b^2 + c^2 + d^2$. Since $\sigma_1 \gg \sigma_2$, the energy of the first decomposition component in Equation (12) is much larger than that of the second. The number of parameters which define SVD2×2 is four ($\nu, \eta, \mu, \omega$), i.e., the decomposition is not "over-complete".
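The closed-form singular values of Equation (12) can be checked numerically against a library SVD. The sketch below implements only the σ1, σ2 formulas; the singular vectors, whose signs depend on conventions the text does not fix, are taken from numpy.linalg.svd:

```python
import numpy as np

def svd2x2_singular_values(a, b, c, d):
    """Closed-form singular values of [[a, b], [c, d]] per Equation (12)."""
    omega = a*a + b*b + c*c + d*d      # total energy
    nu = a*a + c*c - b*b - d*d         # column-energy difference
    eta = a*b + c*d                    # column correlation
    A = np.sqrt(nu*nu + 4.0*eta*eta)
    sigma1 = np.sqrt((omega + A) / 2.0)
    sigma2 = np.sqrt(max(omega - A, 0.0) / 2.0)  # clamp tiny negative round-off
    return sigma1, sigma2

def svd2x2_components(M):
    """Rank-one components C1, C2 with C1 + C2 = M (vectors via numpy's SVD)."""
    U, s, Vt = np.linalg.svd(M)
    C1 = s[0] * np.outer(U[:, 0], Vt[0, :])
    C2 = s[1] * np.outer(U[:, 1], Vt[1, :])
    return C1, C2
```

Since the two rank-one components are built from orthonormal singular vectors, their sum always restores M exactly, while most of the energy sits in C1.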
After mode-1 unfolding (matricization) of the elementary tensor S2×2×2, the following is obtained:
$$\mathrm{unfold}_1(S_{2\times2\times2}) = \begin{bmatrix} a & b & e & f \\ c & d & g & h \end{bmatrix} = [S_1 \mid S_2] \quad (13)$$
In the first level of the HTSVD algorithm for the tensor S2×2×2 (HTSVD2×2×2), an SVD of size 2 × 2 (SVD2×2) is executed for each of the matrices S1 and S2, and as a result we obtain:
$$S_1 = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = C_{11} + C_{12}, \quad S_2 = \begin{bmatrix} e & f \\ g & h \end{bmatrix} = C_{21} + C_{22} \quad (14)$$
The matrices Ci,j of the size 2 × 2 for i, j = 1, 2, calculated in accordance with Equation (12), are rearranged into new couples in correspondence with their singular values. After this, the first couple of matrices, C11 and C21, which have high singular values, define the tensor S1(2×2×2) by inverse matricization, and for the second couple, C12 and C22, which have lower singular values, the corresponding tensor is S2(2×2×2). Then:
$$S_{2\times2\times2} = S_{1(2\times2\times2)} + S_{2(2\times2\times2)} \quad (15)$$
After mode-3 unfolding of both tensors, the following is obtained:
$$\mathrm{unfold}_3(S_{1(2\times2\times2)}) + \mathrm{unfold}_3(S_{2(2\times2\times2)}) = \begin{bmatrix} S_{11} \\ S_{21} \end{bmatrix} + \begin{bmatrix} S_{12} \\ S_{22} \end{bmatrix} \quad (16)$$
In the second HTSVD2×2×2 level, SVD2×2 is applied to each matrix Si,j of size 2 × 2, and the following result is obtained:
$$S_{11} = C_{111} + C_{112}, \quad S_{21} = C_{211} + C_{212}, \quad S_{12} = C_{121} + C_{122}, \quad S_{22} = C_{221} + C_{222} \quad (17)$$
The matrices Ci,j,k of size 2 × 2 for i, j, k = 1, 2 are rearranged into four new couples, following the decreasing order of their singular values. After inverse matricization, each of these four couples of matrices defines a corresponding tensor of size 2 × 2 × 2.
As a result of the mode-1 unfolding, the following is obtained:
$$\mathrm{unfold}_1\{S_{1(2\times2\times2)}^{(1)}\} + \mathrm{unfold}_1\{S_{1(2\times2\times2)}^{(2)}\} + \mathrm{unfold}_1\{S_{2(2\times2\times2)}^{(1)}\} + \mathrm{unfold}_1\{S_{2(2\times2\times2)}^{(2)}\} = [C_{111} \mid C_{211}] + [C_{121} \mid C_{221}] + [C_{112} \mid C_{212}] + [C_{122} \mid C_{222}] \quad (18)$$
After execution of the two HTSVD2×2×2 levels, the tensor S 2 × 2 × 2 is represented as:
$$S_{2\times2\times2} = S_{1(2\times2\times2)}^{(1)} + S_{1(2\times2\times2)}^{(2)} + S_{2(2\times2\times2)}^{(1)} + S_{2(2\times2\times2)}^{(2)} = \sum_{i=1}^{2}\sum_{j=1}^{2} S_{j(2\times2\times2)}^{(i)} \quad (19)$$
After the execution of the S2×2×2 decomposition, the tensors in the resulting sum are arranged in decreasing order of the dispersion values of the sub-matrices obtained after the unfolding. The voxels of higher values in the decomposition in Equation (19) are colored dark red in Figure 2, and those of lower values are light blue. The decomposition of the tensor S2×2×2 is the basic structural unit of HTSVD when 3D tensors of size N × N × N for N = 2^n are decomposed [13].
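The two-level procedure above, culminating in Equation (19), can be sketched as follows. The slice conventions (frontal slices for the mode-1 step, horizontal slices for the mode-3 step) are an assumed reading of the unfoldings, and the library SVD stands in for the closed-form SVD2×2:

```python
import numpy as np

def rank1_split(M):
    """Split a 2x2 matrix into its two rank-one SVD components (C1, C2)."""
    U, s, Vt = np.linalg.svd(M)
    return (s[0] * np.outer(U[:, 0], Vt[0, :]),
            s[1] * np.outer(U[:, 1], Vt[1, :]))

def htsvd2x2x2(S):
    """Two-level HTSVD sketch: four 2x2x2 tensors whose sum restores S exactly."""
    # Level 1: SVD each frontal slice; group components by singular value rank.
    C11, C12 = rank1_split(S[:, :, 0])
    C21, C22 = rank1_split(S[:, :, 1])
    T1 = np.stack([C11, C21], axis=2)   # dominant-component tensor
    T2 = np.stack([C12, C22], axis=2)   # residual-component tensor
    # Level 2: SVD each horizontal slice of T1 and T2; regroup again.
    comps = []
    for T in (T1, T2):
        A1, A2 = rank1_split(T[0, :, :])
        B1, B2 = rank1_split(T[1, :, :])
        comps.append(np.stack([A1, B1], axis=0))
        comps.append(np.stack([A2, B2], axis=0))
    return comps

S = np.random.default_rng(1).standard_normal((2, 2, 2))
parts = htsvd2x2x2(S)   # four decomposition components
```

Because every split is exact, the four components always sum back to S; the truncation used in the pyramid simply drops the low-energy ones.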

4. Algorithm for Calculation of the 3D Frequency-Ordered Fast Walsh–Hadamard Transform

The embedding of the 3D frequency-ordered fast Walsh–Hadamard transform (3D FO-FWHT) in the 2LTSP coder/decoder ensures maximum acceleration of the restoration of the tensor X. This approach is suitable for applications where high execution speed is required, such as the efficient compression of multidimensional images.
The calculation of the 3D FO-FWHT for tensors of size 8 × 8 × 8 can be executed in accordance with the diagrams shown in Figure 3. The transform is separable and is executed through consecutive 1D FO-WHT calculations for the columns, rows, and tubes of the tensor X of size N × N × N, for N = 2^n. To speed up the 1D FO-WHT calculations, the well-known "fast" algorithm is used [16]. The number of basic "addition" operations needed for the execution of the "fast" 1D FO-WHT (1D FO-FWHT), in correspondence with the computational graph in Figure 3, is $A_R(n) = N\log_2 N = 2^n n$. For the case when the "fast" truncated 1D FO-TWHT (i.e., 1D FO-FTWHT) is used, with a reduction (truncation) in the number of output coefficients from N down to 2, the number of "additions" $A_T(n)$ is defined by the relation:
$$A_T(n) = N + \frac{N}{2} + \frac{N}{4} + \ldots + \frac{N}{2^{n-1}} = N\sum_{p=0}^{n-1} 2^{-p} = 2(2^n - 1) \quad (20)$$
The corresponding graph in Figure 3 is colored in red. In this case, the ratio between the numbers of calculations needed for the 1D FO-FWHT and the 1D FO-FTWHT is defined by the relation below:
$$\psi(n) = \frac{A_R(n)}{A_T(n)} = \frac{n}{2(1 - 2^{-n})} \quad (21)$$
Hence, for $n = 3$, $\psi(3) = 24/14 \approx 1.7$.
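The addition counts above are easy to verify with a small implementation of the 1D fast WHT (Sylvester ordering is used here for brevity; the frequency reordering does not change the operation count):

```python
import numpy as np

def fwht(x):
    """1D fast Walsh-Hadamard transform; returns (spectrum, number of additions)."""
    y = np.asarray(x, dtype=float).copy()
    N = y.size
    adds = 0
    h = 1
    while h < N:                         # log2(N) butterfly stages
        for i in range(0, N, 2 * h):
            for j in range(i, i + h):
                y[j], y[j + h] = y[j] + y[j + h], y[j] - y[j + h]
                adds += 2                # one addition and one subtraction
        h *= 2
    return y, adds

x = np.arange(8.0)
spectrum, adds = fwht(x)                 # adds == N * log2(N) = 24 for N = 8
```

The truncated count $A_T(3) = 2(8 - 1) = 14$ follows from performing the first stage in full and halving the remaining work at each subsequent stage.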
In accordance with Figure 3, for the 3D transform of the tensor X, the tensor is first divided into N frontal slices (2D matrices). For each column of this sequence of 2D matrices, the 1D FWHT is executed, followed by a similar operation for each row of the transformed matrices. The tensor composed of the matrices transformed in this way is then divided again, but into N horizontal slices (2D matrices), and along the third direction (the mode-3 tubes) the truncated 1D FO-FTWHT is applied, which, in correspondence with Equation (20), needs only 2(N − 1) additions per tube. Hence, the total number of additions needed to transform the tensor X of size N × N × N (N = 2^n) through the truncated 3D FO-FWHT is, correspondingly:
$$A_{3D,T}(n) = N^2[2N\log_2 N + 2(N - 1)] = 2^{2n+1}(n2^n + 2^n - 1) \approx 2^{3n+1}(n + 1) \quad (22)$$
The total number of additions needed to transform the tensor X of size N × N × N through the direct 3D WHT is:
$$A_{3D}(n) = 3N^3(N - 1) \approx 3 \times 2^{4n} \quad (23)$$
Hence, the acceleration of the calculations when the truncated "fast" 3D FO-FWHT (i.e., the 3D FO-TFWHT) is used instead of the direct 3D WHT is:
$$\psi_{3D}(n) = \frac{A_{3D}(n)}{A_{3D,T}(n)} \approx \frac{3 \times 2^{4n}}{2 \times 2^{3n}(n + 1)} = 1.5\,\frac{2^n}{n + 1} \quad (24)$$
For the case when n = 3, it follows from Equation (24) that $\psi_{3D}(3) \approx 3$; using the exact operation counts from Equations (22) and (23), the acceleration of the calculations for a tensor X of size 8 × 8 × 8 is about 2.7 times.

5. Comparative Evaluation of 3D-FWHT Computational Complexity

Here, a comparative evaluation of the computational complexity (CC) of the 3D-FWHT is given with respect to the known hierarchical decompositions: the 3D discrete wavelet transform (3D-DWT) and H-Tucker. For the evaluation, the number of needed basic operations is used. In the general case, the number of additions $A_{3H}$ for the hierarchical 3D-FWHT [14] when N = 2^n (n denotes the number of decomposition levels) is:
$$A_{3H}(n) = 3 \times 2^{3n} n$$
The number of multiplications needed for the execution of the direct 3D FWHT is $l_{3H}(n) = 0$. Then, the CC of the 3D FWHT, evaluated through the total number of operations $O_{3H}(n)$, is defined by the relation:
$$O_{3H}(n) = A_{3H}(n) + l_{3H}(n) = 3 \times 2^{3n} n$$
The global numbers of additions and multiplications needed to transform a tensor of size N × N × N through the n-level 3D-DWT with orthogonal filters (3, 5) are, correspondingly [14]:
$$A_{3WT}(n) = 18N^3\sum_{p=0}^{n-1} 8^{-p} \approx 20.5(2^{3n} - 1) \quad \text{and} \quad l_{3WT}(n) = 24N^3\sum_{p=0}^{n-1} 8^{-p} \approx 27.4(2^{3n} - 1)$$
Then, the CC of the 3D-DWT is:
$$O_{3WT}(n) = A_{3WT}(n) + l_{3WT}(n) \approx 48 \times 2^{3n}$$
The H-Tucker decomposition [2] of a cubical tensor of minimum rank R = 2^n, size N = 2^n, and order d = 3 requires $O_{HT}(n) = 3 \times 2^{3n} + 2 \times 2^{4n}$ operations. For the tensor-train (TT) decomposition [7], the number of operations needed for the same tensor is $O_{TT}(n) = 3 \times 2^{4n}$, i.e., its CC is approximately 1.5 times higher than that of the H-Tucker. This is why the H-Tucker transform was selected for the CC comparison with the analyzed 3D deterministic orthogonal transforms.
The relative CC for the execution of the 3D FWHT compared to that of the 3D-DWT and H-Tucker (denoted as $\psi_1(n)$ and $\psi_2(n)$, correspondingly) is given below:
$$\psi_1(n) = O_{3WT}(n)/O_{3H}(n) = (48 \times 2^{3n})/(3 \times 2^{3n} n) \approx 16/n;$$
$$\psi_2(n) = O_{HT}(n)/O_{3H}(n) = 2^{3n}(2^{n+1} + 3)/(3 \times 2^{3n} n) \approx 0.66 \times (2^n/n).$$
The relative CC of the 3D FWHT with respect to the 3D DWT decreases in inverse proportion to n, while with respect to H-Tucker the ratio $\psi_2$ grows proportionally to $2^n/n$. Hence, the CC of the 3D FWHT is lower than that of the 3D DWT for numbers of levels n < 16, and its advantage over H-Tucker grows quickly for n ≥ 4, where $\psi_2(n) \approx 2.6$ and higher. The comparison results permit choosing the number of hierarchical decomposition levels n for which the 3D FWHT algorithm is more efficient than the 3D DWT and H-Tucker.
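The operation counts and the ratios ψ1 and ψ2 can be tabulated directly from the formulas above. This is a small sketch; the 3D-DWT count keeps the exact geometric sum rather than the rounded constants 20.5 and 27.4:

```python
def ops_3d_fwht(n):
    """O_3H(n) = 3 * 2^(3n) * n: additions only, no multiplications."""
    return 3 * 2 ** (3 * n) * n

def ops_3d_dwt(n):
    """O_3WT(n) for the (3, 5) orthogonal filters: 18 + 24 ops per voxel per level."""
    return (18 + 24) * 8 ** n * sum(8.0 ** -p for p in range(n))

def ops_htucker(n):
    """O_HT(n) = 3 * 2^(3n) + 2 * 2^(4n)."""
    return 3 * 2 ** (3 * n) + 2 * 2 ** (4 * n)

for n in (3, 4, 5):
    psi1 = ops_3d_dwt(n) / ops_3d_fwht(n)      # roughly 16 / n
    psi2 = ops_htucker(n) / ops_3d_fwht(n)     # roughly 0.66 * 2^n / n
    print(n, round(psi1, 2), round(psi2, 2))
```

Running the loop shows ψ1 shrinking and ψ2 growing with n, matching the asymptotic expressions above.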

6. Global Algorithm for 3D Object Search in a Database of 3D Objects Represented through MLTSP

We give here one possible application of the tensor representation based on the n-layer MLTSP, aimed at the accelerated search for 3D objects in a related database (DB). Such visual information is usually represented by image sequences obtained from various sources: video cameras, multispectral scanners, computer tomography, etc. The authentication degree of the 3D object search is increased significantly compared to the case when only a 2D image representation is used. The main problems which arise when such 3D image systems are created are related to the high computational complexity of the operations needed for a full search in the DB and the huge amount of data needed for the 3D object representation. To overcome these problems, we propose the global algorithm for accelerated 3D object search shown in Figure 4.
We suppose that the 3D object DB is already created and contains normalized tensor object representations, each of size N × N × N (for N = 2^n), calculated through the n-layer MLTSP. The query 3D object is segmented by using a deep neural network [18]. The segmented object image is scaled so as to match the size of the objects in the DB, and its tensor is represented through the n-layer MLTSP. The basic principle which we use to search for the unknown 3D object is to detect the closest similar objects in the DB through sequential multilayer selection. In the first MLTSP layer (r = 0), the search is full, and, as a result, a group of closest similar objects is selected. The selection is based on the similarity degree between the corresponding HTSVD components of the spectrum tensors of size 2 × 2 × 2 in the layers r = 0 of the respective MLTSPs. The small size of the compared HTSVD components does not demand significant computational resources for the similarity evaluations. The search in the next MLTSP layer (r = 1) is also based on the similarity evaluation of the HTSVD components, but only within the group already selected in the layer r = 0. As a result, the total number of computational operations needed for the similarity evaluation of the query tensor is reduced at each layer. The search continues in the next MLTSP layers and stops when a 3D object is detected in the DB whose similarity is higher than a predefined threshold. The thresholds used to select the groups of similar tensors in each MLTSP layer are defined through training the corresponding neural network (NN).
The similarity degree for a couple of P-dimensional vectors $x = [x(0), x(1), \ldots, x(P-1)]^T$ and $y = [y(0), y(1), \ldots, y(P-1)]^T$ is calculated by using the squared cosine similarity (SCSim) criterion [15], defined by the relation:
$$SCSim(x, y) = \left[\sum_{i=0}^{P-1} x(i)\,y(i)\right]^2 \Bigg/ \left[\sum_{i=0}^{P-1} x(i)^2 \times \sum_{j=0}^{P-1} y(j)^2\right]$$
To use this criterion for the similarity evaluation between a couple of tensors of the same size, an unfolding of the same kind (mode-1, -2, or -3) must first be executed for each tensor, X and Y. If the total number of vectors obtained after the unfolding is Q, the mean value of SCSim can be used for the similarity evaluation, computed over each couple of vectors $x_q$ and $y_q$ with the same sequential number q = 1, 2, …, Q, which belong to X and Y, correspondingly:
$$SCSim(X, Y) = \frac{1}{Q}\sum_{q=1}^{Q}\left[\sum_{i=0}^{P-1} x_q(i)\,y_q(i)\right]^2 \Bigg/ \left[\sum_{i=0}^{P-1} x_q(i)^2 \times \sum_{j=0}^{P-1} y_q(j)^2\right]$$
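The tensor-level criterion can be sketched directly; here the unfolding is taken as the fibers along a chosen mode of a NumPy array (an assumed convention), and zero-valued fibers are not handled:

```python
import numpy as np

def scsim(x, y):
    """Squared cosine similarity of two vectors: (x . y)^2 / (|x|^2 |y|^2)."""
    return np.dot(x, y) ** 2 / (np.dot(x, x) * np.dot(y, y))

def scsim_tensors(X, Y, mode=0):
    """Mean SCSim over corresponding mode-`mode` fibers of equally sized tensors."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    Ym = np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)
    return float(np.mean([scsim(xq, yq) for xq, yq in zip(Xm.T, Ym.T)]))
```

Identical tensors give SCSim = 1, and the value drops towards 0 as the corresponding fibers decorrelate; the square makes the criterion insensitive to a sign flip.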

7. Invariant 3D Object Representation, Based on MLTSP with 3D Modified Mellin–Fourier Transform

The description of the 3D object should be invariant to 3D rotations (R) at angles αr, βr, γr; 3D translation (T) in the directions xs, ys, zs; 3D scaling (S); and contrast (C) changes. To obtain the invariant representation of a 3D object through the MLTSP, the spectrum coefficients at the outputs of the decomposition layers r = 0, 1, …, n − 1 must be calculated by using the 3D modified Mellin–Fourier transform (3D MMFT), which is an extension of the authors' previous work, introduced in [15].
The block diagram of the algorithm for calculation of the direct 3D MMFT spectrum coefficients is shown in Figure 5. The main operations for the 3D MMFT execution are:
(1) Bi-polar transform of the voxels x(i, l, k) of the input tensor image X:
$$L(i, l, k) = x(i, l, k) - (x_{\max} + 1)/2 \quad \text{for } i, l, k = 0, 1, \ldots, N - 1,$$
where xmax is the maximum value in the voxel quantization scale.
(2) First direct 3D discrete Fourier transform (3D DFT):
$$F(a, b, c) = \sum_{i=0}^{N-1}\sum_{l=0}^{N-1}\sum_{k=0}^{N-1} L(i, l, k)\exp\left\{-j\frac{2\pi}{N}(ia + lb + kc)\right\} \quad \text{for } a, b, c = 0, 1, \ldots, N - 1,$$
followed by centering of the Fourier coefficients F(a, b, c):
$$F_0(a, b, c) = F\left(a - \frac{N}{2},\; b - \frac{N}{2},\; c - \frac{N}{2}\right) \quad \text{for } a, b, c = 0, 1, \ldots, N - 1.$$
(3) Retaining the centered low-frequency spectrum Fourier coefficients:
$$F_{0R}(a, b, c) = \begin{cases} F_0(a, b, c), & \text{if } (a, b, c) \in \text{the retained region}; \\ 0, & \text{in all other cases}. \end{cases}$$
The retained coefficients' region is a cube with side H ≤ N which envelops the center (0, 0, 0) of the 3D Fourier spectrum (H is an even number). For H < N and a, b, c = −(H/2), −(H/2) + 1, …, −1, 0, 1, …, (H/2) − 1, this cube contains low-frequency coefficients only.
(4) Calculation of the modules and phases of the retained coefficients $F_{0R}(a, b, c) = D_{F0R}(a, b, c)\,e^{j\varphi_{F0R}(a, b, c)}$:
$$D_{F0R}(a, b, c) = \sqrt{[A_{F0R}(a, b, c)]^2 + [B_{F0R}(a, b, c)]^2}, \quad \varphi_{F0R}(a, b, c) = \arctan[B_{F0R}(a, b, c)/A_{F0R}(a, b, c)],$$
where $A_{F0R}(a, b, c)$ and $B_{F0R}(a, b, c)$ are the real and the imaginary components of $F_{0R}(a, b, c)$, correspondingly.
(5) Calculation of the retained coefficients' normalized modules:
$$D(a, b, c) = p \ln D_{F0R}(a, b, c), \quad \text{where } p \text{ is the normalization coefficient.}$$
(6) Replacement of the orthogonal 3D discretization grid, which contains the voxels D(a, b, c), by a new grid defined through the 3D logarithmic spherical polar transform (3D LSPT), using the relations below:
$$\rho = \log\sqrt{a^2 + b^2 + c^2}, \quad \theta = \arccos\left[c\Big/\sqrt{a^2 + b^2 + c^2}\right], \quad \varphi = \arctan[b/a] \quad (32)$$
After the discretization of $\rho, \theta, \varphi$, the voxels $D(\rho_i, \theta_j, \varphi_j)$ are obtained, where:
$$\rho_i = \left[(\sqrt{2}/2)H\right]^{i/H} \text{ for } i = 1, 2, \ldots, H; \quad \theta_j = (2\pi/H)j \text{ and } \varphi_j = (2\pi/H)j \text{ for } j = -(H/2), \ldots, -1, 0, 1, \ldots, (H/2) - 1. \quad (33)$$
Here, instead of the 3D LSPT given in Equation (32), we used the 3D exponential polar transform (3D EPT) from Equation (33), which gives a higher density of the retained voxels D ( a , b , c ) around the origin (0,0,0) of the discretization grid ( ρ i , θ j , ϕ j ) .
(7) Replacement of the discretization grid for the voxels $D(\rho_i, \theta_j, \varphi_j)$, defined through the 3D EPT, by the orthogonal 3D discretization grid for the voxels D(x, y, z), calculated through trilinear interpolation (Figure 6). In this case, each interpolated voxel D(x1, y1, z1) is calculated taking into consideration the eight closest neighboring voxels on the grid:
$$D(x_1, y_1, z_1) = \frac{1}{H^3}\{z_1[x_1(y_1 B + y_2 F) + x_2(y_1 E + y_2 A)] + z_2[y_1(x_1 G + x_2 L) + y_2(x_1 C + x_2 K)]\},$$
where A, B, C, K, E, F, G, and L are the neighboring voxels from the discretization grid (x, y, z); α, β, and γ are the distances between the eight neighboring voxels in the directions x, y, z; and D(x1, y1, z1) is the linearly interpolated voxel for $x_1 = \alpha - x_2$, $y_1 = \beta - y_2$, $z_1 = \gamma - z_2$. The obtained results are the voxels D(x, y, z) of the interpolated output tensor D for x, y, z = 0, 1, 2, …, H − 1.
(8) Second direct 3D DFT for the tensor D, with voxels D(x, y, z):
$$S(a, b, c) = \frac{1}{H^3}\sum_{x=0}^{H-1}\sum_{y=0}^{H-1}\sum_{z=0}^{H-1} D(x, y, z)\exp\left\{-j\frac{2\pi}{H}(xa + yb + zc)\right\} \quad \text{for } a, b, c = 0, 1, \ldots, H - 1.$$
(9) Calculation of the modules of the complex coefficients S(a, b, c):
$$D_S(a, b, c) = \sqrt{[A_S(a, b, c)]^2 + [B_S(a, b, c)]^2},$$
where $A_S(a, b, c)$ and $B_S(a, b, c)$ are the real and the imaginary components of $S(a, b, c)$, correspondingly.
(10) Calculation of the normalized modules $D_{S0}(a, b, c)$ of the Fourier coefficients S(a, b, c):
$$D_{S0}(a, b, c) = x_{\max} \times [D_S(a, b, c)/D_{S\max}],$$
where $D_{S\max}$ is the maximum value of the modules $D_S(a, b, c)$ for a, b, c = 0, 1, …, H − 1.
(11) Calculation of the vector for the RSTC-invariant 3D object representation, based on the coefficients $D_{S0}(a, b, c)$ of highest energy in the 3D MMFT amplitude spectrum, for a, b, c = 0, 1, …, H − 1. The components $v_m$ for m = 1, 2, …, R of the corresponding RSTC-invariant vector $V = [v_1, v_2, \ldots, v_R]^T$ are defined by the coefficients $D_{S0R}(a, b, c)$, arranged as a 1D array after scanning the 3D MMFT spectrum in the frame of the 3D mask area with R < H³ voxels $D_{S0}(a, b, c)$.
The main advantages of the 3D MMFT compared to other well-known 3D object spectrum descriptors [19,20,21,22] are that it is RSTC-invariant and that the dimensionality of the feature space is reduced. This is why the approach based on the MLTSP with embedded 3D MMFT accelerates the search for 3D objects in tensor DBs and ensures higher identification accuracy.

8. Conclusions

In this work, a new approach is proposed for 3D tensor representation through the MLTSP with embedded 3D FO-FWHT and HTSVD, or with embedded 3D FO-FWHT and 3D MMFT, depending on the application: for example, information redundancy reduction in the input tensor, or an accelerated 3D object search in a tensor DB. In both cases, the main advantages of the MLTSP algorithms are their low computational complexity, their high flexibility regarding the choice of parameters, and their ability to reduce the information redundancy in the input tensor. These qualities give the approach good potential for application in various areas where multidimensional signals and images are concerned.
Our future work will focus on MLTSP modeling with various kinds of 3D OT and on the investigation of spectrum pyramid applications in different areas of tensor image processing, analysis, and computer vision. The future investigation of MLTSP applications in deep learning architectures is particularly promising.

Author Contributions

Conceptualization, R.A.K., R.P.M., and R.K.K.; methodology, formal analysis, R.A.K.; investigation, R.P.M.; writing—original draft preparation, R.K.K.; writing—review and editing, R.A.K.; visualization, R.K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Bulgarian National Science Fund: Project No. KP-06-H27/16: “Development of efficient methods and algorithms for tensor-based processing and analysis of multidimensional images with application in interdisciplinary areas”.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Kolda, T.; Bader, B. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500. [Google Scholar] [CrossRef]
  2. Cichocki, A.; Lee, N.; Oseledets, I.; Phan, A.; Zhao, Q.; Mandic, D. Tensor networks for dimensionality reduction and large-scale optimization: Part 1. Low-rank tensor decompositions. Found. Trends Mach. Learn. 2016, 9, 249–429. [Google Scholar] [CrossRef]
  3. Bergqvist, G.; Larsson, E. The Higher-Order Singular Value Decomposition: Theory and an Application. IEEE Signal Processing Mag. 2010, 27, 151–154. [Google Scholar] [CrossRef]
  4. De Lathauwer, L.; De Moor, B.; Vandewalle, J. A Multilinear Singular Value Decomposition. SIAM J. Matrix Anal. Appl. 2000, 24, 1253–1278. [Google Scholar] [CrossRef]
  5. Lu, H.; Plataniotis, K.; Venetsanopoulos, A. MPCA: Multilinear Principal Component Analysis of tensor objects. IEEE Trans. Neural. Netw. 2008, 19, 18–39. [Google Scholar] [PubMed]
  6. Shashua, A.; Hazan, T. Non-negative tensor factorization with applications to statistics and computer vision. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 7–11 August 2005; pp. 792–799. [Google Scholar]
  7. Oseledets, I. Tensor-Train Decomposition. SIAM J. Sci. Comput. 2011, 33, 2295–2317. [Google Scholar] [CrossRef]
  8. Mohlenkamp, M. Musings on multilinear fitting. Linear Algebra Its Appl. 2013, 438, 834–852. [Google Scholar] [CrossRef]
  9. Golub, G.H.; Van Loan, C.F. Matrix Computations, 3rd ed.; Johns Hopkins University Press: Baltimore, MD, USA, 1996. [Google Scholar]
  10. Andriyanov, N.; Dementiev, V.; Tashlinskii, A. Detection of objects in the images: From likelihood relationships towards scalable and efficient neural networks. Comput. Opt. 2022, 46, 139–159. [Google Scholar] [CrossRef]
  11. Panagakis, Y.; Kossaifi, J.; Chrysos, G.; Oldfield, J.; Nicolaou, M.; Anandkumar, A.; Zafeiriou, S. Tensor Methods in Computer Vision and Deep Learning. Proc. IEEE 2021, 109, 863–890. [Google Scholar] [CrossRef]
  12. Kountchev, R.; Kountcheva, R. Low Computational Complexity Third-Order Tensor Representation through Inverse Spectrum Pyramid. In Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology—Methods and Algorithms; Kountchev, R., Patnaik, S., Shi, J., Favorskaya, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 1, pp. 63–76. [Google Scholar]
  13. Kountchev, R.; Kountcheva, R. Comparative Analysis of the Hierarchical 3D-SVD and Reduced Inverse Tensor Pyramid in Regard to Famous 3D Orthogonal Transforms. In New Approaches for Multidimensional Signal Processing—Proceedings of International Workshop NAMSP 2020; Kountchev, R., Mironov, R., Shengqing, L., Eds.; Springer: Singapore, 2021; pp. 35–56. [Google Scholar]
  14. Kountchev, R.; Mironov, R.; Kountcheva, R. Hierarchical Cubical Tensor Decomposition through Low Complexity Orthogonal Transforms. Symmetry 2020, 12, 864. [Google Scholar] [CrossRef]
  15. Kountchev, R.; Kountcheva, R. Objects color segmentation and RSTC-invariant representation using adaptive color KLT and modified Mellin-Fourier transform. In Proceedings of the 8th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA), Innsbruck, Austria, 16–18 February 2011; pp. 21–28. [Google Scholar]
  16. Rao, K.; Kim, D.H.; Wang, J. Fast Fourier Transform: Algorithms and Applications; Springer: Dordrecht, The Netherlands, 2010. [Google Scholar]
  17. Kountchev, R.; Mironov, R.; Kountcheva, R. Complexity Estimation of Cubical Tensor Represented through 3D Frequency-Ordered Hierarchical KLT. Symmetry 2020, 12, 1605. [Google Scholar] [CrossRef]
  18. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. arXiv 2020, arXiv:2001.05566v5. [Google Scholar] [CrossRef] [PubMed]
  19. Tyagi, V. Content-Based Image Retrieval: Ideas, Influences, and Current Trends; Springer: Singapore, 2017. [Google Scholar]
  20. Lee, C.; Shih, J.; Liu, Y.; Han, C. 3D-DFT Spectrum and Cepstrum of Dense Local Cubes for 3D Model Retrieval. In Proceedings of the Fifth International Conference on Informatics and Applications, Takamatsu, Japan, 14–16 November 2016; pp. 1–11. [Google Scholar]
  21. Lee, C.; Shih, J.; Liu, Y. 3D model retrieval using 3D-DFT spectral and cepstral descriptors of local octants. In Proceedings of the 3rd IIAE International Conference on Intelligent Systems and Image Processing, Fukuoka, Japan, 2–5 September 2015; pp. 63–69. [Google Scholar]
  22. Vranic, D.; Saupe, D.; Richter, J. Tools for 3D-object retrieval: Karhunen-Loeve transform and spherical harmonics. In Proceedings of the IEEE Workshop Multimedia Signal Processing, Cannes, France, 3–5 October 2001; pp. 293–298. [Google Scholar]
Figure 1. 2LTSP encoder, based on 3D OT and HTSVD2×2×2 for N = 8.
Figure 2. HTSVD2×2×2 algorithm for decomposition of tensor S2×2×2.
Figure 3. Block diagrams of algorithms 3D FO-FWHT and 3D FO-FTWHT.
Figure 4. Global algorithm for fast 3D object search in a tensor database, based on MLTSP.
Figure 5. Block diagram of the algorithm for calculation of 3D MMFT coefficients for a cubical tensor of size N × N × N.
Figure 6. Calculation of the voxel D(x1, y1, z1) through trilinear interpolation on the basis of the closest eight neighbor voxels (colored in green) of the 3D discretization grid ( ρ i , θ j , ϕ j ) .
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

