Article

A Concise Tutorial on Functional Analysis for Applications to Signal Processing

by Najah F. Ghalyan 1,2,*, Asok Ray 1,3,* and William Kenneth Jenkins 4,*

1 Department of Mechanical Engineering, Pennsylvania State University, University Park, PA 16802, USA
2 Department of Mechanical Engineering, University of Kerbala, Kerbala 56001, Iraq
3 Department of Mathematics, Pennsylvania State University, University Park, PA 16802, USA
4 Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802, USA
* Authors to whom correspondence should be addressed.
Submission received: 1 September 2022 / Revised: 18 September 2022 / Accepted: 13 October 2022 / Published: 21 October 2022

Abstract:
Functional analysis is a well-developed field in the discipline of Mathematics, which provides unifying frameworks for solving many problems in applied sciences and engineering. In particular, several important topics (e.g., spectrum estimation, linear prediction, and wavelet analysis) in signal processing were initiated and developed through collaborative efforts of engineers and mathematicians, who used results from Hilbert spaces, Hardy spaces, weak topology, and other topics of functional analysis to establish essential analytical structures for many subfields of signal processing. This paper presents a concise tutorial for understanding the theoretical concepts of the essential elements in functional analysis, which form a mathematical framework and backbone for central topics in signal processing, specifically statistical and adaptive signal processing. The application of these concepts for formulating and analyzing signal processing problems may often be difficult for researchers in applied sciences and engineering who are not adequately familiar with the terminology and concepts of functional analysis. Moreover, these concepts are not often explained in sufficient detail in the signal processing literature; on the other hand, they are well-studied in textbooks on functional analysis, yet without emphasizing the perspectives of signal processing applications. Therefore, the process of assimilating the ensemble of pertinent information on functional analysis and explaining its relevance to signal processing applications is of significant importance and utility to the professional communities of applied sciences and engineering. The information presented in this paper is intended to provide an adequate mathematical background with a unifying concept for apparently diverse topics in signal processing. The main objectives of this paper from the above perspectives are summarized below: (1) Assimilation of the essential information from different sources of the functional analysis literature that is relevant to developing the theory and applications of signal processing. (2) Description of the underlying concepts in a way that is accessible to non-specialists in functional analysis (e.g., those with bachelor-level or first-year graduate-level training in signal processing and mathematics). (3) Signal-processing-based interpretation of functional-analytic concepts and their concise presentation in a tutorial format.

1. Introduction

The concept of functional analysis is built upon normed vector spaces and particularly inner product spaces, which are merged with diverse notions of topology and geometry, linear algebra, probability theory, and real and complex analysis (see, e.g., [1,2,3,4,5]). Topics in functional analysis include various concepts such as Banach spaces and Hilbert spaces, linear operators and their spectral theory, as well as group and semigroup theory. Knowledge of these mathematical structures is often essential for understanding and solving a variety of analytical problems in signal processing and related fields, as well as in mathematics itself [5]. Indeed, in functional analysis, objects such as functions are treated as elements or points in a space of functions [6]; hence the name functional analysis.
Results generated from functional analysis form key concepts in the frameworks of advanced scientific and engineering disciplines, including the fields of statistical signal processing and adaptive signal processing. Although adaptive signal processing can be viewed as a branch of statistical signal processing [7], the special properties of this field and their roles in engineering applications have led many specialists to consider them as two separate fields. Therefore, in many universities and research institutions around the world, statistical signal processing and adaptive signal processing are taught as independent graduate courses in engineering and applied sciences, and many textbooks have been devoted to studying these important fields individually (e.g., see [8,9] and references therein). Nevertheless, both statistical signal processing and adaptive signal processing form the backbone of so-called modern signal processing, in which signals are generally considered as random processes. Modern signal processing covers many topics of current interest, such as signal modeling and estimation, signal prediction, signal compression, adaptive lattice filtering, adaptive joint process estimation, recursive least squares lattice filtering, and spectrum estimation. The issues related to the processing of both deterministic and random signals are further discussed below.
While an estimation error may typically converge to zero for deterministic signals, this is generally not the case for random signals [8]. Therefore, in statistical and adaptive signal processing, it is common practice to make estimators of random signals unbiased (i.e., the expectation of the estimation error converges to zero). As explained later, this type of convergence is of a special kind, which is known in functional analysis as weak convergence (see, for example, [10,11]). Therefore, many important results in functional analysis are obtained in terms of weak convergence and weak topology, which potentially have significant implications for the subfields of statistical signal processing and adaptive signal processing. Moreover, it is usually desirable in estimation theory to identify optimal filters, which bridges the discipline of signal processing to that of optimization theory. To this end, researchers in modern signal processing often deal with random processes, for which optimization problems become more challenging and the usage of advanced mathematical tools is justified.
From a historical perspective, the names of some of the spaces used in functional analysis are those of the mathematicians who originally developed the theories of these spaces. Indeed, much of the theoretical work has been associated with the names of eminent mathematicians (e.g., Gauss, Lagrange, Euler, and Kolmogorov). In fact, the Hilbert space, which is a central topic in functional analysis, is one of the most commonly used mathematical frameworks of signal processing and the associated optimization [12]. The unique features of Hilbert spaces are explained in the paper from these perspectives. However, the names of other well-known spaces (e.g., metric spaces and normed spaces) were given based on the technical properties of these spaces; thus, the spaces frequently used in functional analysis have been named based on quite different historical backgrounds.
This paper presents a concise and focused review of key concepts of functional analysis that have strong relevance to modern signal processing. The most important spaces from the perspective of functional analysis, considered in this paper, are metric/topological spaces, Banach spaces, and Hilbert spaces. The relations among these and other spaces are illustrated in Figure 1. Other relevant vector spaces, such as the summable ($\ell^p$), Lebesgue-integrable ($L^p$), and Hardy ($H^p$) spaces, are also introduced in the paper.
The paper is organized in four sections, including the current section, and an Appendix A. Section 2 introduces Banach spaces and their relevant theorems, where special emphasis is laid on the $\ell^p$/$L^p$ spaces, $H^p$ spaces, spectral factorization, and weak topology in the setting of Banach spaces. Section 3 presents Hilbert spaces and their relevant features (e.g., Fourier series expansion and the orthogonality principle) along with some applications to signal processing and detection theory, such as wavelets, the Karhunen-Loève (KL) expansion, and reproducing-kernel Hilbert spaces (RKHS). Section 4 summarizes and concludes the paper. The Appendix A introduces elementary concepts and definitions in real analysis, probability theory, and topological spaces, which should be helpful for understanding the fundamental principles of functional analysis as applied to various concepts of signal processing; however, readers who are familiar with these concepts may only selectively refer to the Appendix A.

2. Banach Spaces for Signal Analysis

This section deals with Banach spaces for general applications to signal processing; it also introduces the concepts of Hardy spaces especially for digital signal processing. Further details on Banach spaces are provided in standard books on functional analysis such as Bachman and Narici [1] and Naylor and Sell [2].

2.1. Introduction to Banach Spaces

We start this subsection with the definition of a Banach space, which is a complete normed space as defined below.
Definition 1
(Banach Spaces). Let a vector space X be defined over a field K, where examples of K are the field of real numbers $(\mathbb{R}, +, \cdot)$ and the field of complex numbers $(\mathbb{C}, +, \cdot)$. Let a function $\|\cdot\| : X \to \mathbb{R}$, called a norm and denoted as $x \mapsto \|x\|$, have the following properties:
  • (positivity) $\forall x \in X$, $\|x\| \geq 0$, and $\|x\| = 0$ if and only if $x = 0$.
  • (homogeneity) $\forall x \in X$ and $\forall c \in K$, $\|cx\| = |c| \, \|x\|$.
  • (triangle inequality) $\forall x, y \in X$, $\|x + y\| \leq \|x\| + \|y\|$.
Then, $(X, \|\cdot\|)$ is called a normed vector space, where the norm induces a metric $d(x, y) \triangleq \|x - y\|$. A real (resp. complex) normed linear space that is complete (i.e., where every Cauchy sequence converges in the space) is called a real (resp. complex) Banach space.
Example 1.
The spaces of $\ell^p$ sequences, $1 \leq p \leq \infty$, form an important class of Banach spaces, which are extensively used in digital signal processing. These are linear vector spaces of all real (resp. complex) sequences $x \triangleq \{x_n\}$ such that $\sum_{n=-\infty}^{\infty} |x_n|^p < \infty$, where the $\ell^p$-norm is defined as:
$\|x\|_p \triangleq \left( \sum_{n=-\infty}^{\infty} |x_n|^p \right)^{1/p} \quad \text{if } 1 \leq p < \infty$
$\|x\|_\infty \triangleq \sup_{n \in \mathbb{Z}} |x_n| \quad \text{if } p = \infty$
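As an aside from the original text, these norms are straightforward to compute numerically. The following minimal Python sketch (assuming NumPy, and using a finite sequence as a stand-in for an $\ell^p$ element) illustrates the definitions:

```python
import numpy as np

def lp_norm(x, p):
    """p-norm of a finite sequence x (a truncation of an l^p sequence)."""
    x = np.asarray(x, dtype=float)
    if np.isinf(p):
        return np.max(np.abs(x))              # sup-norm for p = infinity
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 0.5])
print(lp_norm(x, 1))       # 7.5
print(lp_norm(x, 2))       # ~5.02
print(lp_norm(x, np.inf))  # 4.0
```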
Some of the theorems on $\ell^p$ spaces [2], which are extensively used in the analyses of discrete-time signals, are presented below.
Theorem 1
(Hölder Inequality [2]). Let $1 < p < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$. If $f \in \ell^p$ and $g \in \ell^q$, then $f \cdot g \in \ell^1$ and $\|f \cdot g\|_1 \leq \|f\|_p \, \|g\|_q$.
Proof. 
See pp. 550–551 in Naylor and Sell [2]. □
It is noted that the Hölder inequality also holds for $p = 1$ and $p = \infty$.
Theorem 2
(Minkowski Inequality [2]). If $1 \leq p \leq \infty$ and $f, g \in \ell^p$, then $\|f + g\|_p \leq \|f\|_p + \|g\|_p$.
Proof. 
See pp. 550–551 in Naylor and Sell [2]. □
It is noted that the Lebesgue-integrable versions of $\ell^p$ spaces, for applications to continuous-time signal processing, are called $L^p$ spaces [4]. In $L^p$ spaces, the Hölder and Minkowski inequalities are analogous to their respective $\ell^p$-versions in Theorems 1 and 2.
Next we focus on a few systems-theoretic applications of Banach spaces, which would require the operation of convolution.
Theorem 3
(Convolution inequality [13]). For sequences $u \in \ell^p$ with $p \in [1, \infty]$ and $h \in \ell^1$, the convolution product satisfies $h * u \in \ell^p$ and $\|h * u\|_p \leq \|h\|_1 \, \|u\|_p$.
Proof. 
See p. 241 in Desoer and Vidyasagar [13]. □
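The convolution inequality of Theorem 3 can be checked numerically on finite truncations. The sketch below (an illustration added here, not from the original text; the exponentially decaying impulse response is a hypothetical stand-in for an $\ell^1$ sequence) verifies $\|h * u\|_p \leq \|h\|_1 \, \|u\|_p$ for $p = 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(64) * np.exp(-0.2 * np.arange(64))  # decaying (l^1-like) impulse response
u = rng.standard_normal(256)                                # arbitrary input segment
y = np.convolve(h, u)                                       # finite convolution h * u

p = 2
lhs = np.sum(np.abs(y) ** p) ** (1.0 / p)                      # ||h * u||_p
rhs = np.sum(np.abs(h)) * np.sum(np.abs(u) ** p) ** (1.0 / p)  # ||h||_1 ||u||_p
print(lhs <= rhs)                                           # True
```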
Lemma 1
(Barbalat Lemma). If $\{x_n\} \in \ell^p$ for some $p \in [1, \infty)$, then $\lim_{n \to \infty} |x_n| = 0$.
Proof. 
Suppose, to the contrary, that $\lim_{n \to \infty} |x_n| \neq 0$. Then, there exists a subsequence $\{|x_{n_j}|\}$ bounded below by a real number $\varepsilon > 0$, which implies that $\{|x_{n_j}|^p\}$ is bounded below by $\varepsilon^p$, so that $\sum_n |x_n|^p \to \infty$. This contradicts the assertion $\{x_n\} \in \ell^p$. □
Let a linear discrete-time dynamical system with an impulse response $h[n, k]$ be excited by an input signal u to yield an output signal y.
Definition 2
(BIBO-stability). A system is said to be bounded-input-bounded-output (BIBO)-stable if every $u \in \ell^\infty$ yields $y \in \ell^\infty$. More generally, the system is called $\ell^p$-stable if $u \in \ell^p$ implies $y \in \ell^p$, where $p \in [1, \infty]$.
For a linear shift-invariant (LSI) system, the impulse response h[n, k] takes the form h[n − k], and the output is given by the convolution $y = h * u$ as [14]:
$y[n] = \sum_{k=-\infty}^{\infty} h[n-k] \, u[k] = \sum_{k=-\infty}^{\infty} u[n-k] \, h[k]$
Using Theorem 3, if $h \in \ell^1$ and $u \in \ell^p$ for some $p \in [1, \infty]$, then it follows that [13]:
$\|y\|_p \leq \|h\|_1 \, \|u\|_p$
It is noted that $h \in \ell^1$ is a sufficient condition for the system to be $\ell^p$-stable. Furthermore, using Lemma 1, it follows that if $y \in \ell^p$ for some $p \in [1, \infty)$, then $y[n] \to 0$ as $n \to \infty$. This information is useful, for example, in the design of a linear shift-invariant estimation system, where the output signal represents the estimation error. If the system impulse response satisfies $h \in \ell^1$, then the estimation error is bounded and converges asymptotically to zero if the input signal satisfies $u \in \ell^p$ for some $p \in [1, \infty)$.
Example 2
(Adaptive Filtering). In a general setting, let us consider the adaptive filtering problem in Figure 2, where a measurement vector $x[n] \triangleq [x_1[n], x_2[n], \dots, x_N[n]]^T$ is used to construct an estimate, $\hat{d}[n] \triangleq (h * x)[n]$, of the desired signal d[n] by a linear shift-variant filter h[n] [7]. Then, the task is to synthesize an adaptive algorithm to update the filter h[n] such that the estimation error $e[n] \triangleq d[n] - \hat{d}[n] \to 0$ as $n \to \infty$. Using Lemma 1, this can be achieved if the adaptive algorithm ensures $e \in \ell^p$ for some $p \in [1, \infty)$.
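Example 2 does not prescribe a particular adaptive algorithm; as one hedged illustration, the least-mean-squares (LMS) recursion below updates the filter taps from the instantaneous error. The "unknown" system h_true, the step size mu, and the filter length are hypothetical choices for this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, mu = 4, 5000, 0.01
h_true = np.array([0.6, -0.3, 0.2, 0.1])   # hypothetical system generating d[n]
w = np.zeros(N)                            # adaptive filter taps h[n]
x = rng.standard_normal(T + N)

for n in range(T):
    xn = x[n:n + N][::-1]                  # measurement vector x[n]
    d = h_true @ xn                        # desired signal d[n]
    e = d - w @ xn                         # estimation error e[n] = d[n] - d_hat[n]
    w += mu * e * xn                       # LMS update drives e[n] toward zero

print(np.round(w, 3))                      # approaches h_true
```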
If a dynamical system at any time n does not depend on future inputs (i.e., the system depends only on past and present inputs), then the system is said to be causal [15], and the convolution in Equation (1) reduces to
$y[n] = \sum_{k=-\infty}^{n} h[n-k] \, u[k]$
If, in addition, $u[k] = 0 \ \forall k < 0$, then it follows that
$y[n] = \sum_{k=0}^{n} h[n-k] \, u[k]$

2.2. Hardy Spaces and Spectral Factorization for Signal Processing

This subsection introduces the concept of Hardy spaces $H^p$, $1 \leq p \leq \infty$, which constitute a class of Banach spaces with a special structure; this structure is very useful for digital signal processing [3]. In particular, the $H^2$ and $H^\infty$ spaces are of importance in robust control theory, and it will be seen later in this section that the $H^1$ space also plays an important role in power spectrum factorization in digital signal processing.
Recall that, for a linear shift-invariant system with an impulse response h[n] and input u[n], the output y[n] is obtained by convolution [14] as $y = h * u$. Then, with $z \triangleq e^{i\omega}$, where $\omega$ is the frequency in radians, the z-transform of the impulse response h is defined as:
$H(z) = \sum_{n=-\infty}^{\infty} h[n] \, z^{-n}$
which is known as the system transfer function in the z-domain. (The one-dimensional z-transform of the discrete-time impulse response h[k] is the ratio of two polynomials: $H(z) \triangleq \frac{N(z)}{D(z)}$, where the degree of N(z) is less than or equal to that of D(z) for physically realizable systems. However, for the multi (i.e., n)-dimensional z-transform, where $n \in \mathbb{N} \setminus \{1\}$, the resulting transfer function is given as the ratio of the numerator and denominator multinomials:
$H(z) \triangleq \frac{N(z_1, \dots, z_n)}{D(z_1, \dots, z_n)}$
The analysis of the multi-dimensional z-transform (e.g., in signal processing of spatio-temporal processes) is significantly more complicated than that of the one-dimensional z-transform, because the fundamental theorem of algebra may not be applicable to multinomials, while it is always applicable to polynomials.)
The system H(z) is stable if the sum in Equation (4) converges, and the region of convergence (ROC) is called the stability region, where all poles of H(z) are located inside the unit circle centered at the origin of the complex z-plane. The system is said to be minimum-phase if all zeros of H(z) are located inside the unit circle. If all zeros of H(z) are located outside the unit circle, then the system is called maximum-phase [16], and the system is called non-minimum-phase if at least one zero of H(z) is located outside the unit circle.
Definition 3
(Analytic Functions). Let $D_r(z_0) \triangleq \{z \in \mathbb{C} : |z - z_0| < r\}$ be the open disc of radius $r > 0$ with center at $z_0 \in \mathbb{C}$. A complex-valued function $f(re^{i\theta})$, where $\theta \in [0, 2\pi)$, is said to be analytic in $D_r(z_0)$ if the derivative of $f(re^{i\theta})$ exists at each point of $D_r(z_0)$.
Given $p \in [1, \infty]$, the Hardy space $H^p$ is a set of analytic functions $f(re^{i\theta})$ with bounded $H^p$-norm, defined as:
$\|f\|_{H^p} \triangleq \sup_{r \in (0,1)} \left( \frac{1}{2\pi} \int_{0}^{2\pi} |f(re^{i\theta})|^p \, d\theta \right)^{1/p} \quad \text{for } p \in [1, \infty)$
$\|f\|_{H^\infty} \triangleq \sup_{|z| < 1} |f(z)| \quad \text{for } p = \infty$
The following theorem, due to Paley and Wiener [14], presents a fundamental result in the $H^1$-space, which is important for spectral factorization in signal processing and for the innovations representation of random processes.
Theorem 4
(Paley-Wiener). Let S(z) be a complex-valued function of the complex variable z. If $\ln(S) \in H^1$, then there exists a real positive constant $K_0$ and a complex-valued function $H_{ca}(z)$, corresponding to a causal stable system with a causal stable inverse, such that
$S(z) = K_0 \, H_{ca}(z) \, H_{ca}^{her}(1/\bar{z})$
where the superscript “her” indicates the Hermitian, i.e., the complex conjugate of the transpose of a vector/matrix, and $\bar{z}$ is the complex conjugate of z. If, in addition, S(z) is a rational polynomial, the above factors $H_{ca}(z)$ and $H_{ca}^{her}(1/\bar{z})$ are the minimum-phase and maximum-phase components, respectively. The condition $\ln(S) \in H^1$ is called the Paley-Wiener condition.
Proof. 
The proof of the Paley-Wiener Theorem is given in detail by Therrien [14]. □
It follows from Equation (1) that, for a linear shift-invariant stable system with a deterministic LSI impulse response h[n] and a wide-sense stationary (WSS) input signal u[n], the expected value of the output y[n] is:
$E[y[n]] = \sum_{k=-\infty}^{\infty} h[n-k] \, E[u[k]]$
Since the input u is WSS, the expected values, $m_y$ and $m_u$, of the output y and input u, respectively, are related as:
$m_y = \left( \sum_{k=-\infty}^{\infty} h[k] \right) m_u$
The autocorrelation of a random sequence x[k] is denoted as $r_{xx}[k] \triangleq E\{x[n] \, x^{her}[n-k]\}$, and the cross-correlation between the output y and the input u is given by
$r_{yu}[n_1, n_0] = \sum_{k=-\infty}^{\infty} h[n_1 - k] \, r_{uu}[k - n_0]$
The above equation leads to the following important relations between correlation functions [14]:
$r_{yu}[\ell] = h[\ell] * r_{uu}[\ell] \quad \text{and} \quad r_{yy}[\ell] = h[\ell] * r_{uu}[\ell] * h^{her}[-\ell]$
where the superscript her indicates the Hermitian, i.e., the complex conjugate of transpose of a vector/matrix.
The Fourier transform of $r_{xx}[k]$ for a WSS random sequence x[k] is called the power spectral density function [7], defined as:
$S_{xx}(e^{i\omega}) \triangleq \sum_{k=-\infty}^{\infty} e^{-i\omega k} \, r_{xx}[k]$
and its inverse Fourier transform, which is equal to the autocorrelation function, is obtained as:
$r_{xx}[k] = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i\omega k} \, S_{xx}(e^{i\omega}) \, d\omega$
The z-transform of the autocorrelation function for a WSS random sequence x[n] is called the complex spectral density function and is defined as:
$S_{xx}(z) \triangleq \sum_{k=-\infty}^{\infty} r_{xx}[k] \, z^{-k}$
and its inverse is given by the contour integral
$r_{xx}[k] = \frac{1}{2\pi i} \oint_C S_{xx}(z) \, z^{k-1} \, dz$
Since the autocorrelation function of zero-mean white noise with variance $\sigma_w^2$ is given by $r_{ww}[k] \triangleq \sigma_w^2 \, \delta[k]$, the power spectral density of stationary white noise is the constant $\sigma_w^2$.
Using the property that the convolution in the time domain is a product in the Fourier transform domain and using Equation (9), it follows that
$S_{yx}(e^{i\omega}) = H(e^{i\omega}) \, S_{xx}(e^{i\omega})$
where $H(e^{i\omega})$ is the system transfer function (i.e., the Fourier transform of h[k]). A few algebraic computations yield the following relation [14]:
$S_{yy}(e^{i\omega}) = H(e^{i\omega}) \, S_{xx}(e^{i\omega}) \, H^{her}(e^{i\omega})$
In a similar manner, the following relations are obtained for the complex spectral density
$S_{yx}(z) = H(z) \, S_{xx}(z) \quad \text{and} \quad S_{yy}(z) = H(z) \, S_{xx}(z) \, H^{her}(1/\bar{z})$
Let us consider a WSS random sequence $\{x[k]\}$ whose complex spectral density satisfies the Paley-Wiener condition:
$\ln(S_{xx}) \in H^1, \quad \text{i.e.,} \quad \int_{-\pi}^{\pi} \left| \ln S_{xx}(e^{i\omega}) \right| d\omega < \infty$
Then, by Theorem 4, there exists a real positive constant $K_0$ and a complex-valued transfer function $H_{ca}(z)$ of a causal stable system with a causal stable inverse such that
$S_{xx}(z) = K_0 \, H_{ca}(z) \, H_{ca}^{her}(1/\bar{z})$
Remark 1.
A process whose (complex) spectral density satisfies Equation (17) is called a regular process (see [7,14]). The spectral density factorization given by Equation (18) has important applications in signal processing. This includes what is called the innovations representation of the random process [14], in view of which any regular process can be realized as the output of a causal linear filter $H_{ca}(z)$ driven by white noise with variance $K_0$, as shown in Figure 3.
It is worth mentioning that this type of process covers a wide range of random processes. In particular, any process whose complex spectral density is a rational function of z is a regular process.
Example 3
([14]). Consider a random sequence x[n] with a complex spectral density function:
$S_{xx}(z) = \dfrac{-(1/a)}{z - (a + 1/a) + z^{-1}}$
which could be re-written as:
$S_{xx}(z) = \dfrac{1}{-az + (1 + a^2) - az^{-1}} = \dfrac{1}{(1 - az^{-1})} \cdot \dfrac{1}{(1 - az)}$
Using the Paley-Wiener Theorem, x[n] can be realized as the output of a causal stable system (where $|a| < 1$), given by:
$H_{ca}(z) = \dfrac{1}{1 - az^{-1}}$
excited by zero-mean white noise with unit variance ($\sigma^2 = 1$). It is important to note that, since $S_{xx}(z)$ is a rational polynomial, $H_{ca}(z)$ should be minimum-phase. This is the case for the one given by Equation (19).
Since the function can be factored as:
$S_{xx}(z) = \dfrac{1}{-az + (1 + a^2) - az^{-1}} = \dfrac{1}{(z - a)} \cdot \dfrac{1}{(z^{-1} - a)}$
a possible pitfall here is to choose
$H_{ca}(z) = \dfrac{1}{z - a} = \dfrac{z^{-1}}{1 - az^{-1}}$
The term in Equation (20) is not minimum-phase because it has a zero at $z = \infty$. Moreover, the inverse $H_{ca}^{-1}(z) = z - a$ is not causal. Therefore, the spectral factorization with $H_{ca}(z)$ given by Equation (20) is not physically realizable for the given random sequence $\{x[k]\}$.
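Example 3 can be verified numerically: driving the minimum-phase factor $H_{ca}(z) = 1/(1 - az^{-1})$ with unit-variance white noise should produce a process whose estimated power spectral density matches $S_{xx}(e^{i\omega}) = 1/|1 - a e^{-i\omega}|^2$. The following sketch (assuming NumPy/SciPy; the value of a and the estimator settings are illustrative choices) performs this check with Welch's method:

```python
import numpy as np
from scipy import signal

a, n = 0.7, 2 ** 16
rng = np.random.default_rng(2)
w = rng.standard_normal(n)                       # unit-variance white noise
x = signal.lfilter([1.0], [1.0, -a], w)          # x[n] through H_ca(z) = 1/(1 - a z^-1)

freqs, pxx = signal.welch(x, nperseg=1024, return_onesided=False)
s_theory = 1.0 / np.abs(1.0 - a * np.exp(-2j * np.pi * freqs)) ** 2

rel_err = np.abs(pxx - s_theory) / s_theory
print(rel_err.mean() < 0.2)                      # True: agreement up to estimation error
```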
As mentioned before, any random process whose complex spectral density is a rational polynomial is a regular process, and therefore it satisfies the Paley-Wiener condition. However, this is not a necessary condition for being a regular process as seen in the following example.
Example 4
([14]). Let a random sequence x[n] have a complex spectral density $S_{xx}(z) = e^{z + z^{-1}}$.
Then, the corresponding power spectral density $S_{xx}(e^{i\omega}) = e^{2\cos\omega}$ satisfies the Paley-Wiener condition, which is given as:
$\int_{-\pi}^{\pi} \left| \ln S_{xx}(e^{i\omega}) \right| d\omega = \int_{-\pi}^{\pi} |2\cos\omega| \, d\omega < \infty$
Therefore, the given random sequence is regular and has an innovations representation. The spectral factorization can be done as follows:
$S_{xx}(z) = 1 \cdot e^{z^{-1}} \cdot e^{z}$
Then, the causal factor is given by
$H_{ca}(z) = e^{z^{-1}}$
which converges everywhere except at $z = 0$. The impulse response of the filter is $h_{ca}[n] = \frac{1}{n!} U[n]$, where U[n] is the (discrete) unit step function, because
$H_{ca}(z) = e^{z^{-1}} = \sum_{k=0}^{\infty} \frac{1}{k!} \, z^{-k}$
So, the given random sequence can be realized as the output of a system, with a transfer function given by Equation (21), which is driven by zero-mean white noise with unit variance (i.e., $\sigma^2 = 1$).
In fact, a regular process is related to a corresponding predictable process, i.e., one that can be predicted with zero error. The relation between these two processes is given by the following fundamental theorem [7].
Theorem 5
(Wold Decomposition Theorem). A general random sequence x[n] can be written as the sum of two processes as:
$x[n] = x_r[n] + x_p[n]$
where $x_r[n]$ is a regular process and $x_p[n]$ is a predictable process, with $x_r[n]$ being orthogonal to $x_p[n]$, i.e., $E\{x_r[m] \, x_p^{her}[n]\} = 0 \ \forall m, n$.
Proof. 
The proof is given in [7]. □

2.3. Weak Topology in a Banach Space

It follows from Appendix A that an appropriate collection of open sets in a metric space defines its topology, and such a topology is called a metric topology or strong topology. In fact, a base for the strong topology on a Banach space X is the collection of all open balls, i.e., sets of the form:
$\{ f \in X : \|f - g\| < r \},$
where the center g is a vector/function in X and the radius r is a positive real number. In this topology, convergence of a sequence $\{f_n\}$ of functions in X to a limit g in X is referred to as strong convergence, which means that $\|g - f_n\| \to 0$ and is denoted by $f_n \xrightarrow{s} g$. Besides strong convergence, other notions of convergence (e.g., weak convergence and uniform convergence) have been introduced in the literature, which play significant roles in the theory of Banach algebras [1].
We now introduce the notions of weak convergence and weak topology. Given a Banach space X over a field K, let $\mathcal{F} \triangleq \{F_1, F_2, \dots\}$ be a set of bounded linear functionals on X (a functional is a mapping of a vector space X into its field K; the set of all linear bounded (equivalently, linear continuous) functionals on X is called the dual space $X^*$), i.e., each $F_i$ is an element of the dual space $X^*$ and hence $\mathcal{F} \subset X^*$. Given an $\varepsilon > 0$ and a vector/function $f_0 \in X$, let us define the set:
$\Omega(\mathcal{F}; f_0, \varepsilon) \triangleq \{ f \in X : |F_i(f) - F_i(f_0)| < \varepsilon \ \ \forall F_i \in \mathcal{F} \}$
A class of such sets is obtained by varying $\varepsilon \to 0^{+}$ in Equation (24) to establish the notions of weak convergence and weak topology. Some of these convergence concepts in the space of linear bounded operators are briefly explained in the following definitions, which are introduced for different notions of convergence of sequences $\{T_k\}$ of bounded linear operators on Banach spaces.
Definition 4
(Convergence in operator norm or uniform convergence). Let $T_k \in BL(V, V)$ be bounded linear operators from V into V. Then, the sequence $\{T_k\}$ converges to some $T \in BL(V, V)$ in the operator norm (also called uniform convergence) if the induced norm satisfies $\lim_{k \to \infty} \|T - T_k\|_{ind} \triangleq \lim_{k \to \infty} \sup_{\|x\|_V = 1} \|(T - T_k)x\|_V = 0$, which is denoted as $T_k \xrightarrow{u} T$.
Definition 5
(Strong convergence). Let $T_k \in BL(V, V)$ be bounded linear operators from V into V. Then, the sequence $\{T_k\}$ converges strongly to some $T \in BL(V, V)$ if $\lim_{k \to \infty} \|(T - T_k)x\|_V = 0 \ \forall x \in V$, which is denoted as $T_k \xrightarrow{s} T$.
Definition 6
(Weak convergence). Let $T_k \in BL(V, V)$ be bounded linear operators from V into V. Then, the sequence $\{T_k\}$ converges weakly to some $T \in BL(V, V)$ if
$\forall F \in V^* \ \forall x \in V, \quad \lim_{k \to \infty} |F(Tx) - F(T_k x)| = 0,$
which is denoted as $T_k \xrightarrow{w} T$.
Remark 2
(Convergence in operator norm) ⇒ (Strong convergence) ⇒ (Weak convergence). The converse is not true, in general.
To show ($T_k \xrightarrow{u} T$) ⇒ ($T_k \xrightarrow{s} T$), we proceed as follows:
$\forall x \in V$, $\|(T - T_k)x\|_V \leq \|T - T_k\|_{ind} \, \|x\|_V$; hence, given $T_k \xrightarrow{u} T$, i.e., $\lim_{k \to \infty} \|T - T_k\|_{ind} = 0$, it follows that $\lim_{k \to \infty} \|(T - T_k)x\|_V = 0 \ \forall x \in V$, i.e., $T_k \xrightarrow{s} T$.
To show ($T_k \xrightarrow{s} T$) ⇒ ($T_k \xrightarrow{w} T$), we proceed as follows:
$T_k \xrightarrow{s} T$ implies $\lim_{k \to \infty} \|(T - T_k)x\|_V = 0 \ \forall x \in V$. Let $f \in V^*$; then, it follows from the linearity and boundedness of the functional f that $|f((T - T_k)x)| \leq \|f\|_{ind} \, \|(T - T_k)x\|_V$. Therefore, $\forall x \in V \ \forall f \in V^*$, $\lim_{k \to \infty} |f(Tx) - f(T_k x)| = 0$, i.e., $T_k \xrightarrow{w} T$.
We demonstrate the falsity of the converse by two counterexamples, one for each case.
(Strong convergence) ⇏ (Convergence in operator norm): Let us define $x \triangleq \{\xi_n : n \in \mathbb{N}\}$ and a sequence of bounded linear operators $T_k : \ell^2 \to \ell^2 \ \forall k \in \mathbb{N}$ as:
$T_k x \triangleq \{ \underbrace{0, 0, \dots, 0}_{\text{first } k \text{ terms}}, \xi_{k+1}, \xi_{k+2}, \dots \}$
Therefore, $T_k$ is a bounded linear operator, i.e., $T_k \in BL(\ell^2, \ell^2)$. Since $x \in \ell^2$, it follows that
$\lim_{k \to \infty} \|T_k x\|_2 = 0, \quad \text{i.e.,} \quad T_k \xrightarrow{s} 0 \in BL(\ell^2, \ell^2)$
However, the sequence does not converge in the induced norm: $\lim_{k \to \infty} \sup_{\|x\|_2 = 1} \|T_k x\|_2 = 1$, as seen by choosing $x = \{ \underbrace{0, 0, \dots, 0}_{\text{first } k \text{ terms}}, \xi_{k+1}, \xi_{k+2}, \dots \}$ with $\|x\|_2 = 1$; hence $T_k \not\xrightarrow{u} 0 \in BL(\ell^2, \ell^2)$.
Therefore, (Strong convergence) ⇏ (Convergence in operator norm).
(Weak convergence) ⇏ (Strong convergence): Let us define a sequence of bounded linear operators $T_k : \ell^2 \to \ell^2 \ \forall k \in \mathbb{N}$ as:
$T_k x = \{ \underbrace{0, 0, \dots, 0}_{\text{first } k \text{ terms}}, \xi_1, \xi_2, \dots \}$
where $x \triangleq \{\xi_n : n \in \mathbb{N}\}$. Each $T_k$ is a bounded linear operator, i.e., $T_k \in BL(\ell^2, \ell^2)$. Furthermore, in this Hilbert space setting, it follows from the Riesz Representation Theorem that every bounded linear functional f on $\ell^2$ can be represented as:
$f(x) = \langle x, y \rangle_{\ell^2} = \sum_{n=1}^{\infty} \xi_n \eta_n, \quad \text{where } y = \{\eta_k : k \in \mathbb{N}\} \in \ell^2$
It follows by the Cauchy-Schwarz inequality that, as $k \to \infty$,
$|f(T_k x)|^2 = |\langle T_k x, y \rangle|^2 \leq \sum_{n=1}^{\infty} |\xi_n|^2 \sum_{m=k+1}^{\infty} |\eta_m|^2 \to 0$
However,
$\|T_k x\|_2 = \|x\|_2 \ \forall k \in \mathbb{N} \ \forall x \neq 0 \in \ell^2$, so that
$\lim_{k \to \infty} \|T_k x\|_2 \neq 0, \quad \text{i.e.,} \quad T_k \not\xrightarrow{s} 0 \in BL(\ell^2, \ell^2)$
Therefore, (Weak convergence) ⇏ (Strong convergence).
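These counterexamples can be made concrete with finite truncations of $\ell^2$ sequences. The sketch below (illustrative only; the particular sequences $x_n = y_n = 1/n$ are hypothetical choices) applies the right-shift operator of the second counterexample and shows that $f(T_k x) = \langle T_k x, y \rangle \to 0$ while $\|T_k x\|_2$ stays fixed:

```python
import numpy as np

m = 10 ** 6                              # truncation length for l^2 sequences
x = 1.0 / np.arange(1, m + 1)            # x = {1/n}, an element of l^2
y = 1.0 / np.arange(1, m + 1)            # y represents the functional f = <., y>

for k in [1, 10, 100, 1000]:
    Tk_x = np.concatenate([np.zeros(k), x[:m - k]])      # right-shift by k terms
    print(k, np.dot(Tk_x, y), np.linalg.norm(Tk_x))      # f(T_k x) -> 0; norm stays ~||x||_2
```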
Remark 3.
It is noted that, for finite-dimensional vector spaces, the notions of strong convergence and weak convergence are indistinguishable. Equivalently, we make the following statement:
In a finite-dimensional Banach space V, the weak topology generated by the dual space $V^*$ coincides with the strong (norm) topology on V.
However, in the analysis of stochastic processes, we deal with infinite-dimensional spaces of signal functions, which may not have the same criteria for weak convergence and strong convergence. This is especially applicable to statistical signal processing, where the expectation of the estimation error is required to weakly converge to zero without requiring strong convergence of the error signal itself to zero.
Based on the concept of weak convergence, weak topology is defined as follows:
Definition 7
(Convergence in weak topology). Given a Banach space X, let there be a class of bounded linear functionals $\mathcal{F} \subset X^*$, and let $\mathcal{T}(\mathcal{F})$ be the topology on X generated by $\mathcal{F}$. Then, for a given vector/function $g \in X$, a sequence $\{f_n\} \subset X$ is said to converge to g in the weak topology $\mathcal{T}(\mathcal{F})$, denoted as $f_n \xrightarrow{w} g$ in $\mathcal{T}(\mathcal{F})$, provided that $F_\alpha(f_n)$ converges strongly to $F_\alpha(g)$, denoted as $F_\alpha(f_n) \xrightarrow{s} F_\alpha(g)$, $\forall F_\alpha \in \mathcal{F}$.
Weak convergence in Definition 7 is a generalization of weak convergence as introduced in the functional analysis literature, which states that a sequence $\{f_n\} \subset X$ converges weakly to some $g \in X$ if $G(f_n) \xrightarrow{s} G(g) \ \forall G \in X^*$ [10].
Remark 4.
The concepts of topological spaces and weak topology are important for learning using statistical invariants (LUSI). In a machine learning paradigm, learning machines often compute statistical invariants for specific problems, with the objective of reducing the expected values of errors in such a way that these invariants are preserved. In contrast to classical machine learning, which employs the mechanism of strong convergence for approximations to the desired function, LUSI can significantly increase the rate of convergence by combining the mechanisms of strong convergence and weak convergence [17]. Furthermore, the notion of weak topology is also important when dealing with shift spaces for signal analysis that uses symbolic dynamics, as explained in [18,19].

3. Hilbert Spaces for Signal Processing

This section introduces the concept of Hilbert spaces, which forms the backbone in the disciplines of signal processing and other fields of engineering. Details are provided in many textbooks such as Naylor and Sell [2].
Definition 8
(Hilbert Spaces). Let a vector space X be defined over a field K, which is either $\mathbb{R}$ or $\mathbb{C}$. A function $\langle \cdot, \cdot \rangle : X \times X \to K$ is called an inner product if, for all $x, y, z \in X$ and $\alpha \in K$, the following conditions hold:
1. (positive definiteness) $\langle x, x \rangle > 0$ when $x \neq 0$;
2. (additivity) $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$;
3. (homogeneity) $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$;
4. (symmetry) $\langle x, y \rangle = \overline{\langle y, x \rangle}$.
Then, $(X, \langle \cdot, \cdot \rangle)$ is called an inner product space or a pre-Hilbert space, and a complete inner product space (i.e., one in which every Cauchy sequence converges in the space) is called a real (resp. complex) Hilbert space, depending on whether the vector space is defined over $\mathbb{R}$ (resp. $\mathbb{C}$).
The following two properties are immediate consequences of the four properties in Definition 8:
  • $\langle x, (y + z) \rangle = \langle x, y \rangle + \langle x, z \rangle$;
  • $\langle x, \alpha y \rangle = \bar{\alpha} \, \langle x, y \rangle$.
It is also noted that every inner product space is a normed space with the norm $\|x\| \triangleq \sqrt{\langle x, x \rangle} \ \forall x \in X$ [2].
Example 5.
An example of a Hilbert space is the $\ell^2$ space of square-summable sequences. Given two sequences $x = \{x_n\}$ and $y = \{y_n\}$ in $\ell^2$, the inner product is given by $\langle x, y \rangle \triangleq \sum_n \bar{x}_n y_n$. Two vectors x and y in a Hilbert space H are said to be orthogonal if $\langle x, y \rangle = 0$. Given a subspace $V \subset H$, its orthogonal complement is denoted as $V^{\perp} \triangleq \{ u \in H : \langle u, v \rangle = 0 \ \forall v \in V \}$; consequently, $V \oplus V^{\perp} = H$.
Hilbert spaces have many interesting properties in common that make them important in optimization theory [12]. As will be seen in the sequel, these properties form the core of many fundamental results in adaptive and statistical signal processing, and they are established through the following theorem.
Theorem 6
(Riesz Representation Theorem [2,5]). Let H be a Hilbert space. Then, for every bounded linear functional $f : H \to \mathbb{C}$, there exists a unique $y \in H$ such that $f(x) = \langle y, x \rangle_H \ \forall x \in H$.
Proof. 
The proof is given in [2] in pp. 345–346. □
Remark 5.
For the Hilbert space $\ell^2$ (resp. $L^2$), this result can also be obtained from a theorem [5] which states that, given $p \in [1, \infty)$, $\ell^q$ (resp. $L^q$) is isometrically isomorphic to the dual space of $\ell^p$ (resp. $L^p$) provided that $\frac{1}{p} + \frac{1}{q} = 1$, where q is called the conjugate of p. Since the conjugate of $p = 2$ is $q = 2$, it follows that $\ell^2$ is isometrically isomorphic to $(\ell^2)^*$, and a similar relation holds for $L^2$ and $(L^2)^*$ (for example, see [3]); hence, $\ell^2$ and $L^2$ are reflexive. A generalization of this fact is stated as the following theorem.
Theorem 7.
Every Hilbert space is reflexive, i.e., H is isometrically isomorphic to its dual space $H^*$.
The proof of this theorem is given in many textbooks on functional analysis (e.g., [5,10,20]).
Another important property of Hilbert spaces, which is widely used in signal processing in combination with the previous two properties (see Theorems 6 and 7), is given as the following theorem [5]:
Theorem 8
(Orthogonal Projections). Let H be a Hilbert space, and let V H be a closed subspace of H, implying that V is also a Hilbert space. Then, it follows that
(i) $H = V \oplus V^{\perp}$. That is, given $x \in H$, there exists a unique pair $v \in V$ and $u \in V^{\perp}$ such that $x = u + v$.
(ii) $v \triangleq P_V(x)$ is the unique vector in V having minimal distance from the vector $x \in H$, while $u \triangleq P_{V^{\perp}}(x)$ is the unique vector in $V^{\perp}$ having minimal distance from x.
(iii) The orthogonal projections $x \mapsto P_V(x)$ and $x \mapsto P_{V^{\perp}}(x)$ are linear continuous operators with norm $\leq 1$.
Proof. 
The proof is given in [2] in pp. 300–305. □
Remark 6.
It follows from Theorem 8 that the decomposition in Equation (22) in Section 2 is indeed unique. Based on this fact, any random process generally consists of two unique orthogonal components: a predictable component and an unpredictable component. That is, if one wants to predict x[n] by using N past observations $\{x[n-N], x[n-N+1], \dots, x[n-1]\}$, then let $\hat{x}[n] = \sum_{k=1}^{N} a_k \, x[n-k]$ denote an optimal linear prediction of x[n]. Such a prediction can be obtained by applying the orthogonality principle, where the prediction error is given by $e[n] \triangleq x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{N} a_k \, x[n-k]$, and the process x[n] can be expressed as:
$x[n] = \hat{x}[n] + e[n] = \hat{x}[n] + \left( x[n] - \sum_{k=1}^{N} a_k \, x[n-k] \right)$
Hence, $\hat{x}[n]$ represents the predictable part of x[n], which corresponds to $x_p[n]$ in the Wold Decomposition Theorem in Equation (22), while the error e[n] represents the unpredictable part of x[n], which corresponds to $x_r[n]$. That is, the regular process represents the difference between the random process and its optimal prediction. Consequently, the output of $H_{ca}^{-1}(z)$ represents only the new part of the information brought by x[n], which cannot be extracted from the past observations; this output is therefore called the innovations process, as depicted in Figure 3b.
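The orthogonality principle of Remark 6 translates directly into the normal (Yule-Walker) equations for the prediction coefficients $a_k$. The sketch below (assuming NumPy/SciPy; the AR(1) test process with a = 0.7 is a hypothetical example) estimates sample autocorrelations, solves for the optimal predictor, and implicitly recovers the innovations e[n]:

```python
import numpy as np
from scipy import signal, linalg

rng = np.random.default_rng(3)
a_true, n, N = 0.7, 2 ** 15, 3
x = signal.lfilter([1.0], [1.0, -a_true], rng.standard_normal(n))  # AR(1) test process

# biased sample autocorrelations r[0], ..., r[N]
r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(N + 1)])

# orthogonality principle: E{e[n] x[n-j]} = 0  =>  toeplitz(r) a = (r[1], ..., r[N])
R = linalg.toeplitz(r[:N])
a = np.linalg.solve(R, r[1:])
print(np.round(a, 3))      # ~ [0.7, 0, 0]; the innovations are e[n] = x[n] - 0.7 x[n-1]
```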
Another interesting result on Hilbert spaces is stated in the following theorem [2,10].
Theorem 9
(Bessel Inequality). Let H be a Hilbert space and let $V = \overline{\operatorname{span}}\{e_1, e_2, \dots\}$ be a closed subspace of H spanned by an orthonormal set $\{e_k\}$. If $P_V : H \to V$ denotes the orthogonal projection of elements of H onto V, then the Bessel inequality
$\sum_{k \geq 1} |\langle x, e_k \rangle|^2 = \|P_V x\|^2 \leq \|x\|^2$
holds for every $x \in H$. Moreover,
$\sum_{k \geq 1} \langle x, e_k \rangle \, e_k = P_V x$
This theorem has an important implication to signal processing as explained in the following subsection.

3.1. Fourier Series Expansion in a Hilbert Space

In the Hilbert space $L^2([t_0, t_0 + T])$, which is the space of all square-integrable periodic functions $f : [t_0, t_0 + T] \to \mathbb{C}$ with $t_0 \in \mathbb{R}$ and period $T \in (0, \infty)$, an inner product is defined as:
$\langle f, g \rangle \triangleq \int_{t_0}^{t_0 + T} \bar{f}(t) \, g(t) \, dt \quad \forall f, g \in L^2([t_0, t_0 + T])$
Let the set of functions $S \triangleq \{\varphi_n(t), n \in \mathbb{Z}\}$, where
$\varphi_n(t) \triangleq \frac{e^{i\omega_n t}}{\sqrt{T}}; \quad \omega_n \triangleq 2\pi n / T \ \ \forall n \in \mathbb{Z}$
Then, it follows by setting $T = 2\pi$ that
$\langle \varphi_m, \varphi_n \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \overline{e^{imt}} \, e^{int} \, dt = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}$
and thus $\langle \varphi_m, \varphi_n \rangle = \delta_{mn} \ \forall m, n \in \mathbb{Z}$,
where $\delta_{mn}$ is called the Kronecker delta. Moreover, it turns out that $\operatorname{span}(S)$ is dense in $L^2([t_0, t_0 + T])$, i.e., the completion $\overline{\operatorname{span}(S)} = L^2([t_0, t_0 + T])$ (see [10]). Therefore, given any $f \in L^2([t_0, t_0 + T])$, there exists a sequence of scalars $\{c_k\} \subset \mathbb{C}$ such that
$\lim_{N \to \infty} \left\| f - \sum_{k=-N}^{N} c_k \varphi_k \right\|_{L^2} = 0$
That is,
$f(t) \stackrel{ms}{=} \sum_{k=-\infty}^{\infty} c_k \, \varphi_k(t)$
where $\stackrel{ms}{=}$ denotes equality in the mean-square (i.e., $L^2$) sense.
Therefore, $\{\varphi_n(t) : n \in \mathbb{Z}\}$ is in fact an orthonormal basis of $L^2([t_0, t_0 + T])$. The infinite sum $\sum_{k=-\infty}^{\infty} c_k \varphi_k$ is called the Fourier series expansion of f, where $\{c_k\}$ are the Fourier coefficients. Using Equation (29) and taking the inner product of both sides of Equation (30) with $\varphi_n$, for any $n \in \mathbb{Z}$, yields
$c_n = \langle \varphi_n, f \rangle$
Hence, Equation (30) can be rewritten in the following form:
$f(t) \stackrel{ms}{=} \sum_{n=-\infty}^{\infty} \langle \varphi_n, f \rangle \, \varphi_n(t)$
Moreover, it follows from Equation (26) in Theorem 9 that
$\|f\|_{L^2}^2 = \sum_{k=-\infty}^{\infty} |c_k|^2$
In view of Equation (32), the Fourier expansion of any square-integrable signal $f \in L^2([t_0, t_0 + T])$ decomposes the signal as a linear combination of harmonic modes $\varphi_k$ with frequencies $\omega_k$ [21], where each Fourier coefficient $c_k$ represents the signal's component associated with the mode $\varphi_k$. Furthermore, Equation (33) reveals how the signal's energy $\|f\|_{L^2}^2$ is distributed over the signal's components $\{c_k\}$ and demonstrates the important fact that, for each component $c_k$, the value $|c_k|^2$ represents the part of the signal's energy contributed by that component. This fact plays a central role in signal compression, where a signal $f \in L^2([t_0, t_0 + T])$ is approximated by using as few Fourier coefficients as possible; this is accomplished with minimum approximation error by retaining those coefficients $\{c_k\}$ with large magnitudes and discarding those with small magnitudes.
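The compression idea above is easy to demonstrate numerically: compute the Fourier coefficients of a sampled periodic signal, retain only the largest-magnitude ones, and reconstruct. The following sketch (the square wave, grid size, and the choice K = 10 are hypothetical; the FFT is used as a discrete surrogate for the coefficient integrals) illustrates this:

```python
import numpy as np

T, m = 2 * np.pi, 2048
t = np.linspace(0.0, T, m, endpoint=False)
f = np.sign(np.sin(t))                      # square wave of period T

# Fourier coefficients c_k = <phi_k, f>, approximated by the FFT
c = np.fft.fft(f) * np.sqrt(T) / m

# retain only the K largest-magnitude coefficients (signal compression)
K = 10
c[np.argsort(np.abs(c))[:-K]] = 0.0

f_rec = np.real(np.fft.ifft(c * m / np.sqrt(T)))   # reconstruction from retained modes
print(np.mean((f - f_rec) ** 2))            # small mean-square error (~0.04)
```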
Now we summarize the main results of Fourier series expansion of periodic functions as a theorem.
Theorem 10
(Fourier Series Theorem). Let $\{\varphi_n\}$ be an orthonormal set in a Hilbert space H. Then, the following statements are equivalent:
1. 
$\{\varphi_n\}$ is an orthonormal basis of H, i.e., $\{\varphi_n\}$ is a complete orthonormal set in H.
2. 
(Fourier series expansion) Any vector $x \in H$ can be expanded as $x \stackrel{ms}{=} \sum_{n \in \mathbb{N}} \langle x, \varphi_n \rangle \, \varphi_n$. Note: the inner products $\langle x, \varphi_n \rangle$ are called the Fourier coefficients of the vector x.
3. 
(Parseval Equality) For any two vectors $x, y \in H$, the inner product satisfies $\langle x, y \rangle = \sum_{n \in \mathbb{N}} \langle x, \varphi_n \rangle \, \overline{\langle y, \varphi_n \rangle}$.
4. 
The norm satisfies $\|x\|^2 = \sum_{n \in \mathbb{N}} |\langle x, \varphi_n \rangle|^2 \ \forall x \in H$.
5. 
Let U be a subspace of H such that U contains the sequence $\{\varphi_n\}$. Then, U is dense in H, i.e., $\bar{U} = H$.
Proof. 
The proof is given in [2] in pp. 307–312. □

3.2. Fourier Transform and Inverse Fourier Transform

For decomposition by Fourier series expansion, a function needs to be periodic, as seen in Section 3.1. To extend this analysis to non-periodic functions, we first consider square-integrable periodic functions $f : [-T/2, T/2] \to \mathbb{C}$ and let $T \to \infty$ so that the restriction of periodicity can be removed. Then, a combination of Equations (28) and (31) yields:
$c_n = \frac{1}{\sqrt{T}} \int_{-T/2}^{T/2} e^{-i\omega_n t} \, f(t) \, dt$
and we define:
$\hat{f}_T(\omega_n) \triangleq \sqrt{T} \, c_n$
With $n \to \infty$ and $\omega_n \to \omega$ as $T \to \infty$, it follows that
$\hat{f}(\omega) \triangleq \lim_{T \to \infty, \, n \to \infty} \hat{f}_T(\omega_n) = \int_{-\infty}^{\infty} e^{-i\omega t} \, f(t) \, dt$
Now, we have the Fourier transform of a signal $f \in L^2(\mathbb{R})$. Since $L^2(\mathbb{R})$ is the completion $\overline{L^1(\mathbb{R}) \cap L^2(\mathbb{R})}$, we impose a mild restriction: f(t) is required to be both absolutely integrable and square-integrable. Nevertheless, this restriction is satisfied if f is an analytic function [21].
To obtain the inverse Fourier transform, we substitute Equations (28) and (35) into Equation (30), which yields
$f(t) \stackrel{ms}{=} \frac{1}{T} \sum_{n=-\infty}^{\infty} e^{i\omega_n t} \, \hat{f}_T(\omega_n)$
By defining $\omega_n \triangleq \frac{n}{T}$ (note that the frequency is measured here in cycles per unit time, which absorbs the factor of $2\pi$), we have $\Delta\omega_n \triangleq \omega_{n+1} - \omega_n = \frac{1}{T}$. Then, substitution of $\frac{1}{T}$ into Equation (37) for $\Delta\omega_n$ yields
$f(t) \stackrel{ms}{=} \sum_{n=-\infty}^{\infty} \Delta\omega_n \, \hat{f}_T(\omega_n) \, e^{i\omega_n t}$
In the limits $T \to \infty$ and $n \to \infty$, Equation (38) becomes the inverse Fourier transform by using the Riemann sum [4]:
$f(t) \stackrel{ms}{=} \int_{-\infty}^{\infty} e^{i\omega t} \, \hat{f}(\omega) \, d\omega$
Equation (33) can also be rewritten as:
$\|f\|_{L^2}^2 = \sum_{n=-\infty}^{\infty} \Delta\omega_n \, |\hat{f}_T(\omega_n)|^2$
This formula shows that a signal $f(t) \in L^2(\mathbb{R})$ has, at any given time t, (possibly) uncountably many harmonic components distributed over the frequency range $-\infty < \omega < \infty$, and the magnitude of the harmonic component at a frequency ω is given by the signal's Fourier transform $\hat{f}(\omega)$. By taking the limits $T \to \infty$ and $n \to \infty$, it follows from Equations (33) and (34) that
$\|f\|_{L^2}^2 = \int_{-\infty}^{\infty} |f(t)|^2 \, dt = \int_{-\infty}^{\infty} |\hat{f}(\omega)|^2 \, d\omega = \|\hat{f}\|_{L^2}^2$
The above relation is known as Plancherel's theorem [3], which implies that the total energy of the signal in the time domain $t \in \mathbb{R}$ is re-distributed over the frequency domain $\omega \in \mathbb{R}$ such that the energy density at each frequency ω is $|\hat{f}(\omega)|^2$. It is worth mentioning that the inner products of two functions f and g in the time domain and the frequency domain are related by:
$\langle f, g \rangle_{L^2} = \langle \hat{f}, \hat{g} \rangle_{L^2}$
which is known as Parseval’s identity [2].
In many signal processing applications, the signal f is complex-valued with a discrete domain, i.e., $f : \mathbb{Z} \to \mathbb{C}$. Then, the discrete-time Fourier transform (DTFT) is given by:
$\hat{f}(\omega) = \sum_{n=-\infty}^{\infty} f(n) \, e^{-in\omega}$
From Equation (43), it follows that
$|\hat{f}(\omega)| = \left| \sum_{n=-\infty}^{\infty} f(n) \, e^{-in\omega} \right| \leq \sum_{n=-\infty}^{\infty} |f(n) \, e^{-in\omega}| = \sum_{n=-\infty}^{\infty} |f(n)|$
Therefore, a sufficient condition that guarantees the DTFT to be well-defined is that $f \in \ell^1$. The original sequence can be recovered from its DTFT by the inverse discrete-time Fourier transform (IDTFT):
$f(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{f}(\omega) \, e^{in\omega} \, d\omega$
Expressing the frequency in Hertz instead of radians, i.e., by setting $\omega = 2\pi\xi$, it follows that
$f(n) = \int_{-1/2}^{1/2} \hat{f}(\xi) \, e^{i2\pi n\xi} \, d\xi$
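For a concrete check of the DTFT and IDTFT pair, consider the geometric sequence $f(n) = (1/2)^n$, $n \geq 0$, which lies in $\ell^1$ and has the closed-form DTFT $1/(1 - 0.5 e^{-i\omega})$. The sketch below (truncation length and grid are illustrative choices) evaluates the DTFT on a grid and recovers a sample by a Riemann sum approximating Equation (44):

```python
import numpy as np

n = np.arange(16)
f = 0.5 ** n                               # f(n) = (1/2)^n, an l^1 sequence (truncated)
omega = np.linspace(-np.pi, np.pi, 1000, endpoint=False)

# DTFT evaluated on a frequency grid
F = np.array([np.sum(f * np.exp(-1j * w * n)) for w in omega])
F_closed = 1.0 / (1.0 - 0.5 * np.exp(-1j * omega))
print(np.max(np.abs(F - F_closed)))        # ~3e-5: truncation error only

# IDTFT via a Riemann sum over one period recovers f(3) = 0.125
n0 = 3
f3 = np.mean(F * np.exp(1j * omega * n0))
print(f3.real)                             # ~0.125
```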
Although the Fourier transform plays a central role in signal analysis, it considers the time-averaged frequency behavior of the signal by integrating over the entire time domain $-\infty < t < \infty$. This property reduces the capability of capturing abrupt (i.e., rapid) changes that may occur in the signal; capturing such rapid changes is crucial in many applications, such as detection of faults and anomalies. In order to remedy this shortcoming of the Fourier transform, the signal is integrated over a time window instead of over the entire time domain. This gives rise to the so-called windowed Fourier transform (WFT), which augments the Fourier transform with a time-localization property that provides information about the signal simultaneously in the time and frequency domains [21,22]; a quantum-mechanics-based explanation of time-frequency localization is briefly given in [23].

3.3. Windowed Fourier Transform in a Hilbert Space

A function $g : \mathbb{R} \to \mathbb{C}$ is said to have compact support $B \subset \mathbb{R}$, and we write $\operatorname{supp}(g) = B$, if g vanishes outside the compact set B, i.e., $g(t) = 0 \ \forall t \notin B$. Given a function $f : \mathbb{R} \to \mathbb{C}$ and a $t \in \mathbb{R}$, let us define
$f_t(u) \triangleq \bar{g}(u - t) \, f(u)$
where $\bar{g}(\cdot)$ is the complex conjugate of $g(\cdot)$, and $\operatorname{supp}(g) = [-T, 0]$ for a positive real number T. Hence, $f_t$ is a localized version of f with $\operatorname{supp}(f_t) \subseteq [t - T, t]$ [21]. Then, the windowed Fourier transform (WFT) of f is the Fourier transform of $f_t$, which is given as:
$\tilde{f}(\omega, t) \triangleq \hat{f}_t(\omega) = \int_{-\infty}^{\infty} e^{-i\omega u} \, f_t(u) \, du$
and the inverse WFT is obtained as:
$f(u) = \frac{1}{\|g\|^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g_{\omega, t}(u) \, \tilde{f}(\omega, t) \, d\omega \, dt$
where $g_{\omega, t}(u) \triangleq e^{i\omega u} \, g(u - t)$.
By following Equations (48) and (49), an inner product representation is obtained as:
$\langle g_{\omega, t}, f \rangle = \tilde{f}(\omega, t)$
Using Parseval's identity in Equation (42), we have
$\langle g_{\omega, t}, f \rangle = \langle \hat{g}_{\omega, t}, \hat{f} \rangle = \tilde{f}(\omega, t)$
Example 6.
An example of the window function g is:
$g(u) = \begin{cases} 1 + \cos(\pi u), & -1 \leq u \leq 1 \\ 0, & \text{otherwise} \end{cases}$
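A discretized WFT can be sketched with the window of Example 6. In the code below (an illustration, not the authors' implementation; the chirp-like test signal, window width, and evaluation points are hypothetical, and the symmetric window is used instead of the causal support convention above), the WFT magnitude concentrates at the frequency active near each window center:

```python
import numpy as np

def wft(f_vals, t_grid, omegas, t_centers):
    """Windowed Fourier transform on a grid, using g(u) = 1 + cos(pi*u) on [-1, 1]."""
    du = t_grid[1] - t_grid[0]
    out = np.zeros((len(omegas), len(t_centers)), dtype=complex)
    for j, tc in enumerate(t_centers):
        u = t_grid - tc
        g = np.where(np.abs(u) <= 1.0, 1.0 + np.cos(np.pi * u), 0.0)
        f_t = g * f_vals                                  # localized signal f_t (real window)
        for i, w in enumerate(omegas):
            out[i, j] = np.sum(f_t * np.exp(-1j * w * t_grid)) * du
    return out

t = np.linspace(0.0, 10.0, 4000)
f = np.where(t < 5.0, np.sin(2 * np.pi * 1.0 * t), np.sin(2 * np.pi * 4.0 * t))
S = wft(f, t, omegas=2 * np.pi * np.array([1.0, 4.0]), t_centers=[2.5, 7.5])
print(np.round(np.abs(S), 2))    # large on the diagonal: 1 Hz near t=2.5, 4 Hz near t=7.5
```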
As mentioned in a previous subsection, if the signal is discrete, its DTFT is used to provide a frequency representation of the signal. Although the signal f(n) in this case is discrete in the time domain, its DTFT is a continuous function of the frequency ω. However, most devices used in signal processing are digital, and therefore it is more convenient to deal with a discrete frequency representation of the signal. Moreover, the discrete signal f(n) in many cases represents measurement data provided by sensors, and such signals are usually of finite length. The discrete Fourier transform (DFT) is a useful tool in signal processing that addresses these two issues, as explained below.
Given a finite-length discrete signal $f : \{0, 1, \dots, N-1\} \to \mathbb{C}$, the DFT is given as:
$\tilde{f}[k] = \sum_{n=0}^{N-1} f[n] \, e^{-i2\pi kn/N}$
The inverse discrete Fourier transform (IDFT) is:
$f[n] = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{f}[k] \, e^{i2\pi kn/N}$
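The DFT/IDFT pair of Equations (52) and (53) matches the convention of standard FFT routines; the sketch below (assuming NumPy; the random test vector is arbitrary) confirms both directions:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8
f = rng.standard_normal(N)

# DFT as in Equation (52); np.fft.fft uses the same sign convention
F = np.array([np.sum(f * np.exp(-2j * np.pi * k * np.arange(N) / N)) for k in range(N)])
print(np.allclose(F, np.fft.fft(f)))       # True

# IDFT as in Equation (53) recovers the signal (np.mean supplies the 1/N factor)
f_rec = np.array([np.mean(F * np.exp(2j * np.pi * np.arange(N) * nn / N)) for nn in range(N)])
print(np.allclose(f_rec, f))               # True
```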
Returning to the continuous WFT: if a function f is windowed over a time interval, the resulting WFT has a time-localization property, as seen at the end of Section 3.2. Moreover, Equation (51) shows that the WFT of f localizes $\hat{f}(\omega)$ to a neighborhood of ω. Therefore, the WFT has both time and frequency localizations. However, due to the uncertainty principle, these two kinds of localization have different physical interpretations and are mutually exclusive in the sense that making a WFT $\tilde{f}(\omega, t)$ sharper in time makes it flatter in frequency, and vice versa [21,23]. Moreover, the WFT is not efficient in scanning signals that involve time intervals much shorter or much longer than the window length T [21]. To address these issues, the notion of the wavelet transform is introduced in Section 3.4, which produces an efficient analysis tool to capture signal features occurring over short and long intervals.

3.4. Wavelet Transform in a Hilbert Space

The continuous wavelet transform (CWT) of a function $f : \mathbb{R} \to \mathbb{C}$ is defined as:
$\tilde{f}(s, t) \triangleq \int_{-\infty}^{\infty} \bar{\psi}_{s,t}(u) \, f(u) \, du = \langle \psi_{s,t}, f \rangle$
where $\psi_{s,t}$, called the wavelet, is a scaled and translated version
$\psi_{s,t}(u) = |s|^{-p} \, \psi\!\left( \frac{u - t}{s} \right) \quad \text{where } s \neq 0$
of what is called a mother (or basic) wavelet $\psi \triangleq \psi_{1,0}$; and $\bar{\psi}_{s,t}$ is the complex conjugate of $\psi_{s,t}$.
It is noted from Equation (55) that when $|s| > 1$, $\psi_{s,t}$ is a stretched version of ψ, and when $|s| < 1$, $\psi_{s,t}$ is a compressed version of ψ. Moreover, if $s < 0$, then $\psi_{s,t}$ is a reflected version of ψ. These stretching, compression, and reflection operations take place along the time axis. The exponent p in Equation (55) is a real number that stretches or compresses ψ along the vertical axis; its purpose is to keep a desired norm unchanged when scaling the wavelet $\psi_{s,t}$. For example, if $p = 1$, then both ψ and $\psi_{s,t}$ have the same $L^1$ norm; and if $p = 1/2$, then ψ and $\psi_{s,t}$ have the same $L^2$ norm [21].
Using Parseval's identity, Equation (54) can be written as
$\tilde{f}(s, t) = \langle \psi_{s,t}, f \rangle = \langle \hat{\psi}_{s,t}, \hat{f} \rangle$
where $\hat{\psi}_{s,t}$ is the Fourier transform of $\psi_{s,t}$. This equality shows that the wavelet transform localizes signals in both the time and frequency domains, where the sharpness of these localizations is controlled by the scaling factor s and the choice of the mother wavelet ψ.
Example 7.
The Morlet wavelet is a (frequency-modulated) mother wavelet, which is given in the time domain as:
$\psi(u) = e^{i2\pi\xi_0 u} \, e^{-u^2/2}$
whose Fourier transform is
$\hat{\psi}(\xi) = e^{-(\xi - \xi_0)^2/2}$
where $\xi_0$ is the center frequency around which the signal is localized in the frequency domain.
Various forms of the mother wavelet ψ have been reported in the wavelet literature [21,22]. All of these wavelet forms should satisfy the admissibility condition:
$C_\psi \triangleq \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\xi)|^2}{|\xi|} \, d\xi < \infty$
where ψ ^ ( ξ ) is the Fourier transform of ψ ( t ) .
At a fixed scale s, the CWT of a signal f(u) yields information about the features contained in the signal at that scale, and the behavior of these features over time is captured by translating $\psi_{s,t}$ over t. This process is then repeated for different scales by changing s, so as to capture other signal features that are relevant to different scales.
Given the CWT of a signal $f \in L^2$, the original signal f can be reconstructed by
$f(u) \stackrel{ms}{=} \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{\tilde{f}(s, t)}{s^2} \, \psi_{s,t}(u) \, dt \, ds$
where $C_\psi$ is the constant in Equation (59), which depends on the wavelet ψ; Equation (60) shows that any signal $f \in L^2$ can be represented as a superposition of shifted and dilated wavelets [24].
For a discrete signal $f : \mathbb{Z} \to \mathbb{C}$, the discrete wavelet transform (DWT) is used with a discrete wavelet as:
$\psi_{s,t}[u] = |s|^{-p} \, \psi\!\left( \frac{u - t}{s} \right)$
where s is the scaling parameter and t is the shifting parameter. The most commonly used discrete wavelets have the following values of the parameters:
$s = 2^j, \quad t = k 2^j, \quad \text{and} \quad p = 1/2$
where j is an integer that controls the scaling parameter and specifies the level of wavelet decomposition of the signal, and k is another integer that controls the shifting parameter. Substitution of these values into Equation (61) yields the most common form of the discrete wavelet:
$\psi_{j,k}[n] = \frac{1}{\sqrt{2^j}} \, \psi\!\left( \frac{n - k 2^j}{2^j} \right)$
Notice that large values of j result in large scaling parameters which stretch the wavelet function and let the DWT capture low-frequency features in the signal. On the other hand, small values of j would make the DWT more capable of capturing high-frequency features by decreasing the scaling parameter [21,22].
Given a wavelet level j, the DWT of a sequence { f [ n ] } consists of the following two parts:
The average coefficients $\{A_j[k 2^j]\}$ are given by:
$A_j[k 2^j] \triangleq \sum_{n=-\infty}^{\infty} f[n] \, \phi_{j,k}[n] = \sum_{n=-\infty}^{\infty} f[n] \, \frac{1}{\sqrt{2^j}} \, \phi\!\left( \frac{n - k 2^j}{2^j} \right)$
and the detail coefficients $\{D_j[k 2^j]\}$ are described by:
$D_j[k 2^j] \triangleq \sum_{n=-\infty}^{\infty} f[n] \, \psi_{j,k}[n] = \sum_{n=-\infty}^{\infty} f[n] \, \frac{1}{\sqrt{2^j}} \, \psi\!\left( \frac{n - k 2^j}{2^j} \right)$
where the scaling function $\phi_{j,k}[n]$ is associated with the wavelet function $\psi_{j,k}[n]$; full details are given in [21].
So far, the analyses (i.e., computation of $\tilde{f}(\omega, t)$ (see Equation (50)) or $\tilde{f}(s, t)$ (see Equation (54)) or their discrete samples) have been made directly from the relevant integrals with the necessary values of the time-frequency or time-scale parameters. Around 1980, a new method for performing the DWT was created, which is known as multiresolution analysis (MRA). This method is completely recursive and is therefore ideal for computation, as succinctly described below.
In MRA, we may think of the level-1 DWT of f[n] as the output of two filters connected in parallel: a low-pass filter with impulse response g and a high-pass filter with impulse response h, as seen in Figure 4. This is known as the filter bank implementation of the DWT, consisting of different levels j. The cutoff frequency of each filter in the filter bank equals half the bandwidth of its input signal. Hence, the output of each filter has half the bandwidth of the original sequence f[n], so it is subsampled by 2. That is,
The average: $A_1[n] \triangleq \sum_{k=-\infty}^{\infty} f[k] \, g[2n - k]$
The detail: $D_1[n] \triangleq \sum_{k=-\infty}^{\infty} f[k] \, h[2n - k]$
Therefore, given a level-j DWT of a discrete-time signal f [ n ] , if A j [ k 2 j ] in the sequence of average coefficients is passed through a parallel combination of identically structured filters g and h, then the output is a sequence of level- ( j + 1 ) DWT of f [ n ] as seen in Figure 4. The features associated with different frequency components of the signal f [ n ] can be captured by using a multilevel wavelet decomposition of f [ n ] via iterative implementation of filter banks in the setting of time and frequency localization (see, for example, [21,23,24]).
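As a minimal concrete instance of this filter bank (the paper's g and h are generic; Haar filters are substituted here as a hypothetical choice, with signs subject to convention), one level of the DWT can be written as convolution followed by downsampling by 2, and level j+1 is obtained by feeding the averages back through the same pair:

```python
import numpy as np

def haar_dwt_level(f):
    """One filter-bank level: low-pass g and high-pass h, each followed by downsampling."""
    g = np.array([1.0, 1.0]) / np.sqrt(2.0)     # Haar low-pass filter
    h = np.array([1.0, -1.0]) / np.sqrt(2.0)    # Haar high-pass filter
    A = np.convolve(f, g)[1::2]                 # average coefficients
    D = np.convolve(f, h)[1::2]                 # detail coefficients
    return A, D

f = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
A1, D1 = haar_dwt_level(f)      # level-1 DWT
A2, D2 = haar_dwt_level(A1)     # recursing on A1 yields the level-2 DWT
print(A1, D1)
print(A2, D2)                   # note: ||A1||^2 + ||D1||^2 = ||f||^2 (energy preservation)
```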
Example 8.
Let us consider a function f(t) having a wavelet transform $\tilde{f}(s, t)$, which can be interpreted as the “details” contained in f at fixed scales $s \neq 0$. This interpretation is especially useful in the discrete case for understanding the principles of MRA, as seen below.
Let ϕ ( u ) be a zero-mean unit-variance probability density function, which has the following properties:
  • $\phi(u) \geq 0 \ \forall u \in \mathbb{R}$;
  • $\int_{-\infty}^{\infty} \phi(u) \, du = 1$;
  • $\int_{-\infty}^{\infty} u \, \phi(u) \, du = 0$;
  • $\int_{-\infty}^{\infty} u^2 \, \phi(u) \, du = 1$.
Assuming that $\phi \in C^n$, i.e., φ is at least n times differentiable for some $n \in \mathbb{N}$, it follows that $\lim_{u \to \pm\infty} \phi^{(n-1)}(u) = 0$. Now, letting $\psi^n(u) \triangleq (-1)^n \phi^{(n)}(u)$, we have
$\int_{-\infty}^{\infty} \psi^n(u) \, du = (-1)^n \left[ \phi^{(n-1)}(\infty) - \phi^{(n-1)}(-\infty) \right] = 0$
Thus, $\psi^n$ satisfies the admissibility condition in Equation (59) and hence can be used to define a CWT.
For $s \neq 0$ and $t \in \mathbb{R}$, let $\phi_{s,t}(u) = |s|^{-1} \, \phi\!\left( \frac{u - t}{s} \right)$ and $\psi^n_{s,t}(u) = |s|^{-1} \, \psi^n\!\left( \frac{u - t}{s} \right)$. Then, $\phi_{s,t}$ is a probability density with mean t and standard deviation |s|; and $\psi^n_{s,t}$ qualifies as a wavelet family $\{\psi^n_{s,t}\}$ by setting $p = 1$ in Equation (61).
As a numerically explicit example, let φ represent the zero-mean unit-variance Gaussian density, i.e., $\phi(u) = \frac{\exp(-u^2/2)}{\sqrt{2\pi}}$. Since $\phi \in C^\infty$, n can be taken to be any positive integer. For instance, $\psi^1(u) = -\phi^{(1)}(u) = \frac{u \exp(-u^2/2)}{\sqrt{2\pi}}$ and $\psi^2(u) = \phi^{(2)}(u) = \frac{(u^2 - 1) \exp(-u^2/2)}{\sqrt{2\pi}}$, and so on. Because of the shape of its graph, $\psi^2$ is popularly known as the Mexican hat mother wavelet, which is often used in engineering applications.
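The construction of Example 8 is easy to reproduce numerically; the sketch below (grid and truncation are illustrative) builds $\psi^2$ from the Gaussian density and confirms the zero-mean property that underlies the admissibility condition:

```python
import numpy as np

u = np.linspace(-8.0, 8.0, 4001)
du = u[1] - u[0]
phi = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)                  # Gaussian density
psi2 = (u ** 2 - 1) * np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)  # psi^2 = phi''(u), the "Mexican hat" (up to sign)

print(np.sum(phi) * du)     # ~1.0: phi integrates to one
print(np.sum(psi2) * du)    # ~0.0: zero mean, consistent with admissibility
```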

3.5. Karhunen-Loève Expansion of Random Signals

The Karhunen-Loève (K-L) expansion is a powerful tool that generalizes the Fourier series expansion for the analysis of random time-dependent signals. The K-L expansion is frequently used in statistical signal processing and detection theory, employing deterministic time-dependent orthonormal functions with random-variable coefficients.
Theorem 11
(Karhunen-Loève Expansion). Let X(t) be a zero-mean, second-order random process, defined over $[-T/2, T/2]$ where $T \in (0, \infty)$, with a continuous covariance function $K_{XX}(t, \tau)$. Then, it follows that
$X(t) \stackrel{ms}{=} \sum_{n=1}^{\infty} X_n \, \phi_n(t) \quad \forall t \in [-T/2, T/2]$
where the (countable) sequence of (deterministic) functions $\{\phi_n(t)\}$ is a complete orthonormal set of solutions to the following integral equation:
$\int_{-T/2}^{T/2} K_{XX}(t, \tau) \, \phi_n(\tau) \, d\tau = \lambda_n \, \phi_n(t) \quad \forall t \in [-T/2, T/2]$
and the random coefficients $X_n \triangleq \int_{-T/2}^{T/2} X(t) \, \phi_n^{her}(t) \, dt$ are mutually statistically orthogonal, i.e.,
$E[X_n X_m^{her}] = \lambda_n \, \delta_{mn} \quad \text{with the Kronecker delta } \delta_{mn}$
Proof. 
The proof is given in [25,26]. □
Remark 7.
The deterministic functions $\phi_n(t)$ are orthonormal in the following sense:
$\int_{-T/2}^{T/2} \phi_n(t) \, \phi_m^{her}(t) \, dt = \delta_{mn}$
Example 9
(K-L expansion of white noise). Let the covariance function of zero-mean stationary white noise w(t) be $K_{ww}(t, \tau) = \sigma^2 \delta(t - \tau)$. Then, the orthonormal functions $\phi_n(t)$ satisfy the K-L integral equation, for all $n \in \mathbb{N}$, as:
$\int_{-T/2}^{T/2} K_{ww}(t, \tau) \, \phi_n(\tau) \, d\tau = \sigma^2 \int_{-T/2}^{T/2} \delta(t - \tau) \, \phi_n(\tau) \, d\tau = \sigma^2 \, \phi_n(t)$
It is also true that $\int_{-T/2}^{T/2} K_{ww}(t, \tau) \, \phi_n(\tau) \, d\tau = \lambda_n \, \phi_n(t)$, which implies that $\lambda_n \, \phi_n(t) = \sigma^2 \, \phi_n(t) \ \forall n \in \mathbb{N}$. Thus, the choice of these orthonormal functions is arbitrary, and all $\lambda_n$'s are identically equal to $\sigma^2$. It is concluded that, for any zero-mean white noise, the K-L expansion functions $\{\phi_n(t)\}$ can be any set of orthonormal functions, with all eigenvalues $\lambda_n = \sigma^2$.
Example 10
(K-L expansion as an application to detection theory [25]). Let us assume that a waveform X ( t ) is observed over a finite time interval [ T / 2 , T / 2 ] to decide whether it contains a recoverable signal buried in noise, or the signal is completely noise-corrupted (i.e., the signal cannot be recovered). In this regard, we formulate a binary hypothesis testing problem with the hypothesis H 1 of having a recoverable signal and the hypothesis H 0 of complete noise capture, i.e.,
$$X(t) = \begin{cases} s(t) + w(t) & \text{if } H_1 \text{ is true} \\ w(t) & \text{if } H_0 \text{ is true} \end{cases}$$
where the signal $s(t)$ is a deterministic function of time, and the noise $w(t)$ is modeled as zero-mean, unit-variance, white Gaussian noise. Using the K-L expansion, we simplify the above decision problem by replacing the waveform $X(t)$ with a sequence $\{X_n\}$, which reduces it to a sequence of simpler problems:
$$X_n = \begin{cases} s_n + \omega_n & \text{if } H_1 \text{ is true} \\ \omega_n & \text{if } H_0 \text{ is true} \end{cases}$$
where $s_n$ and $\omega_n$ are the respective (at most countably many) K-L coefficients of the signal $s(t)$ and the noise $w(t)$.
Now we take the K-L transform (instead of the Fourier transform) of the received signal $X(t)$, where the transform space is the space of sequences of K-L coefficients, which are mutually statistically orthogonal random variables. Since the noise is zero-mean Gaussian and the K-L coefficients are mutually statistically orthogonal, the random variables $\omega_n$ are jointly independent, i.e., $\{\omega_n\}$ is a sequence of independent and identically distributed (iid) random variables. By selecting the first orthonormal function as:
$$\phi_1(t) = \frac{s(t)}{\sqrt{\int_{-T/2}^{T/2} d\theta \, s^2(\theta)}}$$
we can complete the rest of the orthonormal set $\{\phi_n(t)\}$ in any valid way. We also notice that all of the coefficients $s_k$, with the exception of $s_1$, will be zero, i.e., only $X_1$ is affected by the presence or absence of the recoverable signal. Thus, the sequence detection problem is reduced to the following scalar detection problem:
$$X_1 = \begin{cases} \sqrt{\int_{-T/2}^{T/2} d\theta \, s^2(\theta)} + \omega_1 & \text{if } H_1 \text{ is true} \\ \omega_1 & \text{if } H_0 \text{ is true} \end{cases}$$
We note that the scalar $X_1$ can be computed as:

$$X_1 = \frac{\int_{-T/2}^{T/2} d\theta \, X(\theta) \, s(\theta)}{\sqrt{\int_{-T/2}^{T/2} d\theta \, s^2(\theta)}}$$
which is commonly referred to as a matching operation. In fact, this operation can be performed by sampling the output of a filter whose impulse response is:

$$h(t) = \frac{s(T - t)}{\sqrt{\int_{-T/2}^{T/2} d\theta \, s^2(\theta)}}$$

where the parameter $T$ should be chosen sufficiently large to make the impulse response causal. The output of this physically realizable filter at time $T$ is then $X_1$. Such a filter is called a matched filter and is widely used in the disciplines of communications and pattern recognition.
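The following minimal Python sketch (an addition; the sinusoidal signature, sampling step, and discretized white noise are assumptions made purely for demonstration) illustrates the matching operation on sampled data.

```python
# Matched-filter sketch: the matching statistic X_1 under H_0 and H_1.
import numpy as np

rng = np.random.default_rng(0)
N, dt = 1000, 1e-3
t = np.arange(N) * dt
s = 5.0 * np.sin(2 * np.pi * 25 * t)          # assumed deterministic signature s(t)
energy = np.sum(s**2) * dt

def matching_statistic(x):
    """X_1 = integral of x * s, normalized by the square root of the signal energy."""
    return np.sum(x * s) * dt / np.sqrt(energy)

w = rng.standard_normal(N) / np.sqrt(dt)      # discretized unit-intensity white noise

print(matching_statistic(s + w))   # H_1: concentrates near sqrt(energy) ~ 3.54
print(matching_statistic(w))       # H_0: zero-mean, unit-variance fluctuation
```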

3.6. Reproducing Kernel Hilbert Spaces

This subsection develops the concept of reproducing kernel Hilbert spaces (RKHS) [14], in which evaluation at each point of the domain is a bounded (equivalently, continuous) linear functional. The boundedness (or continuity) of these evaluation functionals implies that if two functions $f$ and $g$ are close to each other in the norm of the space (i.e., $\|f - g\|$ is small), then $f$ and $g$ are also close pointwise, i.e., $|f(t) - g(t)|$ is small for all $t$.
The RKHS has many engineering and scientific applications, including those in harmonic analysis, wavelet analysis, and quantum mechanics. In particular, functions from an RKHS have special properties that make them useful for function estimation problems in high-dimensional spaces, which is critically important in the fields of statistical learning theory and machine learning [17]. In fact, every function in an RKHS that minimizes an empirical risk functional can be expressed as a linear combination of kernel functions evaluated at the training points (the so-called representer theorem). This reduces the problem from an infinite-dimensional search to a finite-dimensional one, as illustrated in the sketch below.
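The following hedged sketch illustrates this finite-dimensional reduction for kernel ridge regression (an illustrative addition; the Gaussian kernel, the ridge loss, and the synthetic data are assumptions, not constructs from this paper): the regularized empirical-risk minimizer is $f(\cdot) = \sum_i \alpha_i k(x_i, \cdot)$, with the coefficients obtained from a finite linear system.

```python
# Representer-theorem sketch: kernel ridge regression with a Gaussian kernel.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=40)                  # training inputs
y = np.sin(X) + 0.1 * rng.standard_normal(40)    # noisy targets

def k(a, b, gamma=0.5):
    """Gaussian (RBF) kernel matrix between point sets a and b."""
    return np.exp(-gamma * (a[:, None] - b[None, :])**2)

lam = 1e-2
G = k(X, X)                                      # Gram matrix of kernel sections
alpha = np.linalg.solve(G + lam * np.eye(len(X)), y)   # finite linear system

x_new = np.linspace(-3, 3, 7)
f_new = k(x_new, X) @ alpha                      # f = sum_i alpha_i k(x_i, .)
print(np.round(f_new, 3))
print(np.round(np.sin(x_new), 3))                # ground truth for comparison
```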
We now present a formal definition of reproducing kernel Hilbert spaces (RKHS). The theory presented here is usually applied to real-valued Hilbert spaces and can be extended to complex-valued Hilbert spaces; examples of complex-valued RKHSs are spaces of analytic functions.
Definition 9
(Reproducing Kernel Hilbert Spaces). Let $T$ be an arbitrary non-empty set (e.g., the time domain or the spatial domain of a function) and let $H$ be a Hilbert space of real-valued (resp. complex-valued) functions on $T$, equipped with pointwise vector addition and pointwise scalar multiplication, in which every function can be evaluated at each point $t \in T$. Then, $H$ is defined to be a reproducing kernel Hilbert space (RKHS) if, for every $t \in T$, there exist a positive real $M_t$ and a continuous linear functional $L_t$ on $H$ such that $|L_t(f)| = |f(t)| \le M_t \|f\|_H \ \forall f \in H$. [Note: Although each $M_t$ is constrained to be a positive real, it is possible that $\sup_{t \in T} M_t = \infty$.]
Remark 8.
Definition 9 imposes a rather weak condition, which ensures only that every function in $H$ can be evaluated at every point in the domain $T$ via a bounded linear functional. From the application perspective, a more useful formulation expresses this evaluation as an inner product of a given function $f \in H$ with another function $K_t \in H$, the so-called reproducing kernel function for the Hilbert space $H$; the RKHS takes its name from this property.
To make Definition 9 more useful for many applications, we make use of the Riesz representation theorem (Theorem 6), which states that there exists a unique $K_t \in H$ with the following reproducing property, evaluating each $f \in H$ at any given $t \in T$:

$$f(t) = L_t(f) = \langle K_t, f \rangle_H$$
Since, for a given $t \in T$, the function $K_t \in H$ takes values in $\mathbb{R}$ (resp. $\mathbb{C}$), and since there is another function $K_\tau \in H$ associated with each parameter $\tau \in T$ and a corresponding functional $L_\tau$ on $H$, it follows that

$$K_t(\tau) = L_\tau(K_t) = \langle K_\tau, K_t \rangle_H$$
The above situation can be interpreted as follows: $K_\tau$ is a time translation of $K_t$ from $t$ to $\tau$ if the set $T$ is the time domain of the functions in the Hilbert space. This allows us to redefine the reproducing kernel of the Hilbert space $H$ as a function $K: T \times T \to \mathbb{R}$ (resp. $\mathbb{C}$) given by $K(t, \tau) \triangleq \langle K_\tau, K_t \rangle_H$.
Example 11
(Bandlimited approximation of the Dirac delta function in the RKHS setting). Let us consider the space of continuous signals that are band-limited, with angular frequencies confined to the compact support $[-2\pi\Omega, 2\pi\Omega]$, where the cutoff frequency $\Omega \in (0, \infty)$. It will be seen that $K_t(\cdot)$ is a bandlimited version of the Dirac delta function, because $K_t(\tau)$ converges to the delta distribution $\delta(\tau - t)$ in the weak sense as the cutoff frequency $\Omega$ tends to infinity.
Let us define $T = \mathbb{R}$ and $H = \left\{ f \in C^0(T) : \mathrm{supp}(\hat{f}) \subseteq [-\Omega, \Omega] \right\}$, where $C^0(T)$ is the space of continuous functions whose domain is $T$, the Fourier transform of $f$ is $\hat{f}(\xi) \triangleq \int_{\mathbb{R}} dt \, \exp(-i 2\pi \xi t) \, f(t)$, and the inverse Fourier transform of $\hat{f}(\xi)$ is $f(t) = \int_{\mathbb{R}} d\xi \, \exp(i 2\pi \xi t) \, \hat{f}(\xi)$. Then, it follows by the Cauchy-Schwarz inequality and the Plancherel theorem that:
$$|f(t)|^2 \le \int_{-\Omega}^{\Omega} d\xi \, |e^{i 2\pi \xi t}|^2 \int_{-\Omega}^{\Omega} d\xi \, |\hat{f}(\xi)|^2 = 2\Omega \, \|f\|_H^2$$

i.e., $|f(t)| \le \sqrt{2\Omega} \, \|f\|_H$.
It follows from the relation $f(t) = L_t(f) = \langle K_t, f \rangle_H$, established earlier, that the functional $L_t$ and the RKHS kernel function $K_t$ are bounded. Therefore, $H$ is indeed an RKHS.
By choosing the kernel function in this case as $K_t(\tau) = 2\Omega \, \mathrm{sinc}\big(2\pi\Omega(\tau - t)\big)$, where $\mathrm{sinc}(x) \triangleq \frac{\sin x}{x}$ (the factor $2\Omega$ normalizes the kernel to unit area), and noting that $\lim_{\Omega \to \infty} K_t(\tau) = \delta(\tau - t)$, it follows that, as $\Omega \to \infty$, the Fourier transform of the kernel $K_t(\tau)$ becomes
$$\hat{K_t}(\xi) = \int_{-\infty}^{\infty} d\tau \, \exp(-i 2\pi \xi \tau) \, K_t(\tau) = \int_{-\infty}^{\infty} d\tau \, \exp(-i 2\pi \xi \tau) \, \delta(\tau - t) = \exp(-i 2\pi \xi t)$$
The factor $\exp(-i 2\pi \xi t)$ is a consequence of the frequency modulation due to the time-shifting property of the Fourier transform. Then, it follows by using the Plancherel theorem that:
$$\langle K_t, f \rangle_H = \int_{-\infty}^{\infty} d\tau \, \bar{K}_t(\tau) \, f(\tau) = \int_{-\infty}^{\infty} d\xi \, \bar{\hat{K}}_t(\xi) \, \hat{f}(\xi) = \int_{-\infty}^{\infty} d\xi \, \hat{f}(\xi) \exp(i 2\pi \xi t) = f(t)$$
Thus, the reproducing property of the kernel is established in the limit as the cutoff frequency $\Omega \to \infty$.
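The reproducing property can also be checked numerically at finite $\Omega$: with the normalized kernel above, $\langle K_t, f \rangle_H = f(t)$ holds exactly for any $f$ whose spectrum is supported in $[-B, B]$ with $B \le \Omega$. The sketch below (an addition; the band parameters and grid are arbitrary choices) approximates the inner product by a Riemann sum.

```python
# Numerical check of the sinc-kernel reproducing property at finite Omega.
import numpy as np

Omega, B = 4.0, 2.0                           # kernel band and signal band, B < Omega
tau = np.linspace(-60, 60, 240001)            # wide grid to tame the slow sinc tails
dtau = tau[1] - tau[0]

def sinc(x):
    """sin(x)/x with the removable singularity handled (np.sinc uses sin(pi y)/(pi y))."""
    return np.sinc(x / np.pi)

f = 2 * B * sinc(2 * np.pi * B * tau)         # band-limited test signal: fhat = rect on [-B, B]

for t in (0.0, 0.3, 1.7):
    K_t = 2 * Omega * sinc(2 * np.pi * Omega * (tau - t))
    approx = np.sum(K_t * f) * dtau           # <K_t, f>_H by Riemann sum
    exact = 2 * B * sinc(2 * np.pi * B * t)   # f(t) in closed form
    print(round(approx, 3), round(exact, 3))  # approximately equal
```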

4. Summary and Conclusions

This paper sheds light on some key concepts from functional analysis that provide a unified mathematical framework for solving problems in engineering and applied sciences, especially in modern signal processing. The simple (and yet elegant) way in which this framework facilitates the formulation of different topics in signal processing, illustrated here with several relevant examples, enables many problems in science and engineering to be solved with tools from functional analysis. Several important results from functional analysis can contribute to further advances in statistical and adaptive signal processing. Nevertheless, one of the main difficulties in doing so is the existing gap between the terminologies and technical languages used in these two (apparently different) fields, and this paper attempts to (at least partially) bridge that gap.

Author Contributions

Conceptualization, N.F.G., A.R. and W.K.J.; methodology, N.F.G., A.R. and W.K.J.; software, N.F.G. and A.R.; formal analysis, N.F.G. and A.R.; model preparation and validation, N.F.G. and A.R.; data curation, N.F.G. and A.R.; writing—original draft preparation, N.F.G., A.R. and W.K.J.; writing—review and editing, N.F.G., A.R. and W.K.J.; funding acquisition, A.R. All authors have read and agreed to the published version of the manuscript.

Funding

The reported work has been supported in part by the U.S. Air Force Office of Scientific Research under Grant No. FA9550-15-1-0400, by the U.S. Army Research Office under Grant No. W911NF-20-1-0226, and by the U.S. National Science Foundation under Grant no. CNS-1932130. Findings and conclusions or recommendations, expressed in this publication, are those of the authors and do not necessarily reflect the views of the sponsoring agencies.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Preliminary Concepts

This appendix introduces several preliminary (but critical) concepts from real analysis, probability theory, and topology, mainly taken from Naylor and Sell [2], Royden [4], Stark and Woods [25], Ash [27], and Munkres [28]. These references, along with the materials cited therein, are expected to be helpful for a proper understanding of key concepts that are frequently encountered in the discipline of modern signal processing.

Appendix A.1. Metric Spaces and Topological Spaces

This subsection introduces rudimentary concepts of metric and topological spaces. Details are available in the aforementioned standard textbooks.
Definition A1
(Metric Spaces). Let $X$ be a non-empty set. A function $\rho: X \times X \to \mathbb{R}$, where $\mathbb{R}$ is the space of real numbers, is called a metric (or a distance function) on $X$ if the following conditions hold for all $x, y, z \in X$:
(i) Positivity: $\rho(x, y) \ge 0$, and $\rho(x, y) = 0$ iff $x = y$;
(ii) Symmetry: $\rho(x, y) = \rho(y, x)$;
(iii) Triangular Inequality: $\rho(x, z) \le \rho(x, y) + \rho(y, z)$.
The pair $(X, \rho)$ is called a metric space; if there is no ambiguity about $\rho$, the metric space is denoted simply by $X$.
Example A1.
A well-known example of a metric space is the $n$-dimensional Euclidean space $(\mathbb{R}^n, \rho)$, where $n$ is a positive integer and $\rho(x, y) \triangleq \sqrt{\sum_{i=1}^{n} |x_i - y_i|^2}$ for every pair of vectors $x = [x_1, \ldots, x_n]^T$ and $y = [y_1, \ldots, y_n]^T$ in $\mathbb{R}^n$.
Remark A1.
The set $X$, upon which the metric $\rho$ operates, is an arbitrary non-empty set. The conditions (i)-(iii) in Definition A1 are obvious if $\rho$ operates on $\mathbb{R}^n$. In general, however, there can be other types of metrics, and $X$ need not be $\mathbb{R}^n$; an example is the Hamming distance defined on sets of symbol sequences, which is widely used in error correction theory to measure the distance between two code words, as illustrated below.
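The short sketch below (an illustrative addition; the binary words are arbitrary) computes the Hamming distance between fixed-length words and spot-checks the metric axioms of Definition A1.

```python
# Hamming distance on fixed-length symbol sequences is a metric.
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length words differ."""
    assert len(x) == len(y), "Hamming distance requires equal-length words"
    return sum(a != b for a, b in zip(x, y))

x, y, z = "10110", "10011", "00011"
print(hamming(x, y))                                    # 2
print(hamming(x, y) == hamming(y, x))                   # symmetry
print(hamming(x, z) <= hamming(x, y) + hamming(y, z))   # triangular inequality
```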
Definition A2
(Open and Closed Sets). A set $E \subseteq X$ in a metric space $(X, \rho)$ is called open if, for every $y \in E$, there exists $\varepsilon > 0$ such that the open ball $B_\varepsilon(y) \triangleq \{x \in X : \rho(x, y) < \varepsilon\}$, of radius $\varepsilon$ with center at $y$, is contained in $E$. A set $F \subseteq X$ is called closed if the complement $X \setminus F$ is open in $(X, \rho)$.
Definition A3
(Cauchy Sequence). A sequence $\{x_n\}$ in a metric space $(X, \rho)$ is called a Cauchy sequence if $\forall \varepsilon > 0 \ \exists n(\varepsilon) \in \mathbb{N}$ such that $\rho(x_k, x_\ell) < \varepsilon \ \forall k, \ell > n$. In other words, $\rho(x_k, x_\ell) \to 0$ as $k, \ell \to \infty$.
Definition A4
(Completeness of a metric space). A metric space is called complete if every Cauchy sequence converges in the metric space.
Definition A5
(Sequential Compactness). A metric space $(X, \rho)$ is said to be sequentially compact if every sequence of points $\{x_1, x_2, \ldots\}$ in $(X, \rho)$ contains a convergent subsequence $\{x_{n_1}, x_{n_2}, \ldots\}$; it is called sequentially precompact if every sequence of points in $(X, \rho)$ contains a Cauchy subsequence (see Definition A3 and also [29]).
In contrast to metric spaces, where a distance function $\rho$ on a set $X$ is used to introduce key concepts (e.g., neighborhood, open set, closed set, and convergence), a more powerful approach is to specify a system of open sets directly and introduce such properties through it; this leads to the notion of a topological space (e.g., [28]).
Definition A6
(Topological Spaces). A topology on a non-empty set $X$ is a collection $\Im$ of subsets of $X$ with the following properties:
1. $\Im$ must contain the empty set and the set $X$.
2. Any union of the members of an arbitrary (i.e., finite, countable, or uncountable) subcollection of $\Im$ must be contained in $\Im$. (A set is defined to be countable if it is bijective to the set $\mathbb{N}$ of positive integers; a finite or a countably infinite set is often called at most countable. An infinite set that is not countable is called uncountable. For example, the set of integers $\mathbb{Z} \subset \mathbb{R}$ is countable, while an interval $(a, b) \triangleq \{x \in \mathbb{R} : a < x < b\}$ is uncountable. These concepts underlie the fundamental difference between "continuous-time (CT) analog" and "discrete-time (DT) digital" signal processing.)
3. The intersection of the members of any finite subcollection of $\Im$ must be contained in $\Im$.
Then, the pair $(X, \Im)$ is called a topological space, and the members of $\Im$ are called open sets of $(X, \Im)$; if $B$ is an open set in $(X, \Im)$, the complement of $B$ (i.e., $X \setminus B$) is called a closed set in $(X, \Im)$. If there is no confusion regarding $\Im$, then $\Im$ is often omitted from $(X, \Im)$ and $X$ alone is referred to as a topological space.
Definition A7
(Topological Basis). A basis $\mathcal{B}$ for a topology $(X, \Im)$ is a collection of open sets in $\Im$, called basis elements, such that the following two conditions hold:
1. For each $x \in X$, there exists at least one basis element $B \in \mathcal{B}$ such that $x \in B$.
2. If $x \in B_1 \cap B_2$, where $B_1, B_2 \in \mathcal{B}$, then there exists a basis element $B_3 \in \mathcal{B}$ such that $x \in B_3$ and $B_3 \subseteq B_1 \cap B_2$.

Appendix A.2. Random Variables and Stochastic Processes

This subsection introduces rudimentary concepts of random variables and stochastic processes. Details are available in textbooks such as [25,27,30].
Definition A8
(Algebra and σ-algebra). Let $\mathcal{F}$ be a (non-empty) collection of subsets of a (non-empty) set $\Omega$ having some or all of the following properties:
(a) $\Omega \in \mathcal{F}$.
(b) If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$, where $A^c \triangleq \Omega \setminus A$.
(c) If $A_1, A_2, \ldots, A_n \in \mathcal{F}$, then $\bigcup_{i=1}^{n} A_i \in \mathcal{F}$.
(d) If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
Then, $\mathcal{F}$ is called an algebra (or a field) if the properties (a), (b), and (c) are true. If, in addition, the property (d) is true, then $\mathcal{F}$ is called a σ-algebra (or a σ-field).
Remark A2.
The largest σ-algebra of a non-empty set $\Omega$ is the collection of all subsets of $\Omega$, which is the power set $2^\Omega$. On the other hand, the smallest σ-algebra consists of just two sets, $\emptyset$ and $\Omega$, i.e., the indiscrete σ-algebra $\{\emptyset, \Omega\}$.
Definition A9
(Borel Sets). Given a non-empty collection $\mathcal{D}$ of subsets of $\Omega$, the smallest σ-algebra containing $\mathcal{D}$ is called the σ-algebra generated by $\mathcal{D}$. The Borel σ-algebra $\mathcal{B}(\mathbb{R})$ is the σ-algebra generated by the collection of all open intervals $\{(a, b) : a, b \in \mathbb{R}\}$ in the usual topology of $\mathbb{R}$. The members of $\mathcal{B}(\mathbb{R})$ are called Borel sets.
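For a finite sample space, the σ-algebra generated by a collection can be computed mechanically by closing the collection under complements and unions. The sketch below (an illustrative addition; the set $\Omega$ and the collection $\mathcal{D}$ are arbitrary choices) does exactly that.

```python
# Generated sigma-algebra on a finite sample space, by closure under
# complements and (pairwise, hence all finite) unions.
from itertools import combinations

Omega = frozenset({1, 2, 3, 4})
D = [frozenset({1}), frozenset({1, 2})]

def generated_sigma_algebra(Omega, D):
    F = {frozenset(), Omega} | set(D)
    while True:
        new = {Omega - A for A in F}                    # close under complements
        new |= {A | B for A, B in combinations(F, 2)}   # close under unions
        if new <= F:                                    # fixed point reached
            return F
        F |= new

for A in sorted(generated_sigma_algebra(Omega, D), key=lambda s: (len(s), sorted(s))):
    print(sorted(A))   # 8 sets: the sigma-algebra with atoms {1}, {2}, {3, 4}
```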
Definition A10
(Measure). A countably additive measure $\mu$ on a σ-algebra $\mathcal{F}$ is a non-negative, extended real-valued function on $\mathcal{F}$ such that, if $\{A_1, A_2, \ldots\}$ forms an at most countable (i.e., finite or countably infinite) collection of disjoint sets in $\mathcal{F}$, then $\mu\left(\bigcup_n A_n\right) = \sum_n \mu(A_n)$. A measurable space is a pair $(\Omega, \mathcal{F})$, and a measure space is a triple $(\Omega, \mathcal{F}, \mu)$, where $\Omega$ is a non-empty set, $\mathcal{F}$ is a σ-algebra of subsets of $\Omega$, and $\mu$ is a measure on $\mathcal{F}$. The sets in $\mathcal{F}$ are called measurable sets.
Example A2.
Let $\Omega = \mathbb{R}^n$, where $n \in \mathbb{N}$, with the Borel σ-algebra $\mathcal{B}(\mathbb{R}^n)$ as the associated σ-algebra. Then, the measure $\mu: \mathcal{B}(\mathbb{R}^n) \to [0, \infty]$ is called the $n$-dimensional Lebesgue measure, and $\left(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \mu\right)$ is called the $n$-dimensional Lebesgue measure space. For $n = 1$, i.e., in the one-dimensional real space $\mathbb{R}$, given an interval $S \in \mathcal{B}(\mathbb{R})$, the measure $\mu(S)$ is the length of the interval $S$. Similarly, for the two-dimensional (i.e., $n = 2$) and three-dimensional (i.e., $n = 3$) Lebesgue measures, $\mu(S)$ denotes the area and the volume, respectively.
Definition A11
(Probability Spaces). If μ ( Ω ) = 1 , then μ is called a probability measure, usually denoted by P, and the triplet ( Ω , F , P ) is called a probability space.
Definition A12
(Measurable Functions). Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be two measurable spaces. A function $f: (\Omega_1, \mathcal{F}_1) \to (\Omega_2, \mathcal{F}_2)$ is called ($\mathcal{F}_1$-$\mathcal{F}_2$) measurable if the inverse image $f^{-1}(A) \in \mathcal{F}_1 \ \forall A \in \mathcal{F}_2$. If $\Omega_2 = \mathbb{R}$ and $\mathcal{F}_2 = \mathcal{B}(\mathbb{R})$, then $f$ is said to be Borel measurable.
Definition A13
(Random Variables and Random Sequences). A random variable $X$ on a probability space $(\Omega, \mathcal{F}, P)$ is a Borel measurable function from $\Omega$ to $\mathbb{R}$. Similarly, a sequence of random variables $\{X_1, X_2, \ldots\}$ is called a discrete random process.
Remark A3.
A function $f: (\mathbb{R}, \mathcal{U}) \to (\mathbb{R}, \mathcal{U})$, where $\mathcal{U}$ is the usual topology, is continuous if $f^{-1}(A)$ is open for every open set $A \in \mathcal{U}$. Therefore, any continuous function is Borel measurable. Furthermore, a function $f: \mathbb{R} \to \mathbb{R}$ that is continuous almost everywhere on $\mathbb{R}$ (i.e., except on a set of measure zero) is also Borel measurable. As another example, the unit step function on a compact set $S \subset \mathbb{R}$, which is discontinuous in the usual topology of $\mathbb{R}$, is also a Borel-measurable function.
Now, we introduce the concept of the expected value of a random variable on a probability space $(\Omega, \mathcal{F}, P)$. Given a random variable $X$, the expected value of $X$ is denoted as $E[X]$ (see [25] or [30]). Along this line, two random variables $X$ and $Y$ on $(\Omega, \mathcal{F}, P)$ are said to be equal in the mean-square (ms) sense, denoted as $X \overset{ms}{=} Y$, if $E[|X - Y|^2] = 0$ (see [25]). Similarly, two random variables $X$ and $Y$ on $(\Omega, \mathcal{F}, P)$ are said to be equal in the almost-sure (as) sense, denoted as $X \overset{as}{=} Y$, if $X(\zeta) \neq Y(\zeta)$ is allowed only for $\zeta \in S \subset \Omega$ such that $P[S] = 0$.
Given a random process x ( t ) , the autocorrelation is defined as:
$$r_x(t, \tau) \triangleq E[x(t) \, x^{her}(\tau)]$$
and the autocovariance is defined as
$$c_x(t, \tau) \triangleq E\left[\left(x(t) - E[x(t)]\right) \left(x(\tau) - E[x(\tau)]\right)^{her}\right]$$
where the superscript $her$, called Hermitian, indicates the complex conjugate of a complex variable, or the conjugate transpose of a complex vector/matrix.
A random process x(t) is called stationary (in the strict sense) if its statistics are not affected by a time translation [25], i.e., x(t) and x( t + ε ) have the same statistics for any real number ε . A random process x(t) is said to be wide-sense stationary [7,25] if
  • The expected value $E[x(t)]$ is a constant for all $t$;
  • The autocorrelation $r_x(t, \tau)$ depends only on the difference $(t - \tau)$, not explicitly on both $t$ and $\tau$.
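As a brief numerical illustration (an addition; the AR(1) model and its parameters are assumptions chosen for convenience), the sample autocorrelation of a wide-sense stationary process depends only on the lag and matches the closed form $r_x(k) = a^{|k|}/(1 - a^2)$ for unit-variance white driving noise:

```python
# Sample autocorrelation of a wide-sense stationary AR(1) process.
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
N, a = 200_000, 0.9
w = rng.standard_normal(N)                    # unit-variance white driving noise
x = signal.lfilter([1.0], [1.0, -a], w)       # AR(1): x[n] = a*x[n-1] + w[n]

def r_hat(x, lag):
    """Sample autocorrelation at a given lag (depends only on the lag for a WSS process)."""
    return np.mean(x[:len(x) - lag] * x[lag:])

# Theory for AR(1): r_x(k) = a^|k| / (1 - a^2)
for k in range(4):
    print(round(r_hat(x, k), 3), round(a**k / (1 - a**2), 3))
```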

References

  1. Bachman, G.; Narici, L. Functional Analysis; Academic Press: New York, NY, USA, 1966.
  2. Naylor, A.; Sell, G. Linear Operator Theory in Engineering and Science, 2nd ed.; Springer-Verlag: New York, NY, USA, 1982.
  3. Rudin, W. Real and Complex Analysis; McGraw-Hill: Boston, MA, USA, 1987.
  4. Royden, H. Real Analysis, 3rd ed.; Macmillan: New York, NY, USA, 1989.
  5. Kreyszig, E. Introductory Functional Analysis with Applications; John Wiley & Sons: Hoboken, NJ, USA, 1978.
  6. Bobrowski, A. Functional Analysis for Probability and Stochastic Processes; Cambridge University Press: Cambridge, UK, 2005.
  7. Hayes, M. Statistical Digital Signal Processing and Modeling, 1st ed.; Wiley: Hoboken, NJ, USA, 1996.
  8. Haykin, S. Adaptive Filter Theory, 4th ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 2002.
  9. Farhang-Boroujeny, B. Adaptive Filters Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2003.
  10. Bressan, A. Lecture Notes on Functional Analysis with Applications to Linear Partial Differential Equations; American Mathematical Society: Providence, RI, USA, 2013.
  11. Reed, M.; Simon, B. Methods of Modern Mathematical Physics Part 1: Functional Analysis; Academic Press: Cambridge, MA, USA, 1980.
  12. Luenberger, D. Optimization by Vector Space Methods; John Wiley & Sons: Hoboken, NJ, USA, 1969.
  13. Desoer, C.; Vidyasagar, M. Feedback Systems: Input-Output Properties; Academic Press: Cambridge, MA, USA, 1975.
  14. Therrien, C. Discrete Random Signals and Statistical Signal Processing; Prentice Hall: Englewood Cliffs, NJ, USA, 1992.
  15. Proakis, J.; Manolakis, D. Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed.; Macmillan Publishing Company: New York, NY, USA, 1998.
  16. Oppenheim, A.; Schafer, R. Discrete-Time Signal Processing; Prentice Hall: Englewood Cliffs, NJ, USA, 1989.
  17. Vapnik, V.; Izmailov, R. Rethinking statistical learning theory: Learning using statistical invariants. Mach. Learn. 2019, 108, 381–423.
  18. Ghalyan, N.F.; Ray, A. Symbolic Time Series Analysis for Anomaly Detection in Measure-invariant Ergodic Systems. J. Dyn. Syst. Meas. Control 2020, 142, 061003.
  19. Ghalyan, N.F.; Ray, A. Measure invariance of symbolic systems for low-delay detection of anomalous events. Mech. Syst. Signal Process. 2021, 159, 107746.
  20. Lorch, E. Spectral Analysis; Oxford University Press: New York, NY, USA, 1962.
  21. Kaiser, G. A Friendly Guide to Wavelets; Birkhauser: Boston, MA, USA, 1994.
  22. Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed.; Academic Press: Amsterdam, The Netherlands, 2009.
  23. Ray, A. On State-space Modeling and Signal Localization in Dynamical Systems. ASME Lett. Dyn. Syst. Control 2022, 2, 011006.
  24. Vetterli, M.; Kovacevic, J. Wavelets and Subband Coding; Prentice-Hall: Hoboken, NJ, USA, 1995.
  25. Stark, H.; Woods, J. Probability and Random Processes with Applications to Signal Processing; Prentice-Hall: Upper Saddle River, NJ, USA, 2002.
  26. Helstrom, C. Elements of Signal Detection and Estimation; Prentice Hall: Englewood Cliffs, NJ, USA, 1995.
  27. Ash, R. Real Analysis and Probability; Academic Press: Boston, MA, USA, 1972.
  28. Munkres, J. Topology, 2nd ed.; Prentice-Hall: Upper Saddle River, NJ, USA, 2000.
  29. Shilov, G. Elementary Real and Complex Analysis; Dover Publications: Mineola, NY, USA, 1996.
  30. Papoulis, A. Probability, Random Variables, and Stochastic Processes, 2nd ed.; McGraw-Hill: Boston, MA, USA, 1984.
Figure 1. Relationship among different spaces in functional analysis.
Figure 2. An adaptive filter consisting of a shift-variant filter h with an adaptive algorithm for updating the filter coefficients.
Figure 3. Innovations representation of a random process. (a) Signal model. (b) Inverse filter.
Figure 4. Implementation of level-$j$ and level-$(j+1)$ MRA filter banks.