Article

Iterative Pilot-Based Reference Frame Estimation for Improved Data Rate in Two-Dimensional Display Field Communications

1 Department of Information and Communication Engineering, Changwon National University, Changwon 51140, Republic of Korea
2 Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(17), 9916; https://doi.org/10.3390/app13179916
Submission received: 20 March 2023 / Revised: 15 April 2023 / Accepted: 19 April 2023 / Published: 1 September 2023
(This article belongs to the Special Issue Optical Camera Communications and Applications)

Abstract

Recently, display-to-camera (D2C) communication, including display field communication (DFC), has gained attention owing to advancements in display technology and the widespread availability of cameras in handheld devices. In this study, we propose an iterative pilot-based reference-frame estimation scheme to increase the data rate of a 2D-DFC system. To estimate the reference frame, pilot symbols are inserted between the data symbols of the transmitted image frames. Using these pilot symbols, we compensate for the distortion in the received frame and estimate the data pixels of the reference frame. After the first iteration, some of the decoded data symbols are used as virtual pilot symbols for the next iteration. This process is repeated using both the original and virtual pilots; over several iterations, all the data pixels of the reference frame are estimated to reconstruct the reference frame. Simulation results show that the proposed scheme boosts the achievable data rate of the 2D-DFC system by almost twofold while maintaining the unobtrusiveness of the display.

1. Introduction

Display-to-camera (D2C) communication [1,2,3] is an application of visible light communication (VLC) [4] in which an LCD and a camera sensor communicate for device-to-device communication. Owing to the vast popularity of mobile devices and the widespread availability of displays, D2C communication is a promising candidate with the potential to replace conventional approaches such as QR codes and 2D barcodes. In D2C communication, information is encoded on the display screens of smartphones, laptops, advertisement boards, etc., and another device with a camera sensor (such as a smartphone) captures the screen and decodes the data using image analysis. Conventional approaches require dedicated screen space and are obtrusive to the human eye. First, a QR code placed at the corner of a commercial advertisement is not visually aesthetic and produces distractions. Second, the data transmission capability of a QR code, such as the URL of a product's homepage, is extremely limited. Although increasing attention is being paid to embedding images into QR codes to mitigate these limitations [5,6,7], a new approach of embedding the data directly into the image frames of a display might completely replace the conventional approach. This can be attributed to the inherent advantages of D2C communication, such as higher data rates and unobtrusiveness of the display to normal human viewers. D2C communication has the potential to enable a wide range of applications in areas such as security [8,9], healthcare, and smart homes [10,11]. A key advantage of D2C communication is its security: because transmission occurs through light, the signal is much more difficult to intercept than in other wireless communication methods such as radio frequency or Bluetooth. This makes D2C communication particularly useful for applications such as mobile banking, access-control systems, and healthcare devices. In addition, D2C communication can enable new types of interactive experiences.
For example, it can be used to allow users to interact with displays in public spaces by simply pointing their cameras at the screen. This could open new possibilities for interactive advertising, digital signage, and other types of public information displays [12].
Approaches for embedding data into image frames can be broadly divided into two categories: spatial domain and spectral domain embedding. For example, in the work of Wang et al. [13], information bits are carried with spatially complementary visual patterns assembled into complementary temporal frames. This study uses the concept of complementary frames displayed at a high frame rate to ensure a normal viewing experience. HiLight [14] encodes data into pixel translucency changes for any screen content using an alpha channel. Here, an additional image layer (a black matte, fully transparent) is created on top of the content image layer, which is dedicated to data communication and is referred to as the communication layer. To transmit the data, the communication layer was divided into grids, and data were encoded into the pixel translucency change of each grid without affecting the user’s viewing experience.
By contrast, data can be embedded in the spectral domain of an image [15]. Spectral-domain data embedding captures the characteristics of the human visual system better [16]. In other words, spectral domain techniques can take advantage of the fact that the human eye perceives different parts of the spectrum differently, allowing selective embedding in less perceptually important regions. In addition, spectral domain techniques can be more robust to compression and other signal processing algorithms that often affect the spatial domain more strongly than the spectral domain [17]. One of the spectral-domain data embedding approaches is display field communication (DFC) [18,19], where data are embedded into the frequency domain of an image by employing the properties associated with the frequency coefficients of an image. The DFC approach was robust against visual artifacts observed on the screen, even with a lower framerate.
In a previous study [20], we experimentally implemented a 1D-DFC approach based on the 1D discrete cosine transform (DCT) and machine learning. That study evaluated the proposed scheme over an actual DFC link and demonstrated the practical implementation and performance of the approach for various system design parameters. In particular, we first adopted the DCT to transform a spatial-domain image into its spectral-domain equivalent. Addition-based data allocation and subtraction-based data retrieval techniques were used to reduce the computational complexity of the data-embedding process. Moreover, channel coding was applied to overcome the data errors caused by the D2C wireless channel. After capturing the displayed image with a camera, the display region was extracted using an object-detection deep learning technique. Extensive real-world experiments were performed considering various geometric distortions, noise levels, and different standard input images.
Although DFC is robust against visual artifacts at low frame rates and against perspective distortion, it relies on reference frames at the camera receiver to correctly decode the data-embedded frames. Although these complementary (or reference) frames compensate for visual artifacts on the screen and assist in data decoding, they significantly diminish the data rate of the overall system. To address this problem, it is essential to estimate the reference frame at the receiver. Our previous study introduced a comparable method for reconstructing reference frames in one-dimensional DFC systems [21]. In this study, we extend the iterative spectral image estimation approach to enhance the data rate of two-dimensional DFC (2D-DFC) systems. To estimate the channel, we first embed pilot pixels into the data-embedded frames. Using these pilot pixels, we obtain the least squares (LS) estimate and then interpolate the channel at the data pixels based on the LS estimates at the pilot positions. Subsequently, we begin the iteration process by assuming that the decoded symbols are correct and using some of them as virtual pilots in the next iteration to re-estimate the information pixels. This iteration is repeated several times to estimate the information pixels more accurately. After fully estimating the reference image, we employ a zero-forcing (ZF) receiver to demodulate the data. The simulation results demonstrate that the proposed scheme outperforms the conventional 2D-DFC scheme by a factor of two in terms of the achievable data rate (ADR) while maintaining the unobtrusiveness of the display by embedding the data in the high-frequency regions of the transmitted frames. Additionally, we conducted simulations considering perspective distortion, i.e., the misalignment between the camera and the display, which causes distortions in the captured output. These simulations provide an opportunity to test the feasibility of conducting experiments under similar conditions.
The remainder of this paper is organized as follows. Section 2 provides an overview of the pilot-based 2D-DFC system, including the data embedding and pilot insertion mechanism. In Section 3, we propose an iterative scheme for reconstructing the reference image along with virtual pilot pixel selection criteria. Section 4 presents the simulation results in terms of the symbol error rate (SER), achievable data rate (ADR), and peak signal-to-noise ratio (PSNR) for various system design criteria. In addition, we performed simulations considering the misalignment between the camera and display, and compared our proposed scheme with the conventional 2D-DFC scheme. Finally, Section 5 concludes the paper.

2. System Description

The DFC scheme involves pointing a digital camera at an electronic screen to capture the display output [18]. The 2D-DFC scheme embeds the data in two dimensions of an image frame [19]. Figure 1 shows a typical block diagram of a 2D-DFC system with pilot signal assistance. As shown in Figure 1, at the transmitter, the modulator maps binary input data bits to data symbols, which are then embedded into the spectral domain of an image. The 2D inverse discrete Fourier transform (IDFT) is then applied, and the resulting image is displayed on the screen. In conventional 2D-DFC, reference image frames (without data embedding) are inserted between neighboring data-embedded frames to minimize the visual artifacts that may be visible to the human eye. However, in the present system model, reference frames are not transmitted.
At the receiver, the camera captures a sequence of images from the screen, which are transformed from the spatial domain into the frequency domain using the 2D discrete Fourier transform (DFT). In the first iteration, the information symbols are decoded based on the pilot observations. Once all the information symbols have been decoded and are presumed accurate, some pixels are used as virtual pilots for the second iteration, and the spectral-domain image is reconstructed accordingly. Subsequent iterations refine the image pixel estimates using the fed-back information symbol estimates.

2.1. Data Embedding

DFC embeds data in the frequency domain of an image. To begin the process, the image frames are first converted into the spectral domain using the 2D-DFT, as shown in Figure 1. The DFC scheme exploits the fact that the information content of an image is concentrated in specific regions of its frequency-domain representation. Specifically, the amplitudes of the low-frequency components are located at the corners of the 2D spectrum, whereas those of the high-frequency components are situated at the center [18]. In other words, the low-frequency components containing the important information about the image are concentrated at the four corners of the spectral-domain image. This characteristic allows the DFC scheme to use the areas surrounding the corners for data and pilot embedding; these areas are chosen because they minimize the perceptual image distortion that may occur owing to data embedding. Mathematically, the 2D-DFT of a spatial-domain image $\mathbf{I}_t$ of size $P \times Q$ is given by
$$\mathbf{I}_F = \mathbf{F}_P \, \mathbf{I}_t \, \mathbf{F}_Q,$$

where $\mathbf{I}_F$ represents the spectral-domain image, and $\mathbf{F}_P$ and $\mathbf{F}_Q$ represent the $P \times P$ and $Q \times Q$ DFT matrices, respectively.
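As a quick sanity check, the matrix form above can be verified against a standard FFT routine. The following sketch (sizes and values are illustrative) confirms that $\mathbf{F}_P \mathbf{I}_t \mathbf{F}_Q$ matches `numpy.fft.fft2`:

```python
import numpy as np

# Verify the matrix form of the 2D-DFT, I_F = F_P @ I_t @ F_Q,
# against numpy's fft2. Sizes are illustrative.
P, Q = 8, 8
rng = np.random.default_rng(0)
I_t = rng.random((P, Q))            # toy spatial-domain image block

def dft_matrix(N):
    # DFT matrix: F[n, k] = exp(-2j*pi*n*k/N)
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N)

F_P, F_Q = dft_matrix(P), dft_matrix(Q)
I_F = F_P @ I_t @ F_Q               # equation form used in the text

assert np.allclose(I_F, np.fft.fft2(I_t))   # matches the standard 2D-DFT
```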
Simultaneously, the binary information bits are modulated using a modulation scheme to embed them into the frequency-domain image. To ensure the invisibility of the embedded data and maintain the real and positive values of the gray-scale intensity components of the data-embedded image, the data components must exhibit conjugate-symmetric properties. Thus, the data matrix at the $i$th time index, $\mathbf{X}[i]$, is defined as
$$\mathbf{X}[i] = \begin{bmatrix} x\{1,1\} & x\{1,2\} & \cdots & x\{1, s_q + L_q\} \\ \vdots & \vdots & \ddots & \vdots \\ x\{s_p + L_p, 1\} & x\{s_p + L_p, 2\} & \cdots & x\{s_p + L_p, s_q + L_q\} \end{bmatrix},$$

where $x\{p,q\}$ denotes the data element at the $(p,q)$-pixel position. The starting pixel values at which the data are embedded in a row and column are denoted by $s_p$ and $s_q$, respectively. The pixel widths over which the data and pilots are loaded in the row and column directions are denoted by $L_p$ and $L_q$, respectively [19].
In the 2D-DFC scheme, data embedding is performed by multiplying the data coefficients with the frequency components of the image. Therefore, a frequency-domain image loaded with data at the $i$th time index, denoted by $\mathbf{D}_F[i]$, can be expressed as
$$\mathbf{D}_F[i] = \begin{bmatrix} \mathbf{I}_{F1}[i] \circ \mathbf{X}[i] & \mathbf{I}_{F2}[i] & \mathbf{I}_{F3}[i] \\ \mathbf{I}_{F4}[i] & \mathbf{I}_{F5}[i] & \mathbf{I}_{F6}[i] \\ \mathbf{I}_{F7}[i] & \mathbf{I}_{F8}[i] & \mathbf{I}_{F9}[i] \circ \mathrm{flip}(\bar{\mathbf{X}}[i]) \end{bmatrix},$$

where $\circ$ represents the Hadamard (elementwise) product operator, $\mathbf{I}_{F1}, \ldots, \mathbf{I}_{F9}$ represent the nine subimages made out of the original image $\mathbf{I}_F$, and $\mathrm{flip}(\cdot)$ is the flip operation. The operation $\mathrm{flip}(\bar{\mathbf{X}}[i])$ is expressed as follows:
$$\mathrm{flip}(\bar{\mathbf{X}}[i]) = \begin{bmatrix} \bar{x}\{s_p + L_p, s_q + L_q\} & \cdots & \bar{x}\{s_p + L_p, 1\} \\ \vdots & \ddots & \vdots \\ \bar{x}\{1, s_q + L_q\} & \cdots & \bar{x}\{1, 1\} \end{bmatrix},$$

where $\bar{x}$ denotes the complex conjugate of $x$.
The above equation indicates that only the top-left and bottom-right corners of the frequency-domain image are utilized for embedding the data. Furthermore, the elements of the data matrix, denoted by $x\{p,q\}$, can be expressed as

$$x\{p,q\} = \begin{cases} 1, & p < s_p,\ q < s_q, \\ x\{p,q\}, & \text{otherwise}. \end{cases}$$
This equation indicates that the portion of the frequency-domain image where the low-frequency information is concentrated is set to one, while the remaining portion is used for embedding data. Finally, as depicted in Figure 1, the 2D-IDFT operation is employed to convert the frequency-domain image back into the spatial domain for display on the electronic screen. Mathematically, this can be expressed as

$$\mathbf{D}_t[i] = \mathbf{F}_P^H \, \mathbf{D}_F[i] \, \mathbf{F}_Q^H,$$

where $\mathbf{F}_P^H$ and $\mathbf{F}_Q^H$ are the Hermitian transposes of the 2D-DFT matrices. Thus, the spatial-domain image is rendered on the electronic display, and the data are transmitted simultaneously through the image.
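The conjugate-symmetry rule above can be illustrated with a small sketch: multiplying mirrored spectral corners by a symbol and its conjugate keeps the inverse transform real-valued. The sub-band bounds and symbol values below are illustrative, not the paper's parameters:

```python
import numpy as np

# Sketch of corner embedding: scale coefficients in a top-left sub-band by
# data symbols and the mirrored bottom-right sub-band by their conjugates,
# then check that the inverse 2D-DFT stays real. Bounds are assumed values.
P, Q = 64, 64
rng = np.random.default_rng(1)
I_t = rng.random((P, Q))
I_F = np.fft.fft2(I_t)

s, L = 20, 8                                 # start pixel and band width (toy)
X = np.sign(rng.standard_normal((L, L)))     # BPSK data symbols (+/-1)

for a in range(L):
    for b in range(L):
        p, q = s + a, s + b
        I_F[p, q] *= X[a, b]                               # top-left corner
        I_F[(P - p) % P, (Q - q) % Q] *= np.conj(X[a, b])  # mirrored corner

D_t = np.fft.ifft2(I_F)
assert np.max(np.abs(D_t.imag)) < 1e-9   # data-embedded frame is still real
```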
Figure 2 depicts the data embedded in the frequency-domain image and its effect on the spatial-domain image. Figure 2a illustrates the location of the sub-band where the data and pilots are loaded; the white region in the frequency-domain image represents this sub-band. As mentioned previously, to introduce fewer detectable artifacts, the data and pilots are loaded in the high-frequency range. The low-frequency components of an image generally exhibit smooth color variations, whereas the high-frequency components exhibit sharp variations. Because the low-frequency regions of an image dominate human visual perception, it is preferable to embed the data in the high-frequency region. In this way, during the sequential display of images on the screen, image distortions are almost imperceptible to the human eye. Figure 2b shows the corresponding spatial-domain image. For comparison, a data-embedded image (without pilots) is shown in Figure 2c [19]. Both images appear similar in the spatial domain, and even minor differences are imperceptible to the human eye.

2.2. Pilot Insertion

Data embedding in the DFC system involves the insertion of uniformly spaced pilot symbols within each data matrix $\mathbf{X}[i]$, with $N_p$ pilots inserted per column. Each data matrix consists of $s_p + L_p$ pixels per column, divided into $N_p$ groups, with each group containing $B$ adjacent vertical pixels. In each group, the first pixel is dedicated to transmitting the pilot signal. Thus, the DFC data matrix can be represented as
$$\mathbf{X}[i] = \begin{bmatrix} x^p\{1,1\} & x^p\{1,2\} & \cdots & x^p\{1, s_q + L_q\} \\ x\{2,1\} & x\{2,2\} & \cdots & x\{2, s_q + L_q\} \\ \vdots & \vdots & & \vdots \\ x^p\{m,1\} & x^p\{m,2\} & \cdots & x^p\{m, s_q + L_q\} \\ \vdots & \vdots & & \vdots \\ x\{s_p + L_p, 1\} & x\{s_p + L_p, 2\} & \cdots & x\{s_p + L_p, s_q + L_q\} \end{bmatrix},$$

where $B = (s_p + L_p)/N_p$ and $s_p + L_p = N_p + N_i$. Here, $N_i$ denotes the total number of information symbols per column in the data matrix, and the superscript $p$ marks pilot pixels.
Figure 3 illustrates the insertion of pilot symbols and data symbols into the frequency domain sub-image, where the pilot symbols are shown to be uniformly inserted in each column. The DFC symbol modulation on the lth pixel can be expressed as
$$X(mB + l) = \begin{cases} X_p(m), & l = 1, \\ \text{data pixel}, & \text{otherwise}, \end{cases}$$

where $l = 1, 2, \ldots, B$ indexes the pixels within a group and $X_p(m)$ represents the $m$th pilot symbol.
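A minimal sketch of this comb-type layout (group size and pilot count are illustrative) shows how the pilot positions fall on the first pixel of every group:

```python
import numpy as np

# Comb-type pilot layout: each column is split into N_p groups of B adjacent
# pixels, and the first pixel of every group carries a pilot. Toy values.
N_p, B = 4, 5                  # pilots per column, group size
col_len = N_p * B              # s_p + L_p pixels per column

is_pilot = np.zeros(col_len, dtype=bool)
is_pilot[::B] = True           # first pixel of each group

assert is_pilot.sum() == N_p
assert list(np.flatnonzero(is_pilot)) == [0, 5, 10, 15]
# remaining positions carry the N_i information symbols
assert (~is_pilot).sum() == col_len - N_p
```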

3. Iterative Image Estimation

Assuming perfect alignment between the data-transmitting screen and the camera, the images received through the D2C link can be represented as

$$\mathbf{Y}_t[i] = \mathbf{H}_t[i] * \mathbf{D}_t[i] + \mathbf{N}_t[i],$$

where $*$ denotes the convolution operation, $\mathbf{Y}_t$ is the received data-embedded image, $\mathbf{H}_t$ is the reference (or channel) image, and $\mathbf{N}_t$ is the additive white Gaussian noise (AWGN) matrix. After the images are received through the D2C link, they are converted into the frequency domain using the transformation matrices $\mathbf{F}_P$ and $\mathbf{F}_Q$. This can be mathematically represented as

$$\mathbf{Y}_F[i] = \mathbf{F}_P \mathbf{Y}_t[i] \mathbf{F}_Q = \mathbf{F}_P \left( \mathbf{H}_t[i] * \mathbf{D}_t[i] \right) \mathbf{F}_Q + \mathbf{F}_P \mathbf{N}_t[i] \mathbf{F}_Q = \mathbf{H}_F[i] \circ \mathbf{D}_F[i] + \mathbf{N}_F[i],$$

where $\mathbf{Y}_F[i]$ denotes the frequency-domain image, $\mathbf{H}_F[i]$ denotes the Fourier transform of $\mathbf{H}_t[i]$, and $\mathbf{D}_F[i]$ denotes the Fourier transform of the transmitted image $\mathbf{D}_t[i]$; the spatial convolution becomes an elementwise (Hadamard) product in the frequency domain. The noise matrix in the frequency domain is denoted by $\mathbf{N}_F[i]$.
The pilot signals $\{\mathbf{Y}_F^p[i]\}$ are extracted from the received frequency-domain image sequence $\{\mathbf{Y}_F[i]\}$, and the channel image $\{\mathbf{H}_F[i]\}$ is obtained from the information conveyed by the extracted pilot signals $\{\mathbf{H}_F^p[i]\}$. With knowledge of the channel response $\{\mathbf{H}_F[i]\}$, the transmitted data samples $\{\mathbf{D}_F[i]\}$ can be recovered using the ZF receiver, given by

$$\hat{\mathbf{D}}_F[i] = \frac{\mathbf{Y}_F[i]}{\hat{\mathbf{H}}_F[i]},$$

where $\hat{\mathbf{H}}_F[i]$ is the estimate of the channel image and the division is elementwise. After demodulation, the reconstructed source binary information bits are obtained at the receiver output.

3.1. Pilot Signal Estimation

The pilot signals are uniformly distributed within each column of the data-embedded images. Consequently, because the pilot signal is present only in certain pixels, the channel response at the nonpilot (information) pixels must be estimated by interpolating between the neighboring pilot pixels. As stated previously, the pilot pixels are first extracted from the received image frame, and the channel response is estimated using both the received and known pilot pixels. The channel response at the data-bearing pixels is then interpolated using the neighboring pilot channel responses. For simplicity, and without loss of generality, we consider the first DFC symbol. Let
$$\mathbf{H}_F^p = \left[ H_F^p(0),\ H_F^p(1),\ \ldots,\ H_F^p(N_p - 1) \right]^T$$

be the channel response at the pilot pixels, and

$$\mathbf{Y}_F^p = \left[ Y_F^p(0),\ Y_F^p(1),\ \ldots,\ Y_F^p(N_p - 1) \right]^T$$

be the vector of received pilot signals, both of size $N_p \times 1$, where $N_p$ denotes the number of pilot pixels. The received pilot signal vector $\mathbf{Y}_F^p$ is expressed as follows:

$$\mathbf{Y}_F^p = \mathbf{D}_F^p \mathbf{H}_F^p + \mathbf{N}_F^p,$$

where $\mathbf{D}_F^p$ is the diagonal matrix of known pilot symbols. Then, the estimate of the pilot pixels based on the least squares (LS) criterion is given by

$$\hat{\mathbf{H}}_F^p = \left[ \hat{H}_F^p(0),\ \hat{H}_F^p(1),\ \ldots,\ \hat{H}_F^p(N_p - 1) \right]^T = \left( \mathbf{D}_F^p \right)^{-1} \mathbf{Y}_F^p = \left[ \frac{Y_F^p(0)}{D_F^p(0)},\ \frac{Y_F^p(1)}{D_F^p(1)},\ \ldots,\ \frac{Y_F^p(N_p - 1)}{D_F^p(N_p - 1)} \right]^T.$$
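The LS step thus reduces to an elementwise division of the received pilot observations by the known pilot symbols. A toy sketch, with an assumed random channel and BPSK pilots:

```python
import numpy as np

# LS pilot estimation sketch: H_hat = Y_p / D_p elementwise, as in the
# equation above. Channel, pilots, and noise level are illustrative.
rng = np.random.default_rng(2)
N_p = 8
H_true = rng.standard_normal(N_p) + 1j * rng.standard_normal(N_p)
D_p = np.sign(rng.standard_normal(N_p)).astype(complex)   # known BPSK pilots
noise = 1e-3 * (rng.standard_normal(N_p) + 1j * rng.standard_normal(N_p))

Y_p = D_p * H_true + noise         # received pilot pixels
H_hat = Y_p / D_p                  # LS estimate, elementwise division

assert np.allclose(H_hat, H_true, atol=1e-2)   # accurate at low noise
```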

3.2. Data Pixel Interpolation

After estimating the channel at the pilot tones, the channel responses at the data pixels are interpolated using the adjacent pilot tones. In this study, the piecewise-cubic interpolation method is considered because it provides a good fit to the channel response and produces a smooth, continuous polynomial through the given pixel points. The interpolator is defined as [22]
$$\hat{H}_F[i] = \hat{H}_F[mB + l] = \alpha_1 \hat{H}_F^p[m+1] + \alpha_0 \hat{H}_F^p[m] + B \alpha_1 \hat{H}_F^{p\,\prime}[m+1] - B \alpha_0 \hat{H}_F^{p\,\prime}[m],$$

where $m = 0, 1, \ldots, N_p - 1$. The term $\hat{H}_F^{p\,\prime}[m]$ represents the first-order derivative of $\hat{H}_F^p[m]$. The coefficients $\alpha_1$ and $\alpha_0$ are given by

$$\alpha_1 = \frac{3(B - l)^2}{B^2} - \frac{2(B - l)^3}{B^3}$$

and

$$\alpha_0 = \frac{3l^2}{B^2} - \frac{2l^3}{B^3},$$

respectively.
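The blending weights $\alpha_0$ and $\alpha_1$ are the standard cubic Hermite basis functions, so they reproduce the pilot values exactly at the segment boundaries. The sketch below illustrates one interpolation segment using the textbook Hermite form with finite-difference derivatives; the pilot values and derivative basis functions are illustrative rather than the paper's exact receiver:

```python
import numpy as np

# Piecewise-cubic (Hermite) interpolation sketch for the channel between two
# pilots. alpha_1/alpha_0 are the weights defined above; h10/h11 are the
# standard Hermite derivative basis functions (an assumption, since the
# text only defines alpha_1 and alpha_0). Pilot values are toy data.
B = 4
Hp = np.array([1.0, 2.0, 1.5, 3.0])   # pilot channel estimates (toy)
dHp = np.gradient(Hp)                 # finite-difference derivative estimates

def interp_segment(m, l):
    t = l / B
    a1 = 3 * (1 - t) ** 2 - 2 * (1 - t) ** 3  # weight on H_p[m]   (=1 at l=0)
    a0 = 3 * t ** 2 - 2 * t ** 3              # weight on H_p[m+1] (=1 at l=B)
    h10 = t * (1 - t) ** 2                    # derivative basis functions
    h11 = -(t ** 2) * (1 - t)
    return (a1 * Hp[m] + a0 * Hp[m + 1]
            + B * (h10 * dHp[m] + h11 * dHp[m + 1]))

# Endpoints of each segment reproduce the pilot values exactly.
assert np.isclose(interp_segment(0, 0), Hp[0])
assert np.isclose(interp_segment(0, B), Hp[1])
```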

3.3. Image Re-Estimation Using Virtual Pilots

In this section, we discuss an image estimation method that exploits both virtual pilot pixels and pilot pixels to re-estimate the image. As illustrated in Figure 3, the virtual pilot pixels are chosen from among the data pixels available after the initial demodulation. The selection of virtual pilot pixels is based on two conditions. First, the magnitude of the selected data pixels should be sufficiently large to ensure their suitability for image estimation. Second, the channels at the virtual pilot pixels must be highly correlated with those at the pilot pixels; otherwise, they would not contribute to improving the quality of the data pixel estimates. By incorporating the selected virtual pilot pixels with the original pilot signals, the image is re-estimated, and the newly generated reference image estimate is used for symbol detection in the subsequent iteration. This process is repeated until a suitable termination condition is met. Figure 3 shows the locations of the pilot and virtual pilot pixels in the upper sub-band of the frequency-domain image shown in Figure 2a. Here, $d_1$ represents the first column used to embed the data, with $s_p$ and $s_q$ denoting the starting pixel locations for the row and column, respectively. A total of $L_p$ and $L_q$ data bits are embedded in the row and column directions of the upper sub-band of the frequency-domain image.
Let N v denote the number of virtual pilot pixels utilized for the reference image re-estimation. The virtual pilot observations can be expressed in vector form as
$$\mathbf{Y}_F^v = \mathbf{D}_F^v \mathbf{H}_F^v + \mathbf{N}_F^v,$$
where the data symbols in D F v are unknown a priori to the receiver and should, therefore, be chosen from among all the available data pixels. By stacking the pilot observation vector Y F p and the virtual pilot observation Y F v , the observation vector for image re-estimation can be obtained as follows:
$$\begin{bmatrix} \mathbf{Y}_F^p \\ \mathbf{Y}_F^v \end{bmatrix} = \begin{bmatrix} \mathbf{D}_F^p & \mathbf{0} \\ \mathbf{0} & \mathbf{D}_F^v \end{bmatrix} \begin{bmatrix} \mathbf{H}_F^p \\ \mathbf{H}_F^v \end{bmatrix} + \begin{bmatrix} \mathbf{N}_F^p \\ \mathbf{N}_F^v \end{bmatrix}.$$
For the next iteration, the LS estimates at the virtual pilot positions are appended to those at the original pilot positions. The LS estimate of the stacked pilot observation vector can be expressed as

$$\begin{bmatrix} \tilde{\mathbf{H}}_F^p \\ \tilde{\mathbf{H}}_F^v \end{bmatrix} = \begin{bmatrix} \mathbf{D}_F^p & \mathbf{0} \\ \mathbf{0} & \mathbf{D}_F^v \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{Y}_F^p \\ \mathbf{Y}_F^v \end{bmatrix},$$
yielding the pixel estimates at the pilot and virtual pilot positions in the first iteration. The pixel estimates for the remaining data symbols are calculated using the interpolation method discussed in Section 3.2. The image is then fully re-estimated using the pilots and virtual pilots, and the information pixels are demodulated using the ZF receiver as
$$\tilde{\mathbf{D}}_F^i = \frac{\mathbf{Y}_F^i}{\tilde{\mathbf{H}}_F^i} = \left[ \frac{Y_F^i(0)}{\tilde{H}_F^i(0)},\ \frac{Y_F^i(1)}{\tilde{H}_F^i(1)},\ \ldots,\ \frac{Y_F^i(N_i - N_v - 1)}{\tilde{H}_F^i(N_i - N_v - 1)} \right]^T.$$
As the above estimate relies on both the pilot and virtual pilot tones, an improved reference image estimate, and therefore a lower SER, can be achieved as the iterations proceed and the virtual pilot pixels become more refined.
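The whole iteration (LS at pilots, interpolation, ZF demodulation, virtual pilot promotion, re-estimation) can be sketched on a single column. Linear interpolation stands in for the piecewise-cubic interpolator, and all sizes, channels, and noise levels are illustrative:

```python
import numpy as np

# One virtual-pilot iteration on a single column, as a toy simulation:
# 1) LS at pilots -> interpolate -> ZF demodulate
# 2) promote the strongest decoded pixels to virtual pilots -> re-estimate
rng = np.random.default_rng(3)
N, B = 40, 5                                   # pixels per column, group size
pilots = np.arange(0, N, B)                    # comb pilot positions
H = 1.0 + 0.3 * np.sin(np.arange(N) / 6)       # smooth "reference" channel
D = np.sign(rng.standard_normal(N))            # BPSK symbols (pilots known)
Y = H * D + 0.02 * rng.standard_normal(N)      # received column

# Iteration 1: LS at pilots, linear interpolation, ZF demodulation.
H_hat = np.interp(np.arange(N), pilots, Y[pilots] / D[pilots])
D_hat = np.sign(Y / H_hat)

# Select virtual pilots: largest-magnitude received pixels outside the
# pilot set, then re-estimate the channel with the enlarged pilot set.
data_idx = np.setdiff1d(np.arange(N), pilots)
virtual = data_idx[np.argsort(-np.abs(Y[data_idx]))[:8]]
all_p = np.sort(np.concatenate([pilots, virtual]))
H_hat2 = np.interp(np.arange(N), all_p, Y[all_p] / D_hat[all_p])

assert (D_hat == D).all()                      # low-noise toy decodes cleanly
assert (np.sign(Y / H_hat2) == D).all()        # re-estimate stays consistent
```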

4. Simulations

The performance of the proposed reference-frame estimation scheme was evaluated in the presence of an AWGN channel. The simulation environment used in this study is similar to that used in [21]. Specifically, a 2D-DFC system with a 256 × 256 -pixel data-embedded Lena image displayed on the screen was utilized (cf. Figure 2b). The camera was perfectly aligned with the screen to prevent energy loss. BPSK modulation was employed on data symbols with uniformly spaced pilots, and the SER, ADR, and PSNR at the camera decoder output were used as performance metrics. The ADR was computed using the following formula:
$$\mathrm{ADR} = 30 \cdot A \cdot N_i \cdot (1 - \mathrm{SER}),$$

where $A$ is the pixel area used for data embedding and $N_i$ is the number of information symbols per frame. A standard off-the-shelf camera receiver operating at 30 fps was considered, and data were embedded in every frame. Of all the data pixels, 10% were designated as pilot pixels, and a maximum of five iterations was performed; the iteration was terminated when no significant improvement in the overall 2D-DFC performance was observed. The data were embedded in the high-frequency region of the Lena image to minimize visual artifacts on the screen, because no reference image frame was transmitted. The starting pixel values of the symbols ($s_p$ and $s_q$) and the number of embedded data symbols ($L_p$ and $L_q$) were set to 90 and 30, respectively.
Figure 4 illustrates the SER performance of the proposed reference-frame estimation scheme. As the number of iterations of the scheme increases, the SER gradually approaches that of the ideal scheme, that is, conventional 2D-DFC. This can be attributed to the iterative refinement of the pixel estimate output at both the pilot and virtual pixels, which resulted in better interpolated values for the information symbols. Furthermore, we observed that, as the SNR increased, the proposed method achieved significant performance improvements, and the virtual pilot pixels became more accurate with increasing iterations. The uniqueness of the scheme lies in its ability to improve the performance iteratively by utilizing both pilot and virtual pilot pixels while keeping the pilot density low at only 10 % .
Figure 5 depicts the ADR of the proposed reference-frame estimation scheme, which is the primary motivation for this study. We observed that, even without iterations, the data rate was higher than that of the conventional 2D-DFC scheme. Furthermore, as the number of iterations increases, the data rate becomes significantly higher, nearly doubling. The primary reason for this improvement is the elimination of reference frames in the proposed scheme. In conventional 2D-DFC, a reference frame is employed to decode each data frame at the receiver, whereas, in the proposed scheme, the reference frame is estimated at the receiver using transmitted pilots, thereby eliminating the need for reference frames. To further enhance the performance, virtual pilots are used iteratively. This demonstrates that the use of reference frames severely limits the data rate of 2D-DFC systems, and the proposed scheme enhances performance by eliminating their use. The iterations were terminated once no significant improvement in the data rate was observed.
Figure 6 shows the PSNR as a function of the number of iterations. PSNR was used to measure the quality of the reconstructed image compared to the original image. Higher PSNR values indicate better image quality. To compute the PSNR between images, we first computed the mean square error (MSE) as
$$\mathrm{MSE} = \frac{1}{PQ} \sum_{P,Q} \left( \tilde{\mathbf{D}}_t[i] - \mathbf{D}_t[i] \right)^2,$$

where $\tilde{\mathbf{D}}_t[i] \left( = \mathbf{F}_P^H \tilde{\mathbf{D}}_F[i] \mathbf{F}_Q^H \right)$ is the reconstructed image in the spatial domain and $\mathbf{D}_t[i]$ is the transmitted spatial-domain image given by (6). We can then compute the PSNR between the transmitted and reconstructed images as

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{R^2}{\mathrm{MSE}} \right),$$

where $R$ is the maximum pixel value.
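The MSE/PSNR computation above is straightforward to implement; a small sketch, assuming $R = 255$ for 8-bit grayscale images:

```python
import numpy as np

# Direct implementation of the MSE/PSNR formulas above, with R = 255
# assumed for 8-bit grayscale frames. Image values are toy data.
def psnr(ref, rec, R=255.0):
    mse = np.mean((rec.astype(float) - ref.astype(float)) ** 2)
    return 10 * np.log10(R ** 2 / mse)

ref = np.full((4, 4), 100.0)
rec = ref + 5.0                     # uniform error of 5 gray levels -> MSE = 25
assert abs(psnr(ref, rec) - 10 * np.log10(255 ** 2 / 25)) < 1e-9
```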
As shown in Figure 6, the proposed iterative 2D-DFC scheme ensures the perceptual unobtrusiveness of the data embedding, as the reconstructed image quality improves with an increasing number of iterations. The visual features of the image were primarily located at low frequencies, whereas details and noise were present at higher frequencies. Because data embedding occurs in the high-frequency subband, visual artifacts are hardly noticeable. Although the proposed iterative method results in a slightly reduced PSNR performance compared to the conventional method, the data rate is nearly doubled. Therefore, the proposed method is beneficial for applications requiring high-speed data transmission through D2C links.

Perspective Distortion

Although the above results are based on the assumption of perfect alignment between the camera and display, in a real-life situation, a camera may not always be aligned frontally with the display screen, and there may be instances of distortion due to the tilting or rotation of the camera relative to the display. This type of distortion degrades DFC performance and is modeled as a perspective distortion [23]. Perspective distortion causes straight lines in the scene to appear curved or skewed in the resulting image and objects farther away from the camera appear to be smaller. In the simulations, the perspective transformation matrix is computed using the geometric transformation of a set of matched control points. A projective transformation can be represented by a 3 × 3 matrix known as a homography matrix. Given a point in homogeneous coordinates, which is represented as a 3 × 1 column vector [ x , y , w ] T , a projective transformation can be represented as
$$[x', y', w']^T = \mathbf{H} \, [x, y, w]^T,$$

where $[x', y', w']^T$ is the transformed point in homogeneous coordinates and $\mathbf{H}$ is the homography matrix. The homography matrix $\mathbf{H}$ can be computed from a set of corresponding points in the two images using a method known as the direct linear transformation (DLT). Given $n$ corresponding points, the homography matrix can be computed by solving a system of linear equations of the form:
$$\mathbf{A} \mathbf{h} = \mathbf{0},$$

where $\mathbf{A}$ is a $2n \times 9$ matrix built from the point correspondences and $\mathbf{h}$ is a $9 \times 1$ column vector containing the elements of $\mathbf{H}$ in row-major order. Once the homography matrix has been computed, it can be used to transform the transmitted images.
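A minimal DLT sketch under the formulation above: build the $2n \times 9$ system from four point correspondences and recover $\mathbf{H}$ (up to scale) from the null space via SVD. The correspondences below are synthetic, not data from the paper:

```python
import numpy as np

# DLT sketch: two rows of A per correspondence (x, y) -> (u, v), then the
# homography vector h is the null vector of A, found via SVD.
def dlt_homography(src, dst):
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)            # 2n x 9 system matrix
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)                # null vector, reshaped to H

# Recover a known homography from 4 corner correspondences.
H_true = np.array([[1.2, 0.1, 5.0], [0.0, 0.9, -3.0], [1e-3, 0.0, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)
pts = np.c_[src, np.ones(4)] @ H_true.T
dst = pts[:, :2] / pts[:, 2:]                  # dehomogenize

H_est = dlt_homography(src, dst)
H_est /= H_est[2, 2]                           # fix the arbitrary scale
assert np.allclose(H_est, H_true, atol=1e-6)
```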
At the camera receiver, the boundaries of the electronic display should be accurately detected before data retrieval. Harris corner detection and Hough transform can be used to recognize the borders of the display for precise image alignment. The distorted image can then be resized to its original size by obtaining missing pixels using interpolation. The image is then restored using a homography estimation.
Figure 7 illustrates the effect of perspective distortion on an image and the corresponding recovery using our proposed scheme. As shown in Figure 7a, the image was subjected to skewing and rotation, particularly at the edges. Figure 7b shows the corrected image, which is more accurately proportioned and shaped. Figure 8 presents the performance of the proposed scheme on an image affected by perspective distortion and subsequently corrected. Figure 8a shows the symbol error rate (SER), which improves as the number of iterations using virtual pilots increases. Our proposed scheme approaches the ideal 2D-DFC scheme despite the relatively poor SER caused by the substantial distortion of the image. Figure 8b shows the achievable data rate of the scheme, which is lower than in the perfect-alignment case. Nevertheless, our proposed scheme outperformed the conventional 2D-DFC scheme and achieved almost two-fold higher data rates even in the presence of perspective distortion.

5. Conclusions

This study proposed a novel scheme for increasing the data rate of a 2D-DFC system by introducing a reference-frame estimation method. The proposed scheme computes estimates at the pilot pixels and then interpolates the estimates at the information pixels using piecewise cubic interpolation. It then selects virtual pilots among the data symbols according to specific criteria and uses them as pilot pixels in the next iteration. This iterative process is repeated, with the reference image re-estimated at each step. Simulation results show that the proposed scheme improves the data rate of the 2D-DFC system by almost two-fold at the cost of a slightly reduced PSNR; thus, it eliminates the need for the reference image frames that typically limit the data rate of 2D-DFC systems. Overall, this study presents a promising approach for enhancing the performance of D2C communication systems.
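The iterative virtual-pilot loop summarized above can be illustrated with a toy one-dimensional example. This is a simplified sketch under our own assumptions (a smooth reference row, BPSK-like embedded symbols, and linear np.interp standing in for the paper's piecewise cubic interpolation), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "frequency-domain row": a smooth reference modulated by BPSK symbols.
n = 64
ref = 2.0 + np.sin(np.linspace(0.0, np.pi, n))   # smooth reference values
data = rng.choice([-1.0, 1.0], size=n)           # embedded data symbols
pilot_idx = np.arange(0, n, 8)
data[pilot_idx] = 1.0                            # pilots carry a known symbol
rx = ref * data + rng.normal(0.0, 0.02, n)       # received = ref * data + noise

known = np.zeros(n, dtype=bool)
known[pilot_idx] = True
ref_est = np.zeros(n)
ref_est[known] = rx[known]                       # pilots reveal the reference

idx = np.arange(n)
for _ in range(3):
    # Interpolate the reference at the remaining pixels (linear here,
    # standing in for piecewise cubic interpolation).
    est = np.interp(idx, idx[known], ref_est[known])
    ratio = rx / est
    sym = np.sign(ratio)                         # hard decision on each symbol
    # Promote confident decisions to virtual pilots for the next pass.
    confident = (np.abs(np.abs(ratio) - 1.0) < 0.05) & ~known
    ref_est[confident] = rx[confident] * sym[confident]  # ref = rx / symbol
    known |= confident

est = np.interp(idx, idx[known], ref_est[known])
decoded = np.sign(rx / est)
```

Each pass densifies the set of pixels at which the reference is known, so the interpolated reference estimate sharpens from one iteration to the next, which is the mechanism behind the SER improvement reported in the paper.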

Author Contributions

Conceptualization, S.-Y.J.; methodology, B.W.K., P.S. and S.-Y.J.; software, P.S.; validation, B.W.K., P.S. and S.-Y.J.; formal analysis, B.W.K. and P.S.; investigation, B.W.K. and P.S.; resources, P.S. and S.-Y.J.; data curation, P.S.; writing—original draft preparation, P.S.; writing—review and editing, B.W.K., P.S. and S.-Y.J.; visualization, B.W.K., P.S. and S.-Y.J.; supervision, B.W.K. and S.-Y.J.; project administration, B.W.K. and S.-Y.J.; funding acquisition, B.W.K. and P.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT, Ministry of Science and ICT) (Nos. 2022R1A2B5B01001543 and 2022R1G1A1004799).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Xu, J.; Klein, J.; Jochims, J.; Weissner, N.; Kays, R. A reliable and unobtrusive approach to display area detection for imperceptible display camera communication. J. Vis. Commun. Image Represent. 2022, 85, 103510.
2. Chen, C.; Huang, W.; Zhang, L.; Mow, W.H. Robust and unobtrusive display-to-camera communications via blue channel embedding. IEEE Trans. Image Process. 2018, 28, 156–169.
3. Tamang, L.D.; Kim, B.W. Deep D2C-Net: Deep learning-based display-to-camera communications. Opt. Express 2021, 29, 11494–11511.
4. Pathak, P.H.; Feng, X.; Hu, P.; Mohapatra, P. Visible light communication, networking, and sensing: A survey, potential and challenges. IEEE Commun. Surv. Tutor. 2015, 17, 2047–2077.
5. Pena-Pena, K.; Lau, D.L.; Arce, A.J.; Arce, G.R. QRnet: Fast learning-based QR code image embedding. Multimed. Tools Appl. 2022, 81, 10653–10672.
6. Ahlawat, S.; Rana, C.; Sindhu, R. A Review on QR Codes: Colored and Image Embedded. Int. J. Adv. Res. Comput. Sci. 2017, 8, 410–413.
7. Garateguy, G.J.; Arce, G.R.; Lau, D.L.; Villarreal, O.P. QR images: Optimized image embedding in QR codes. IEEE Trans. Image Process. 2014, 23, 2842–2853.
8. Zhao, J.; Li, X.Y. SCsec: A secure near field communication system via screen camera communication. IEEE Trans. Mob. Comput. 2019, 19, 1943–1955.
9. Guri, M.; Bykhovsky, D.; Elovici, Y. Brightness: Leaking sensitive data from air-gapped workstations via screen brightness. In Proceedings of the 2019 12th CMI Conference on Cybersecurity and Privacy (CMI), Copenhagen, Denmark, 28–29 November 2019; pp. 1–6.
10. Le, N.T.; Hossain, M.A.; Jang, Y.M. A survey of design and implementation for optical camera communication. Signal Process. Image Commun. 2017, 53, 95–109.
11. Zhang, X.; Liu, J.; Ba, Z.; Tao, Y.; Cheng, X. MobiScan: An enhanced invisible screen-camera communication system for IoT applications. Trans. Emerg. Telecommun. Technol. 2022, 33, e4151.
12. Tamang, L.D.; Kim, B.W. Real-time Optical Wireless Communications with the Kiosk Display and Off-the-shelf Camera. In Proceedings of the International Conference on Future Information & Communication Engineering, Virtual Event, 25–27 February 2021; Volume 12, pp. 133–136.
13. Wang, A.; Li, Z.; Peng, C.; Shen, G.; Fang, G.; Zeng, B. Inframe++: Achieve simultaneous screen-human viewing and hidden screen-camera communication. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, Florence, Italy, 19–22 May 2015; pp. 181–195.
14. Li, T.; An, C.; Xiao, X.; Campbell, A.T.; Zhou, X. Real-time screen-camera communication behind any scene. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, Florence, Italy, 19–22 May 2015; pp. 197–211.
15. Tamang, L.D.; Kim, B.W. Spectral Domain-Based Data-Embedding Mechanisms for Display-to-Camera Communication. Electronics 2021, 10, 468.
16. Tsui, T.K.; Zhang, X.P.; Androutsos, D. Color image watermarking using multidimensional Fourier transforms. IEEE Trans. Inf. Forensics Secur. 2008, 3, 16–28.
17. Tsai, S.; Liu, K.; Yang, S. An efficient image watermarking method based on fast discrete cosine transform algorithm. Math. Probl. Eng. 2017, 2017.
18. Kim, B.W.; Kim, H.C.; Jung, S.Y. Display field communication: Fundamental design and performance analysis. J. Light. Technol. 2015, 33, 5269–5277.
19. Jung, S.Y.; Kim, H.C.; Kim, B.W. Implementation of two-dimensional display field communications for enhancing the achievable data rate in smart-contents transmission. Displays 2018, 55, 31–37.
20. Kim, Y.-J.; Singh, P.; Jung, S.-Y. Experimental Evaluation of Display Field Communication Based on Machine Learning and Modem Design. Appl. Sci. 2022, 12, 12226.
21. Singh, P.; Jung, S.Y. Data decoding based on iterative spectral image reconstruction for display field communications. ICT Express 2021, 7, 392–397.
22. Shen, Y.; Martinez, E. Channel estimation in OFDM systems. In Freescale Semiconductor Application Note; Freescale Semiconductor, Inc.: Austin, TX, USA, 2006; pp. 1–15.
23. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003.
Figure 1. Block diagram of reference image reconstruction scheme for 2D display field communications.
Figure 2. Illustration of data embedding in the frequency domain (a) location of high-frequency sub-bands, (b) the corresponding spatial-domain image to be displayed on the screen, and (c) data-embedded image for comparison [19].
Figure 3. Pilot pixels allocation and virtual pilot selection in the high-frequency sub-band of a frequency domain image.
Figure 4. Comparison of SER performance between the proposed 2D-DFC scheme with iterative processing and the conventional 2D-DFC scheme that employs a reference image.
Figure 5. Comparison of ADR performance between the proposed 2D-DFC scheme with iterative processing and the conventional 2D-DFC scheme that employs a reference image.
Figure 6. Comparison of PSNR performance of the proposed iterative pilot-based 2D-DFC scheme with conventional 2D-DFC scheme [19].
Figure 7. Distortion correction of received image affected by perspective distortion. (a) Perspective distortion. (b) Perspective correction.
Figure 8. Performance of the proposed scheme in the case of perspective distortion. (a) Symbol error rate. (b) Achievable data rate.

Share and Cite

MDPI and ACS Style

Kim, B.W.; Singh, P.; Jung, S.-Y. Iterative Pilot-Based Reference Frame Estimation for Improved Data Rate in Two-Dimensional Display Field Communications. Appl. Sci. 2023, 13, 9916. https://doi.org/10.3390/app13179916

