Article

A Novel Video Transmission Latency Measurement Method for Intelligent Cloud Computing

College of Ocean Information Engineering, Jimei University, Xiamen 361021, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(24), 12884; https://doi.org/10.3390/app122412884
Submission received: 20 October 2022 / Revised: 8 December 2022 / Accepted: 12 December 2022 / Published: 15 December 2022
(This article belongs to the Special Issue Computational Intelligence in Image and Video Analysis)

Abstract

Low-latency video transmission is gaining importance in time-critical applications built on real-time cloud-based systems. Cloud-based Virtual Reality (VR), remote control, and AI response systems are emerging use cases that demand low latency and good reliability. Although many video transmission schemes claim low latency, their performance varies over different network conditions. It is therefore necessary to develop methods that can accurately measure end-to-end latency online, continuously, and without any content modification. This research brings these applications one step closer to addressing these next-generation use cases. This paper analyzes the causes of end-to-end latency within a video transmission system and then proposes three methods to measure it: timecode, remote online, and lossless remote video online measurement. The corresponding equipment was designed and implemented. Measurements of the three methods using this equipment prove that the proposed methods can accurately and effectively measure the end-to-end latency of a video transmission system.

1. Introduction

With the rapid development of cloud-based real-time AI, network video services are now available for interactive applications requiring a closed-loop response [1]. Cloud VR and autonomous systems, including drones, vehicles, and robots, can achieve the same level of performance at a fraction of the cost through cloud processing. Vehicles carrying advanced accident detection systems, micro drones navigating complex forests at high speed, VR users wearing lightweight goggles without suffering dizziness, robots that recover from trips: all these applications require reliable low-latency delivery of content to and from the cloud. One critical content component is video.
Different video applications require different video transmission latencies to ensure QoS (quality of service). One-way video transmission applications such as live video and VOD (video on demand) only require a latency within 2–5 s [2]. Interactive video applications such as video conferencing have become more and more popular. The latency of video software such as Zoom is between 300 ms and 1 s [3]. For emerging applications such as cloud-based real-time AI control, VR, and cloud games that require high interactivity, the latency must be less than 20 ms in order to maintain control over vehicles in their dynamic environments, or to avoid users' feelings of vertigo during use [4]. With the rise of new applications, requirements on real-time video keep increasing, and the transmission mode of video applications has gradually developed from one-way to interactive transmission. To guarantee the QoS of video latency in interactive video scenes, it is necessary to monitor the video transmission delay of each application in real time and online.
There exists a significant amount of redundancy in raw digitized video, including spatial redundancy, temporal redundancy, and coding redundancy. To ensure efficient remote transmission, video compression technology must be applied in the transmission process. This paper defines a video transmission system with five processing stages:
  • Capture
  • Compression
  • Transmission
  • Decompression
  • Display
Each stage will bring its corresponding latency.
1. Capture
Almost all cameras on the market have a certain latency between capturing and outputting video information. Ubik et al. [5] proposed a method to measure camera latency and tested a series of cameras. Their results show that the latency of the Blackmagic URSA Mini Pro 4.6K (Blackmagic Design, South Melbourne, Victoria, Australia) is the smallest, between 4 and 8 ms, whereas most camera latencies fall in the range of 1 to 3 video frames.
2. Compression/Decompression
The latency introduced by video compression and decompression is related to the complexity of the video encoding and decoding algorithms. According to processing complexity, video compression can be divided into two types: light-weight compression and hybrid compression.
Light-weight compression is mainly based on image coding. For example, Apple ProRes is a variable-bit-rate video codec that can independently encode and decode each video frame with a compression ratio between 4 and 6 for real-time video editing [6,7]. TICO is a codec that uses only intra-frame compression with a compression ratio of 4:1; it is used in the 4K UHD TV industry [8]. JPEG-XS is a lightweight codec that achieves a compression ratio of 6:1 and is used in live broadcast and AR/VR applications [9,10,11]. Light-weight compression is mostly used in professional fields. Although the compression ratio is small, the latency is greatly reduced.
Due to limited bandwidth, videos on the internet are usually transmitted after hybrid compression, with a compression ratio in the range of 250–500 [12,13,14]. The current international mainstream video coding standards include H.264/AVC, H.265/HEVC, H.266/VVC, AVS [15], and VP9/AV1 [16]. These standards adopt a hybrid coding framework composed of prediction coding, transform coding, quantization, filter processing, and entropy coding modules. With the development of technology, many extended processing methods exist within each coding module. For example, predictive coding has forward prediction, backward prediction, and bidirectional prediction. Although these optimized encoding methods improve encoding efficiency, they come at a cost of increased computational complexity [17,18,19]. A quick arithmetic sketch of what these compression ratios mean in practice is given below.
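As a rough, illustrative calculation (our own, not from the paper): a raw 1080p60 stream at 8-bit 4:2:0 sampling carries about 1.49 Gbit/s, so the 250–500× hybrid compression ratios cited above bring it down to a few Mbit/s:

```python
# Back-of-the-envelope bit rates for a 1080p60, 8-bit, 4:2:0 stream.
width, height, fps = 1920, 1080, 60
bits_per_pixel = 8 * 1.5                      # 4:2:0 sampling: 12 bits/pixel
raw_bps = width * height * bits_per_pixel * fps
print(f"raw: {raw_bps / 1e9:.2f} Gbit/s")     # ~1.49 Gbit/s
for ratio in (250, 500):                      # hybrid compression ratios above
    print(f"{ratio}x: {raw_bps / ratio / 1e6:.1f} Mbit/s")   # ~6.0 / ~3.0
```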
3. Transmission
Network transmission is subject to real-time traffic conditions. Thus, it is necessary to set buffers at the encoding end, the decoding end, and within the network routers to limit data loss. Network congestion has a significant impact on latency: in the event of bufferbloat [20], long queuing delays arise [21,22]. A fluctuating video bit rate will also cause network congestion, resulting in transmission delay and buffering delay for subsequent video streams [23]. To alleviate these issues, congestion control mechanisms in transport protocols [24] have been developed. The introduction of these buffers directly increases the overall latency of the transmission system.
4. Display
The latency of most current monitors can be found on the corresponding website [25]. Due to the influence of E-sports games, monitors with a latency of only a few milliseconds have been introduced into the market.
In order to meet the latency requirements of different application scenarios, it is necessary to accurately perceive the end-to-end latency of video transmission. At present, there exist limited measurement solutions for video transmission latency. A common method is simple side-by-side shooting: a time counter is placed next to the display screen showing the transmission result, the counter value is captured in the screen content, and the difference between the two times is taken as the latency. This method is simple and easy to operate; the measurement can be completed with two mobile phones and the video transmission system. However, its accuracy is low, the measurement is not continuous, and it is cost-prohibitive for permanent operation.
The work in [26] is specifically designed to measure the end-to-end delay of computer video chat applications. The sending computer displays a barcode representing the time on the screen, and the receiver reads the barcode on the video chat screen to obtain the sending time. The accuracy of the method cannot be guaranteed due to the influence of screen refresh and software execution speed. Reference [27] measures the end-to-end delay of a video surveillance system on a single computer. The computer displays a bar pattern that changes at fixed times; a digital surveillance camera captures the changing video and transmits it to the computer for decoding, and the decoded change time is compared with the sending time to obtain the end-to-end delay. This method has limited measurement accuracy and application scenarios. Reference [28] focuses on the evaluation of QUIC over Web, cloud storage, and video workloads, concentrating on the generation and measurement of network transmission delay; this measurement does not include the delay caused by codecs, buffers, etc. Reference [29] introduces a delay model for end-to-end transmission of real-time video, composed of sub-models for capture, encoding, network, decoding, rendering, and display refresh. However, this method is only used in simulation experiments without actual measurement. Reference [30] proposes frame hopping and preemption methods to reduce glass-to-glass (G2G) and glass-to-algorithm (G2A) delay, and builds a delay measurement setup using light-emitting diodes (LEDs) and phototransistors (PTs). However, the measurement accuracy only reaches 0.5 ms.
The main contribution of this paper is to propose methods that can deliver continuous latency monitoring over existing networks without the disruption or distortion of the original content. A timecode method is used to measure the short distance video transmission delay. Remote online measurement is used to measure the remote video transmission delay over a network. Lossless remote video online measurement realizes lossless online measurement without content distortion. A time delay measurement device is proposed and implemented. The performance was validated over a typical network.
This paper is organized as follows. In Section 2, the latency caused by video compression, decompression, and network transmission is analyzed. In Section 3, we propose three methods for measuring video end-to-end latency. Then, we present some practical experience with video latency measurements in Section 4. Finally, we conclude the paper in Section 5.

2. Analysis of Video Transmission Latency

The latency caused by the camera and the display is relatively independent. Proper selection of the camera and the display minimizes their impact on the overall delay. The key to reducing real-time video transmission latency lies in video codecs and video transmission.
Transmission and codecs are interrelated. On one hand, a burst data stream output by the encoder into the transmission network will increase network congestion. On the other hand, when transmission conditions are limited, the video bit rate must be reduced to keep the stream real-time. Hence the introduction of caches and rate control throughout the transmission path. The total latency of the transmission system, covering encoding, decoding, and transmission, is denoted $D_{\text{latency}}$. This system delay is effectively the accumulation of the following components: encoder latency ($D_{\text{enc}}$), transmission latency ($D_{\text{net}}$), decoding latency ($D_{\text{dec}}$), and the latency caused by the buffer of each stage ($D_{\text{cache}}$). This is shown in Figure 1 and represented in Equation (1):
$$D_{\text{latency}} = D_{\text{enc}} + D_{\text{net}} + D_{\text{dec}} + D_{\text{cache}} \quad (1)$$

2.1. Video Codec Latency

Video coding standards have evolved over many generations. Codecs are also becoming more complex in order to deliver higher quality video with limited bandwidth. The codec latency is composed of encoding frame reordering latency, encoding processing latency, encoding buffering latency, decoding processing latency, and decoding frame reordering latency.
The video to be encoded is fed into the encoder in chronological order. Video coding prediction includes intra-prediction, inter-prediction, and bidirectional prediction. Intra-prediction exploits the correlation between neighboring pixels by extrapolating predicted values from already coded pixels within the frame. Inter-prediction uses block-based motion compensation to generate a prediction from one or more previously encoded frames. Bidirectional prediction additionally allows the encoder to draw on frames that lie later in the video stream, instead of only previously compressed frames. In typical applications, intra-prediction, inter-prediction, and bidirectional prediction are mixed at the frame level to balance latency and quality. This causes the order in which frames enter the encoder to differ from the actual encoding and decoding order, so some frames must wait to be encoded or decoded, resulting in frame reordering latency. Figure 2 describes the reference relationship of the RA (Random Access) mode of HEVC. In this case, a GOP contains eight video frames with picture numbers (Picture Order Count, POC) 0–7. Due to the predictive reference relationships, the playback order of frames is inconsistent with the encoding order: the POC 1 frame is encoded after the POC 4 and POC 2 frames. Therefore, a frame reordering latency of three frames is introduced at the encoding stage; at 50 frames/s, a 3-frame latency equals 60 ms. Likewise, the POC 1 frame cannot be decoded until those two frames are decoded. The overall DOC (decoding order count) is shown in Figure 2. This part of the latency depends on the data arrival speed and decoding speed, because encoding the current frame requires the data of already encoded frames. The GOP structure can be dynamic, and the theoretical maximum reordering latency can reach the size of the GOP. A small sketch of this reordering arithmetic follows.
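The following minimal sketch (our illustration, not the paper's code) replays the RA GOP-8 example above: a frame can only be encoded once the latest-captured frame scheduled before it in encode order has arrived. The encode order used here is an assumption matching the description of Figure 2.

```python
# Reordering latency implied by an encode order that differs from capture order.
FRAME_INTERVAL_MS = 1000 / 50  # 20 ms per frame at 50 frames/s

encode_order = [0, 4, 2, 1, 3, 6, 5, 7]  # assumed POC encode order for the RA GOP-8 example

def reordering_latency_ms(encode_order, frame_interval_ms):
    """Time each frame waits after capture before it can be encoded.

    A frame can only be encoded once every frame scheduled before it has
    been captured, i.e., once the largest POC at or before its slot arrived.
    """
    latencies = {}
    latest_needed = 0
    for poc in encode_order:
        latest_needed = max(latest_needed, poc)   # latest capture this slot depends on
        latencies[poc] = (latest_needed - poc) * frame_interval_ms
    return latencies

lat = reordering_latency_ms(encode_order, FRAME_INTERVAL_MS)
print(lat[1])             # POC 1 waits for POC 4: 3 frames = 60.0 ms
print(max(lat.values()))  # worst-case reordering latency in this GOP
```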
To support low-latency applications, HEVC introduced the LDP (low-delay P) mode, which eliminates bidirectional prediction frames. Each incoming frame can be encoded immediately without waiting for other reference frames, so no reordering delay is introduced. However, the encoding performance decreases by 9–42% [31].
In the encoding process, a large number of redundant searches must be performed in the current frame and the reference frames to determine the most efficient encoding mode. At the same time, to adapt to the local characteristics of the frame content, the image to be encoded is divided into a series of coding blocks. AVC uses fixed-size macroblocks; HEVC uses quad-tree partitioning; and VVC additionally introduces multi-type tree partitioning. Adding partitioning options improves the coding efficiency for image details, but the corresponding partitioning and search complexity also increases significantly: compared with HEVC, VVC increases the coding complexity by a factor of about 7. The increase in computational complexity extends the encoding time. In actual implementations, parallel operation can be used to optimize the encoding structure and reduce encoding time, but parallelism is essentially scheduling optimization and does not reduce the processing volume of encoding and decoding itself. The latest methods use machine learning to reduce the complexity of coding unit partitioning and motion search, cutting coding complexity by 20–70% at the cost of a compression performance loss of 5% or less [32,33,34,35]. However, the time required for entropy encoding and related data access is unavoidable. Similarly, the decoding process also introduces latency due to its large amount of computation.

2.2. Network Transmission Latency and Code Rate Control

The network transmission latency is mainly determined by the network bandwidth and buffer size. In order to reduce this part of the latency, a stable bit rate and a buffer as small as possible are required.
In video transmission, the available bandwidth is usually limited to a certain range and is easily reduced by various kinds of interference, especially in wireless environments. Under the hybrid coding structure, the RC (rate control) mechanism adjusts a series of coding parameters, usually including the partition mode, prediction mode, and QP (quantization parameter), to keep the compressed bit rate within the available bandwidth. All video coding standards have their own recommended RC models: MPEG-2 adopts TM5 [36], H.263 adopts TMN8 [37], H.264/AVC adopts JVT-G012 [38], and H.265/HEVC adopts JCTVC-H0213 [39] and JCTVC-K0103 [40]. The latest coding standard, H.266/VVC, adopts JVET-K0390 [41]. Research and implementation of rate control are mainly based on rate control algorithms in the Q domain [42], ρ domain [43], and λ domain [44]. These algorithms control the average rate with high precision, but their bit-count accuracy for frame-level or finer-grained coding units is not sufficient for small network buffers, especially when the encoded video contains fast-moving objects, significant object occlusions, or scene changes.
The compressed video data stream is sent using a network transport protocol. In the early days, TCP or UDP was used. TCP carries sequence numbers that let the sender observe the receiving status at the receiver; its retransmission mechanism guarantees no packet loss but easily causes long delays and network congestion. UDP is a connectionless protocol commonly used in video broadcasting. It does not guarantee that all packets reach the receiver: lost packets are not retransmitted, so the latency is lower, at the cost of packet loss. To mitigate this loss, techniques such as forward error correction are sometimes adopted, adding redundant bytes and sacrificing part of the bandwidth [45]. Some newer transport schemes, such as WebRTC, take advantage of the lightweight, uncontrolled nature of UDP: the bottom layer uses UDP, while the upper layer uses SRTP (Secure Real-time Transport Protocol) and automatic rate adaptation to improve video transmission quality. However, in WebRTC the encoding rate is changed on a time scale of seconds. When the network is congested, some already encoded frames are still sent out due to the response lag, and the encoder is then paused until the congestion ends. In current video transmission systems, the transport layer and the encoding layer are loosely coupled: the coding-layer information available to the transport layer lags behind, so the congestion control of transport-layer packets does not exactly match the rate control of the encoding layer. For this reason, corresponding buffers must be configured at each stage of transmission to alleviate the mismatch, but these buffers bring queuing latency. The work in [46] evaluates the current network capacity and optimizes the compressed length of each frame to achieve lower video latency and better video quality over variable network paths. The limitation of this method is that intermediate information from the encoding process must be obtained in real time, which is not possible with most encoders.

3. Methodology

The measurement accuracy of the end-to-end delay of video transmission systems in previous work is generally in units of frames. This accuracy is insufficient for interactive video applications with very low latency requirements. To this end, we designed and implemented three methods to accurately measure the end-to-end latency for different application scenarios.

3.1. Method 1: Timecode Latency Measurement

The measurement scheme is shown in Figure 3a. The camera captures raw video and feeds it into the latency measurement device, which takes the arrival time of each video frame's synchronization pulse as the input time of that frame. The frame input time and frame number are injected at a fixed position of the original video picture in the form of a barcode, and the video carrying this timecode is encoded, transmitted, and decoded to obtain a reconstructed video. The reconstructed video is also sent to the latency measurement device, which records the arrival time of the synchronization pulse of each reconstructed frame as the time that frame reaches the display end. The barcode at the fixed position of the reconstructed video is read to recover the frame number and frame input time, and the arrival time and input time of the same frame are compared to obtain the end-to-end latency of each frame of the video transmission system.
The timecode injected into the original video appears as a short barcode at the bottom of the screen, as shown in Figure 3b.
The timecode contains the frame number and frame input time. A color block of 16 × 16 pixels represents one bit: bit values 0 and 1 are represented by the colors YCbCr(0, 80, 80) and YCbCr(80, 80, 80), respectively. Taking a 1280 × 720 video frame as an example, there are 80 horizontal blocks in total, which can represent data with a width of 80 bits. Concatenating the data to be represented (frame number and timestamp) gives {Frame_cnt[7:0], Time_cnt[31:0]}, which uses a total of 40 bits.
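As a sketch of this layout (our assumption of the exact placement; the paper fixes only the block size, the two colors, and the 40-bit word), the barcode can be painted onto and read back from the luma plane as follows:

```python
# Inject and read a 40-bit timecode barcode in the bottom rows of a Y plane.
import numpy as np

BLOCK = 16
Y_ZERO, Y_ONE = 0, 80   # luma of YCbCr(0,80,80) vs. YCbCr(80,80,80)

def inject_timecode(y_plane: np.ndarray, frame_cnt: int, time_cnt: int) -> None:
    """Overwrite the bottom-left blocks with {Frame_cnt[7:0], Time_cnt[31:0]}."""
    bits = ((frame_cnt & 0xFF) << 32) | (time_cnt & 0xFFFFFFFF)  # 40-bit word
    h = y_plane.shape[0]
    for k in range(40):                       # one 16x16 block per bit, MSB first
        val = Y_ONE if (bits >> (39 - k)) & 1 else Y_ZERO
        y_plane[h - BLOCK:h, k * BLOCK:(k + 1) * BLOCK] = val

def read_timecode(y_plane: np.ndarray) -> tuple[int, int]:
    """Recover (frame_cnt, time_cnt) by thresholding block means."""
    h = y_plane.shape[0]
    bits = 0
    for k in range(40):
        block = y_plane[h - BLOCK:h, k * BLOCK:(k + 1) * BLOCK]
        bits = (bits << 1) | int(block.mean() > (Y_ZERO + Y_ONE) / 2)
    return bits >> 32, bits & 0xFFFFFFFF

frame = np.zeros((720, 1280), dtype=np.uint8)
inject_timecode(frame, frame_cnt=7, time_cnt=123_456_789)
assert read_timecode(frame) == (7, 123_456_789)
```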
The time value in the actual measurement uses a 32-bit counter driven by a 25 MHz clock. When the frame synchronization signal of the original video is detected, the counter value is saved in the latency measurement device as the input time $T_i(n)$ of each original video frame, where $n$ is the frame number. After encoding, transmission, and decoding, the reconstructed video arrives at the receiving end, and the frame synchronization signal time of each reconstructed frame is $T_r(n)$. According to Equation (2), the delay $D(n)$ of the $n$th frame in the video transmission system can be obtained:

$$D(n) = \begin{cases} \left(T_r(n) - T_i(n)\right) \times 40\ \text{ns}, & T_r(n) \ge T_i(n) \\ \left(2^{32} - \left(T_i(n) - T_r(n)\right)\right) \times 40\ \text{ns}, & T_r(n) < T_i(n) \end{cases} \quad (2)$$

where $T_i(n)$ is the input time of each original video frame at the latency measurement device, $T_r(n)$ is the arrival time of the corresponding reconstructed frame, and $n$ is the frame number; the second case handles wraparound of the 32-bit counter.
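A minimal sketch of Equation (2) (our illustration), computing a per-frame delay from two raw counter samples and handling a single counter wraparound:

```python
# Per-frame delay from two 32-bit, 25 MHz counter samples.
TICK_NS = 40          # one 25 MHz clock period in nanoseconds
WRAP = 2 ** 32        # 32-bit counter modulus

def frame_delay_ns(t_i: int, t_r: int) -> int:
    """End-to-end delay of one frame; t_i, t_r are raw counter values."""
    if t_r >= t_i:
        return (t_r - t_i) * TICK_NS
    return (WRAP - (t_i - t_r)) * TICK_NS   # counter wrapped between samples

print(frame_delay_ns(1_000_000, 2_287_500) / 1e6, "ms")   # 51.5 ms
```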
Because the input time of the original video and the arrival time of the reconstructed video are obtained from the same clock source, each timestamp may be off by ±1 count. The variance $S_D$ of the latency measurement error is the sum of the variance $S_i$ of the original video input time acquisition error and the variance $S_r$ of the reconstructed video arrival time acquisition error, as described by Equation (3):

$$S_D = S_i + S_r \quad (3)$$

Since the unit time of the 25 MHz clock is 40 ns, the variance $S_D$ of the latency measurement is $3200\ \text{ns}^2$, and the theoretical standard deviation of the latency measurement is 56.6 ns [47], which meets the requirements of high-precision online latency measurement.
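Spelling out the arithmetic (our reading, assuming each timestamp carries an independent one-count error contributing a variance of $(40\,\mathrm{ns})^2$):

```latex
S_i = S_r = (40\,\mathrm{ns})^2 = 1600\,\mathrm{ns}^2, \qquad
S_D = S_i + S_r = 3200\,\mathrm{ns}^2, \qquad
\sigma_D = \sqrt{S_D} \approx 56.6\,\mathrm{ns}.
```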
Compared with the side-by-side shooting method, the measurement accuracy of Method 1 is significantly improved, but there are still shortcomings. First, the original video content is partially altered during measurement. Although the influence is small, this method cannot be used for online measurement in production video applications. Secondly, since both the original video and the decoded reconstructed video must be fed to the same latency measurement device, this method is limited to local measurement and cannot be used for long-distance online real-time measurement.

3.2. Method 2: Remote Online Measurement

Although Method 1 has high measurement accuracy, it must connect both the sender's video and the receiver's video to the same latency measurement device, so remote video transmission cannot be measured. To this end, we designed Method 2; the measurement scheme is shown in Figure 4. The solution includes a flashing light on the camera side and a latency measurement device at the remote receiver. Time synchronization of the remotely located equipment uses GPS signals. The flashing light is controlled by the GPS PPS (pulse per second) signal to flash once per second, and the camera sends the captured pictures, including the flashing light, through the video transmission system to the latency measurement device. The device records the time at which each frame enters it and finds the frame in which the light has just turned on. The fractional-second part of that frame's entry time, expressed in milliseconds, is the end-to-end latency of the video transmission.
The PPS signal that controls the flash provides precise clock synchronization, with a timing error of less than 50 ns. The PPS signal of the GPS module we use is a pulse signal with a period of one second and a pulse width of 200 ms. The PPS signal drives a power switch so that the light flashes once per second, staying on for 200 ms each time. The latency measurement device uses a 32-bit counter driven by a 25 MHz clock as its timing value. When the device is powered on, the 32-bit counter is reset by the rising edge of the PPS signal. In normal operation, the rising edge of the PPS signal resets the counter once every 10 min to maintain time synchronization between the flashing light and the latency measurement device.
Taking a 720p@60 camera as an example, when the light is on for 200 ms, the captured video contains approximately 12 consecutive frames in which the light is on. The video enters the latency measurement device after encoding, transmission, and decoding. The measurement device compares the brightness of each frame with that of the previous frame over five consecutive frames. When the device detects a local increase in brightness in a frame, indicating that the light has turned on, it checks whether that position stays bright over the next 4 frames; if so, it identifies the frame in which the light just turned on. A 32-bit counter value $Count(i)$ is recorded, representing the time at which that frame entered the latency measurement device, where $i$ is the measurement index. $Count(i)$ can be converted into the timing time $T_i$ in seconds, as expressed by Equation (4):
$$T_i = \frac{Count(i) \times 40\ \text{ns}}{10^9\ \text{ns/s}} \quad (4)$$
The fractional part of the timing time $T_i$ is the end-to-end latency $D_i$ of the video transmission, as expressed by Equation (5):

$$D_i = \{T_i\} = T_i - \lfloor T_i \rfloor \quad (5)$$
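A minimal sketch of Equations (4) and (5) (our illustration): because the light turns on exactly on a GPS PPS edge, i.e., on a whole second, everything after the decimal point of the flash frame's entry time is transmission delay:

```python
# Convert the counter value latched at the flash frame into latency.
TICK_NS = 40  # 25 MHz clock period in nanoseconds

def latency_ms_from_count(count: int) -> float:
    t_seconds = count * TICK_NS / 1e9         # Equation (4)
    frac = t_seconds - int(t_seconds)         # Equation (5): fractional part
    return round(frac * 1e3, 4)

# Example: a flash frame latched 2.0964 s after counter reset gives 96.4 ms,
# matching the H.264 measurement reported in Section 4.2.
print(latency_ms_from_count(52_410_000))   # 96.4
```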
Compared with Method 1, Method 2 is simple to operate and does not need to feed the original and reconstructed videos into the same measurement device: it only needs the flashing light placed in the camera's field of view and the decoded reconstructed video connected to the latency measurement device. Method 2 thus realizes remote measurement, but it is still a lossy measurement, because the flashing light must appear in the video content, partially altering the original picture. Moreover, since Method 2 relies on the camera capturing the flashing light to identify the matching frame, the capture may lag by up to one frame, so the measurement error is larger. The biggest limitation is that GPS signals cannot be received indoors, so this method can only be used for outdoor measurements.

3.3. Method 3: Lossless Remote Video Online Measurement

When using Method 2, both the sender and the receiver must have GPS signals, and the measurement accuracy is low. For this reason, we developed a method based on the IEEE 1588 time synchronization protocol that can accurately measure the end-to-end latency of remote video without altering the content. The measurement scheme is shown in Figure 5.
A latency measurement device is placed at the transmitter and at the receiver of the remote video transmission system, and the devices use the built-in IEEE 1588 protocol to achieve time synchronization. The original video and the reconstructed video are copied and connected to the transmitter-side and receiver-side latency measurement devices, respectively, with no changes made to the video transmission system. The sending-end device calculates a hash value for each input frame, combines it with the frame's entry time, and sends the package over the network to the receiving-end device. The receiving-end device calculates the hash value of each reconstructed frame with the same algorithm and stamps it with a timestamp. At the receiving end, the device matches the hash values of original frames against those of reconstructed frames; a successful match identifies the corresponding frame pair, and comparing their timestamps yields the end-to-end latency of that frame.
To accurately find the correspondence between each reconstructed frame and its original frame, this method uses video-aware hashing, which involves three stages: feature information extraction, video-aware hash extraction, and hash code matching. Most video codecs based on hybrid coding frameworks use motion search for predictive coding; therefore, this method selects motion information features of video frames as the hash feature to reflect changes in video content. Unlike the commonly used video-aware hashing algorithms based on gradient orientation centroids, this method uses a difference-based hash compression algorithm. To ensure measurement accuracy, the measuring devices at the sending and receiving ends must maintain time synchronization. We adopt the IEEE 1588 protocol, implemented in software, which achieves a synchronization accuracy of 20 ns under a 100 ms synchronization period and meets the requirements of accurate online measurement of video transmission delay [48]. The specific processing flow is shown in Figure 6.
Take the original video sequence $O_n(i, j)$ with a resolution of 1280 × 720 as an example; the reconstructed sequence after encoding and decoding is $R_m(i, j)$, where $(i, j)$ are the pixel coordinates and $n$ and $m$ are the frame indices of the original and reconstructed video, respectively. The luminance components $Y_{O_n}(i, j)$ of the original video and $Y_{R_m}(i, j)$ of the reconstructed video are selected for subsequent processing.
The latency measurement devices at the sending and receiving ends record the input time of each frame of the original and reconstructed video. When a device detects the frame synchronization pulse of a frame, it records the value of the 32-bit counter driven by its internal 25 MHz clock as the timestamp at which that frame entered the device. The times represented by the counters in the sender and receiver devices are kept synchronized by the IEEE 1588 protocol.
To reduce the amount of data to be processed while retaining useful information, the original and reconstructed frames undergo block-based downsampling. The device divides the $Y_{O_n}(i, j)$ and $Y_{R_m}(i, j)$ frames into 16 × 16-pixel blocks and assigns the average pixel value of each block to $Y'_{O_n}(i', j')$ and $Y'_{R_m}(i', j')$, as shown in Equation (6). This downsamples the original and reconstructed frames to sequences $Y'_{O_n}(i', j')$ and $Y'_{R_m}(i', j')$ with a resolution of 80 × 45, where $i' \in [0, 44]$, $j' \in [0, 79]$, $i = 16 i'$, and $j = 16 j'$:

$$Y'(i', j') = \frac{1}{256} \sum_{u=i}^{i+15} \sum_{v=j}^{j+15} Y(u, v), \qquad i = 16 i',\ j = 16 j' \quad (6)$$
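A sketch of Equation (6) (our illustration), downsampling a 1280 × 720 luma plane to the 80 × 45 grid by 16 × 16 block means:

```python
# 16x16 block-mean downsampling of a luma plane.
import numpy as np

def block_mean_downsample(y: np.ndarray, block: int = 16) -> np.ndarray:
    h, w = y.shape                                  # e.g., 720 x 1280
    return (y.reshape(h // block, block, w // block, block)
             .mean(axis=(1, 3)))                    # average each block

y = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
print(block_mean_downsample(y).shape)               # (45, 80)
```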
The downsampled original video sequence $Y'_{O_n}(i', j')$ and the downsampled reconstructed video sequence $Y'_{R_m}(i', j')$ are subjected to frame-difference processing according to Equation (7), yielding sequences $\delta_{O(n)}(i', j')$ and $\delta_{R(m)}(i', j')$ that reflect the pixel changes between consecutive frames:

$$\delta_n(i', j') = \left| Y'_{n+1}(i', j') - Y'_n(i', j') \right| \quad (7)$$
A threshold $\varepsilon$ is selected to binarize $\delta_{O(n)}(i', j')$ and $\delta_{R(m)}(i', j')$ according to Equation (8): pixel values greater than the threshold are set to 1 and the remaining pixel values are set to 0, giving binarized images $B_{O(n)}(i', j')$ and $B_{R(m)}(i', j')$:

$$B(i', j') = \begin{cases} 1, & F(i', j') > \varepsilon \\ 0, & F(i', j') \le \varepsilon \end{cases} \quad (8)$$
where $F(i', j')$ is the pixel value at coordinate $(i', j')$. The binarized image is expanded line by line to obtain the hash code of each frame of the original and reconstructed video. The hash code and timestamp of each original video frame are assembled into hash packets, which are sent to the latency measurement device at the receiver, as shown in Figure 7.
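A sketch of Equations (7) and (8) plus the line-by-line expansion (our illustration; the threshold value is an assumption, as the paper does not state it):

```python
# Frame difference -> binarization -> 3600-bit hash code.
import numpy as np

EPSILON = 4  # binarization threshold; the actual value is not given in the paper

def frame_hash(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """prev, curr: 45x80 downsampled luma frames -> flat 3600-bit hash."""
    delta = np.abs(curr.astype(np.int16) - prev.astype(np.int16))   # Eq. (7)
    bits = (delta > EPSILON).astype(np.uint8)                       # Eq. (8)
    return bits.flatten()               # expand line by line: 45 * 80 bits
```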
The latency measurement device at the receiver performs hash matching to find the original video frame that corresponds to each reconstructed frame. In the hash library HashBD at the receiver, $H_O(n)$ and $H_R(m)$ denote the hash codes of the original and reconstructed video frames, respectively, and $H_O(n)_k$ and $H_R(m)_k$ denote the $k$th bit of each hash code. The Hamming distance is used to measure the distance between two code strings: the corresponding bits of the reconstructed-frame hash and the original-frame hash are XORed and the number of 1s is counted, as shown in Equation (9). This distance determines the correspondence between two frames: the smaller the distance, the stronger the correlation between the two frames; the larger the distance, the weaker the correlation. In this way, the original frame matching each reconstructed frame can be found, and the end-to-end latency of video transmission is obtained from the timestamps carried by the two frames.
$$D(m, n) = \sum_{k} \left( H_O(n)_k \oplus H_R(m)_k \right) \quad (9)$$
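A sketch of Equation (9) and the matching step (our illustration; the function name and database layout are assumptions):

```python
# Match a reconstructed frame's hash against received original-frame hashes.
import numpy as np

def match_latency_ms(rec_hash: np.ndarray, rec_count: int, hash_db) -> float:
    """hash_db: list of (orig_hash, orig_count) pairs received out of band."""
    dists = [int(np.count_nonzero(h ^ rec_hash)) for h, _ in hash_db]  # Eq. (9)
    _, orig_count = hash_db[int(np.argmin(dists))]   # closest original frame
    return (rec_count - orig_count) * 40 / 1e6       # 40 ns ticks -> ms
```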
Method 3 enables high-precision, real-time, online latency monitoring without altering the transmitted video. It balances the tradeoff between computational demand, out-of-band bandwidth utilization, and robustness across different content. Its accuracy is similar to that of Method 1, while it also provides the remote measurement capability of Method 2 without being limited to scenarios with a GPS signal. It is therefore a convenient, continuous, and reliable method suitable for wide deployment.

4. Evaluation

Combining the requirements that the three latency measurement methods place on the measurement device, we designed a latency measurement device that supports all three methods and implemented it in hardware. The design framework and a photograph are shown in Figure 8; the device is named CC3030. The device is a heterogeneous system based on a Xilinx ARTIX-7 series FPGA and an STM32F7 series MCU: the FPGA is responsible for image information processing, while the STM32 handles data processing and external interaction. Information processed by the FPGA is transmitted to the STM32 microcontroller, which calculates the latency. The device provides two HD-SDI ports and two HDMI ports for video input, and one HD-SDI and one HDMI port for OSD (On-Screen Display) menu output. It can receive external GPS signals, and the device's configuration and measurement results can be viewed from other devices' web browsers through the network port.
The following examples apply the three latency measurement methods using different configurations of this self-made CC3030. The camera used in the measurements is an SHD60, and the two codec systems are an H.264 codec and an H.265 codec. The H.264 codec is from Sculpture Networks: the encoder model is Snenc1000 (San Diego, CA, USA) and the decoder model is Sncupid1000 (San Diego, CA, USA). The H.265 codec uses the Huawei HiSilicon Hi3519 video codec solution.

4.1. Method 1: Timecode Method

This method requires a camera, two display screens, a video splitter, a codec, and a CC3030. The camera performs real-time video acquisition, and the captured video passes through a 12G-SDI one-in, two-out splitter to obtain two zero-added-latency copies. One output feeds the raw video to a display screen; the other feeds the CC3030. The original video sequence input to the CC3030 is called the source video, and its input port is called the source entry (Ori). The source video is output from the CC3030 and, after passing through the codec, the reconstructed video is obtained. The reconstructed video is fed back into the CC3030 through the reconstructed entry (Rec). The FPGA in the CC3030 locates the timestamp in the reconstructed video and determines the transmission latency of the video sequence by comparing it with the local time. To make the latency results easy to display on the reconstructed video, the displayed time resolution is set to 0.1 ms.
The video input and output in the measurement process need to pass through the video interface (HDSDI or HDMI). The latency caused by these interfaces is measured in microseconds, which is negligible compared to the latency of measuring video transmission. Since the timestamp is embedded in the existing video source, the latency caused by camera acquisition is not included in the latency measurement results.
Figure 9a shows the latency measurement of video transmission through the H.264 codec. In the image, the right screen shows the original video and the left screen shows the reconstructed video with the measurement results; the timestamp appears as a spline bar at the bottom of the screen. The real-time video acquisition format is 720p@60 Hz, and the measured latency is 51.5 ms.
Figure 9b shows the latency measurement of video transmission through the H.265 codec. Again, the right screen shows the original video and the left screen shows the reconstructed video with the measurement results. The real-time video acquisition format is 1080i@50 Hz, and the measured latency is 436.6 ms.

4.2. Method 2: Remote Online Measurement

This method requires a camera, a display, a flashing light, a codec, and a CC3030. In this experiment, the light flashed once per second, staying on for 200 ms each time. When using the camera to capture video, the flashing light must be within the captured picture.
When the PPS signal from GPS arrives, the light starts flashing and the CC3030 starts its timing operation. The flash picture captured by the camera is transmitted, and the reconstructed video is obtained at the decoding end. The reconstructed video is input to the CC3030, where the FPGA searches for the flash picture and stops the timing operation once it is found. Finally, the measurement results are printed on the reconstructed video and displayed on the screen.
The flashing period in this measurement is 1 s. The flashing period determines the range of latency that can be measured: if the period is shorter than the latency of the transmission system, flash pictures from successive periods will overlap when the FPGA searches for the flash picture, causing measurement errors. Therefore, the flashing period must be increased when measuring a transmission system with large latency.
Figure 10a shows the latency result of inputting the flash picture captured by the camera directly into the CC3030 without going through the codec, i.e., the latency of the camera itself. The video acquisition format is 720p@60 Hz, and the latency of the camera used in the measurement is 34.9 ms.
Figure 10b shows the latency result after the video is transmitted through the codec. The codec selected for this measurement is the H.264 codec and the video acquisition format is 720p@60 Hz. The latency of the whole video transmission process is 96.4 ms; removing the 34.9 ms caused by camera video acquisition leaves 61.5 ms.

4.3. Method 3: Lossless Remote Video Online Measurement

Before measurement, one CC3030 must be prepared at the transmitter and one at the receiver. Since it cannot be guaranteed that both ends of a remote video transmission receive GPS signals, the IEEE 1588 protocol is used to synchronize the transmitter and receiver. During measurement, two channels operate synchronously. One channel transmits the original video through the codec system to obtain the reconstructed video. The other channel processes the original and reconstructed videos: at the sending end, the original video is input to the CC3030 to obtain its hash codes; at the receiver, the reconstructed video is input to the other CC3030 to obtain its hash codes. The original video hash codes, with timestamps, are transmitted to the receiver CC3030 through the network, where each reconstructed-frame hash code is matched to an original-frame hash code. The video transmission latency is obtained from the timestamps carried in the matched hash codes.
The codec system used in this measurement is the H.264 codec. Table 1 lists the timestamps of the reconstructed video frames and their matched original frames during the first 10 s, together with the computed delay values. Because of the distance between the sender and the receiver, the latency measurement results are output in real time as a web page, where the horizontal axis is time in seconds (one measurement per second in actual operation) and the vertical axis is the latency value in milliseconds, as shown in Figure 11.

4.4. Comparison of Proposed Methods with Existing Methods

Based on the actual evaluation results, the advantages and disadvantages of the three measurement methods are listed, as shown in Table 2.
The measurement principle of Method 1 is the same as that of Method 3; Method 3 additionally uses out-of-band hash data and IEEE 1588 time synchronization to realize lossless remote measurement. The time value in the actual measurement uses a 32-bit counter clocked at 25 MHz, so the minimum timestamp unit in Methods 1 and 3 is 40 ns. According to Equation (2), the maximum error between the arrival times of the reconstructed and original video due to ±1-count errors is 80 ns; the precision is stated as 0.1 µs for statistical convenience. The measurement principle of Method 2 is similar to that of Reference [30], which measures the time delay a video transmission system under test introduces into the propagation of light from a light-emitting diode (LED) to a phototransistor (PT); the resistance of the PT decreases when the LED lights up in the displayed image. The PT sampling rate of 2 kHz yields a precision of 0.5 ms, but the measurement relies on capturing the display output with a video camera, which limits the precision to, e.g., 16.7 ms for a 60 Hz camera. The actual precision of Reference [30] is therefore sub-frame, the same as Method 2. The work in [26] uses an embedded barcode to measure the end-to-end delay of computer video chat applications; its precision is sub-frame due to the influence of screen refresh and software execution speed. Reference [5] measures latency from the waveform shift on an oscilloscope. The measurement accuracy can be guaranteed, but the procedure is inconvenient and cannot run online.

5. Summary

In this paper, the sources of latency in video transmission are analyzed and three latency measurement methods are proposed: the timecode method, the remote online measurement method, and the lossless remote video online measurement method. Different methods suit different deployment scenarios. The measurement accuracy of Methods 1 and 3 is similar, but Method 1 is only suitable for local measurement, and it modifies the source content with a timestamp. For Method 3, this paper proposes a synchronization framework that balances out-of-band bandwidth, measurement accuracy, and ease of deployment using video-aware hash tables. Both Methods 2 and 3 can measure the latency of remote video transmission, but Method 3 does not alter the original video content. One unique value of Method 2 is that it captures the latency of the capturing and rendering devices (e.g., camera, display). Compared with Methods 1 and 2, Method 3 achieves high-precision, lossless, online latency measurement of remote video, scales across today's networks, and provides continuous monitoring. This brings latency QoS to a level acceptable for next-generation applications, including VR and real-time AI solutions.
In the future, Method 3 can be integrated into cameras to directly measure camera latency; if the camera contains an encoding and decoding system, the video transmission latency can be obtained directly. Furthermore, Method 3 can be extended not only to measure video transmission latency, but also to detect frame loss in the transmission process.

Author Contributions

Conceptualization, Y.W. and X.B.; Data curation, Y.W. and X.B.; Formal analysis, Y.W. and X.B.; Funding acquisition, Y.H.; Methodology, X.B.; Project administration, Y.W.; Resources, Y.W. and X.B.; Supervision, Y.H. and M.C.; Validation, Y.H. and M.C. All authors have read and agreed to the published version of the manuscript.

Funding

The work is supported by the Natural Science Foundation of Fujian Province of China (2021J01868), the Innovation and Entrepreneurship Team Project of the Fujian Provincial Department of Education, and a National Fund Cultivation Project of Jimei University (ZP2020041).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Liang, S.; Wu, H.; Zhen, L.; Hua, Q.; Garg, S.; Kaddoum, G.; Hassan, M.M.; Yu, K. Edge YOLO: Real-Time Intelligent Object Detection System Based on Edge-Cloud Cooperation in Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25345–25360.
  2. Durak, K.; Akcay, M.N.; Erinc, Y.K.; Pekel, B.; Begen, A.C. Evaluating the performance of Apple's low-latency HLS. In Proceedings of the 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland, 21–24 September 2020.
  3. Boland, J.E.; Fonseca, P.; Mermelstein, I.; Williamson, M. Zoom Disrupts the Rhythm of Conversation. J. Exp. Psychol. Gen. 2022, 151, 1272–1282.
  4. 3GPP TR 26.929: QoE Parameters and Metrics Relevant to the Virtual Reality (VR) User Experience. Available online: https://itecspec.com/archive/3gpp-specification-tr-26-929 (accessed on 27 September 2019).
  5. Ubik, S.; Pospisilik, J. Video Camera Latency Analysis and Measurement. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 140–147.
  6. Apple. ProRes White Paper. Available online: https://www.apple.com/final-cut-pro/docs/Apple_ProRes_White_Paper.pdf (accessed on 27 September 2019).
  7. Society of Motion Picture and Television Engineers. RDD 36:2015—SMPTE Registered Disclosure Doc—Apple ProRes Bitstream Syntax and Decoding Process; Society of Motion Picture and Television Engineers: White Plains, NY, USA, 2016; pp. 1–39.
  8. Society of Motion Picture and Television Engineers. RDD 35:2016—SMPTE Registered Disclosure Doc—TICO Lightweight Codec Used in IP Networked or in SDI Infrastructures; Society of Motion Picture and Television Engineers: White Plains, NY, USA, 2016; pp. 1–53.
  9. Descampe, A.; Keinert, J.; Richter, T.; Fößel, S.; Rouvroy, G. JPEG XS, a new standard for visually lossless low-latency lightweight image compression. In Proceedings of the Conference on Applications of Digital Image Processing XL, San Diego, CA, USA, 6–10 August 2017.
  10. Willème, A.; Descampe, A.; Lugan, S.; Macq, B. Quality and Error Robustness Assessment of Low-Latency Lightweight Intra-Frame Codecs for Screen Content Compression. IEEE J. Emerg. Sel. Top. Circuits Syst. 2016, 6, 471–483.
  11. Buysschaert, C.D.A. Overview of JPEG XS. 2022. Available online: https://jpeg.org/jpegxs/index.html (accessed on 27 September 2019).
  12. Wiegand, T.; Sullivan, G.J.; Bjontegaard, G.; Luthra, A. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 560–576.
  13. Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668.
  14. Bross, B.; Chen, J.; Liu, S.; Wang, Y.-K. Versatile Video Coding (Draft 10). Document JVET-S2001/MPEG m54716, Teleconference, April 2020.
  15. Zheng, X.; Liao, Q.; Wang, Y.; Guo, Z.; Wang, J.; Zhou, Y. Performance Evaluation for AVS3 Video Coding Standard. In Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, UK, 6–10 July 2020.
  16. Chen, Y.; Mukherjee, D.; Han, J.; Grange, A.; Xu, Y.; Liu, Z.; Parker, S.; Chen, C.; Su, H.; Joshi, U.; et al. An Overview of Core Coding Tools in the AV1 Video Codec. In Proceedings of the 2018 Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018.
  17. Ohm, J.R.; Sullivan, G.J.; Schwarz, H.; Tan, T.K.; Wiegand, T. Comparison of the Coding Efficiency of Video Coding Standards—Including High Efficiency Video Coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1669–1684.
  18. Bossen, F.; Bross, B.; Suhring, K.; Flynn, D. HEVC Complexity and Implementation Analysis. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1685–1696.
  19. Topiwala, P.; Krishnan, M.; Dai, W. Performance Comparison of VVC, AV1 and EVC. In Proceedings of the Conference on Applications of Digital Image Processing XLII, San Diego, CA, USA, 12–15 August 2019.
  20. Gettys, J. Bufferbloat: Dark Buffers in the Internet. IEEE Internet Comput. 2011, 15, 96.
  21. Peng, C.; Fei, M.R.; Tian, E.; Guan, Y.P. On hold or drop out-of-order packets in networked control systems. Inf. Sci. 2014, 268, 436–446.
  22. Zhan, X.S.; Wu, J.; Jiang, T.; Jiang, X.W. Optimal performance of networked control systems under the packet dropouts and channel noise. ISA Trans. 2015, 58, 214–221.
  23. Liang, G.F.; Liang, B. Effect of Latency and Buffering on Jitter-Free Streaming Over Random VBR Channels. IEEE Trans. Multimed. 2008, 10, 1128–1141.
  24. Cardwell, N.; Cheng, Y.; Gunn, C.S.; Yeganeh, S.H.; Jacobson, V. BBR: Congestion-Based Congestion Control. Commun. ACM 2017, 60, 58–66.
  25. DisplayLag. Available online: https://displaylag.com/ (accessed on 27 September 2019).
  26. Boyaci, O.; Forte, A.; Baset, S.A.; Schulzrinne, H. vDelay: A Tool to Measure Capture-to-Display Latency and Frame Rate. In Proceedings of the 2009 11th IEEE International Symposium on Multimedia, San Diego, CA, USA, 14–16 December 2009.
  27. Hill, R.; Madden, C.; Van Den Hengel, A.; Detmold, H.; Dick, A. Measuring Latency for Video Surveillance Systems. In Proceedings of the 11th Conference on Digital Image Computing: Techniques and Applications, Melbourne, Australia, 1–3 December 2009.
  28. Shreedhar, T.; Panda, R.; Podanev, S.; Bajpai, V. Evaluating QUIC Performance Over Web, Cloud Storage, and Video Workloads. IEEE Trans. Netw. Serv. Manag. 2022, 19, 1366–1381.
  29. Wang, H.; Zhang, X.; Chen, H.; Xu, Y.; Ma, Z. Inferring End-to-End Latency in Live Videos. IEEE Trans. Broadcast. 2022, 68, 517–529.
  30. Bachhuber, C.; Steinbach, E.; Freundl, M.; Reisslein, M. On the Minimization of Glass-to-Glass and Glass-to-Algorithm Delay in Video Communication. IEEE Trans. Multimed. 2018, 20, 238–252.
  31. Wenger, S. Temporal scalability using P-pictures for low-latency applications. In Proceedings of the 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Redondo Beach, CA, USA, 7–9 December 1998.
  32. Liu, Z.; Yu, X.; Gao, Y.; Chen, S.; Ji, X.; Wang, D. CU Partition Mode Decision for HEVC Hardwired Intra Encoder Using Convolution Neural Network. IEEE Trans. Image Process. 2016, 25, 5088–5103.
  33. Ryu, S.; Kang, J. Machine Learning-Based Fast Angular Prediction Mode Decision Technique in Video Coding. IEEE Trans. Image Process. 2018, 27, 5525–5538.
  34. Xu, M.; Li, T.; Wang, Z.; Deng, X.; Yang, R.; Guan, Z. Reducing Complexity of HEVC: A Deep Learning Approach. IEEE Trans. Image Process. 2018, 27, 5044–5059.
  35. Amestoy, T.; Mercat, A.; Hamidouche, W.; Menard, D.; Bergeron, C. Tunable VVC Frame Partitioning Based on Lightweight Machine Learning. IEEE Trans. Image Process. 2020, 29, 1313–1328.
  36. Wang, L. Rate control for MPEG video coding. Signal Process. Image Commun. 2000, 15, 493–511.
  37. Tsai, J.-C.; Hsieh, C.-H. Modified TMN8 rate control for low-latency video communications. IEEE Trans. Circuits Syst. Video Technol. 2004, 14, 864–868.
  38. Li, Z. Adaptive basic unit layer rate control for JVT (JVT-G012). In Proceedings of the Joint Video Team (JVT) 7th Meeting, Pattaya, Thailand, 7–14 March 2003.
  39. Choi, H.; Yoo, J.; Nam, J.; Sim, D.; Bajić, I.V. Rate control based on unified RQ model for HEVC. In Proceedings of the ITU-T SG16 Contribution, JCTVC-H0213, San Jose, CA, USA, 1–10 February 2012; pp. 1–13.
  40. Li, B.; Li, H.; Li, L. Rate control by R-lambda model for HEVC. In Proceedings of the JCTVC-K0103, JCTVC of ISO/IEC and ITU-T, 11th Meeting, Shanghai, China, 10–19 October 2012.
  41. Li, Y.; Liu, Z.; Chen, Z.; Liu, S. Rate control for versatile video coding. In Proceedings of the Joint Video Experts Team (JVET), 11th Meeting, Ljubljana, Slovenia, 10–18 July 2018.
  42. Choi, H.; Yoo, J.; Nam, J.; Sim, D.; Bajić, I.V. Pixel-Wise Unified Rate-Quantization Model for Multi-Level Rate Control. IEEE J. Sel. Top. Signal Process. 2013, 7, 1112–1123.
  43. Wang, S.; Ma, S.; Wang, S.; Zhao, D.; Gao, W. Quadratic ρ-domain based rate control algorithm for HEVC. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013.
  44. Li, B.; Li, H.; Li, L.; Zhang, J. λ-Domain Rate Control Algorithm for High Efficiency Video Coding. IEEE Trans. Image Process. 2014, 23, 3841–3854.
  45. Kwon, Y.W.; Chang, H.; Kim, J.W. Adaptive FEC control for reliable high-speed UDP-based media transport. In Advances in Multimedia Information Processing—PCM 2004, Proceedings of the 5th Pacific Rim Conference on Multimedia, Part II, Tokyo, Japan, 30 November–3 December 2004; Aizawa, K., Nakamura, Y., Satoh, S., Eds.; Springer Nature: Berlin, Germany, 2004; pp. 364–372.
  46. Fouladi, S.; Emmons, J.; Orbay, E.; Wu, C.; Wahby, R.S.; Winstein, K. Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol. In Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation, Renton, WA, USA, 9–11 April 2018.
  47. Shaswary, E.; Tavakkoli, J.; Xu, Y. A New Algorithm for Time-Delay Estimation in Ultrasonic Echo Signals. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2015, 62, 236–241.
  48. Popescu, D.A.; Moore, A.W. PTPmesh: Data Center Network Latency Measurements Using PTP. In Proceedings of the 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Banff, AB, Canada, 20–22 September 2017.
Figure 1. Video transmission delay distribution. The squares 1–3 marked in the figure are the video frames transmitted.
Figure 2. Frame reference relationship of HEVC RA mode. In the figure, frame I is an internal coding frame, frame P is a forward reference frame, and frame B is a bidirectional reference frame.
Figure 3. (a) Timecode latency measurement scheme; (b) spline bar on the screen ("bar" indicates the timecode). The picture contains the Chinese words "School of Information Engineering, Jimei University".
Figure 4. Remote online measurement scheme.
Figure 5. Lossless remote video online measurement scheme.
Figure 6. Lossless remote video delay measurement algorithm.
Figure 7. Hash packet format.
Figure 8. Design framework and physical picture of the delay measurement equipment. The picture contains the Chinese words "School of Information Engineering, Jimei University".
Figure 9. (a) Delay measurement of H.264 codec; (b) Delay measurement of H.265 codec. The Chinese characters in the two pictures are “School of Information Engineering, Jimei University”.
Figure 10. (a) Camera delay measurement result; (b) Method 2 delay measurement result.
Figure 11. The delay results of Method 3 in the form of a web page. The Chinese words in the picture are "unsafe".
Table 1. Latency results in the first 10 s of video transmission.

Number i    Network Port Extraction T_i    Network Port Extraction T_r    Latency (ms)
1           78542                          78542                          0
2           25078542                       25078542                       0
3           50078965                       51178795                       43.9932
4           75079107                       9706464                        43.9011
5           100079433                      103374265                      43.8986
6           125079707                      129472713                      43.9270
7           150079966                      155570759                      43.9115
8           175080285                      181668404                      43.8842
9           200080505                      207766329                      43.9170
10          225080814                      233864295                      43.9063
Table 2. Pros and cons of the three proposed methods.

Type       Precision            Distortion                                  Operational Requirements    Online Measurement    Measuring Frequency
Method 1   0.1 µs               Destination content distortion              Pre-encode access           Yes                   Per frame
Method 2   Sub-frame accurate   Source and destination content distortion   Site access, GPS access     No                    Every 5 s
Method 3   0.1 µs               No content distortion                       Pre-encode access           Yes                   Per frame