Edge Computing-Based SAT-Video Coding for Remote Sensing

This paper proposes an edge computing-based video coding implementation for Earth observation satellites (SAT-video coding), which encodes video within the limited hardware resources and power budget of mini/microsatellites. SAT-video coding introduces a hardware-related quantization (Q) function, a hardware-reduced motion estimation (ME) method, and a simplified entropy coding (EC) scheme, which together reduce the computational complexity. The hardware-related Q reduces hardware resource usage and power consumption by 72% and 55%, respectively, compared with a traditional Q implementation. The hardware-reduced ME lowers resource use compared with a regular ME implementation (59% of lookup tables [LUTs] and 79% of registers). The total number of LUTs used for the simplified EC function is also much lower than in other EC hardware implementations. The SAT-video encoder IP uses few hardware resources, and its power consumption is estimated at 0.0894 W at a high working frequency (125 MHz). The SAT-video encoding speed is 18.95 frames per second for $2560\times 2560$ video. Therefore, the proposed SAT-video coding is an edge computing solution suitable for micro/minisatellites. The coding efficiency reaches a maximum compression ratio of 33.8 with a peak signal-to-noise ratio of 34.46 dB. For the task of designing edge computing-based satellite video encoding, these values are adequate for remote sensing video.


I. INTRODUCTION
With the rapid development of satellites, the demand for remotely acquired video is overtaking that for remotely acquired images. However, onboard remote sensing video faces a challenge: the contradiction between the limited power, limited satellite hardware resources, and small Earth-to-orbit transmission bandwidth on the one hand, and the required video quality on the other. Video compression standards such as MPEG-2, H.264/AVC, and H.265/HEVC are difficult to deploy on resource-constrained onboard satellite hardware because of their enormous computational complexity [1]-[3]. Therefore, interest in developing methods that can produce real-time remotely acquired video is growing [4], [5].
The downlink bandwidth in a satellite video coding architecture is limited, so reducing the transmission volume is essential. The proposed edge computing design for video coding on a satellite encodes the raw/uncompressed video recorded by a high-resolution satellite camera. The compressed video data can then be transmitted to the ground station via the downlink.
A standardized video compression algorithm has four main components: the discrete cosine transform (DCT), quantization (Q), motion compensation (MC), and entropy coding (EC). In hardware reduction methods, motion estimation (ME) techniques based on pipelined designs and rapid computation of the minimum sum of absolute differences (SAD) on FPGAs have been introduced to speed up computation [6]-[10]. Moreover, hardware-efficient Q implementations reduce the computational complexity [11], [12]. This paper proposes a highly efficient video encoder, named SAT-video coding, that is optimized for a satellite's limited hardware resources and power while still achieving the target compression ratio (CR) and an acceptable [13] peak signal-to-noise ratio (PSNR), based on hardware resource reduction with high-quality decompressed remote sensing video. In this study, the target PSNR is determined as 25 dB [13], with a CR in the range of 2.03 to 8.13. The SAT-video encoder is proposed for monochrome video, and the mission requires an encoding speed of at least 12 frames per second (fps).
To design edge-computing video coding on a satellite, increasing the compression speed is the most critical research area. Parallel processing is used to reduce execution time, and graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) are commonly used for this purpose. FPGAs are more efficient, with lower power consumption and better performance [14]. Furthermore, GPUs are not currently qualified against radiation. Thus, FPGAs are selected as the target devices to perform video compression onboard satellites.
Moreover, the Xilinx Kintex series is selected for edge computing on the satellite because of its lower energy consumption than the Virtex series. The Kintex-7Q FPGA (defense grade) withstands temperatures from −55 °C to +125 °C and is radiation tolerant. Therefore, the Xilinx Kintex-7 (K7) is chosen to evaluate the proposed SAT-video coding for remote sensing.
The proposed method not only presents a new ME design but also optimizes the Q and EC functions to improve the performance of the satellite hardware:
- Most research on improving Q/IQ hardware implementations addresses the floating-point division, which consumes numerous resources. The proposed design solves the floating-point problem in a more straightforward way that significantly reduces the hardware resource usage.
- The primary aim of this study is to provide a video coding method suitable for edge computing. Accordingly, this research proposes a simple, hardware-related ME method for satellites.
- This paper also presents an adaptive length coding that combines run-length and Huffman coding, which requires less computational complexity and fewer resources.

II. SAT-VIDEO CODING FOR REMOTE SENSING AND PROPOSED HARDWARE DESIGN
This section presents the SAT-video coding hardware implementation suitable for edge computing. Figure 1 displays the SAT-video encoding flowchart. Each block represents a sub-function of the video encoder process, including the direct memory access (DMA) controller [15], pixel value truncation, frame switch, frame-to-blocks/blocks-to-frame conversion, DCT/IDCT [16], quantization/inverse quantization, ME, MC, residual generator, and package generator.
The SAT-video coding follows a hardware pipeline design [17] to increase the encoding speed. The new_frame_in and ref_frame_in signals carry the frames $I_t$ and $I_{t-1}$, respectively. The ref_frame_out signal carries the $I_{t-1}$ frame data in the case of an I frame, or the image compensation of $I_t$ and $I_{t-1}$ in the case of a P frame. The bitstream_out and MVs_out signals carry the entropy-encoded data and motion vectors. Additionally, the package generator function and packet_out handle the satellite sensor information, including the satellite latitude/longitude and camera position/angle. The SAT-video encoder uses the Advanced eXtensible Interface 4 (AXI4) [18], a high-speed data streaming interface, to stream data between functions. An 8-bit data width is used for the frame/block conversion, ME, MC, and entropy coding functions, and a 16-bit data width is used for the residual generation, transformation, and quantization functions. DMA is used to cache the reference frames, motion vectors, and entropy-encoded data on DDR3.

A. HARDWARE DESIGN OF THE PROPOSED QUANTIZATION FUNCTION
In video coding, the quantization function is performed on the transformed coefficients $\omega_k(x, y)$. The quantization parameter (QP) determines the step size used to map the transformed coefficients onto a finite set of levels. This section presents a hardware-friendly quantization operation that combines the algorithm and its hardware implementation.
The quantization functions consist of separate equations for the DC and AC coefficients, and the AC coefficients of I and P frames use two different calculations. The quantization equation for the DC coefficient is a floor division of $\omega_k(x, y)$ by 8 [19], which requires a divider circuit (Equation (1)).
For the AC coefficients, the quantized values $\phi_k(x, y)$ are the quotients of a division whose dividends are $\omega_k(x, y)$ and $\omega_k(x, y) - \delta/2$ for I and P frames, respectively. The divisor is $2\delta$, where $\delta$ is a variable in the range 1 to 31 [19]. Equations (2) and (3) denote the quantization method [19] for the AC coefficients of I and P frames.
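The equation images for (1)-(3) are not reproduced in this text; the following is a hedged reconstruction based solely on the prose above, and the exact rounding and sign conventions in the original may differ:

$$\phi_k(x,y)=\left\lfloor \frac{\omega_k(x,y)}{8} \right\rfloor \qquad (1)$$
$$\phi_k(x,y)=\frac{\omega_k(x,y)}{2\delta}\ \ \text{(I frames)} \qquad (2)$$
$$\phi_k(x,y)=\frac{\omega_k(x,y)-\delta/2}{2\delta}\ \ \text{(P frames)} \qquad (3)$$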
Equation (4) proposes using a shift operator instead of the floor division in Equation (1) to simplify the divider circuit computation.
The division used in Equations (2) and (3) is a floating-point computation. Therefore, the quantization of the AC coefficients requires multiple clock cycles and a large number of lookup tables (LUTs). Equations (5) and (6) are rounded versions of the quantization functions for I and P frames, respectively, which reduce the computational complexity of the floating-point division. The rounding in the quantization functions is then shortened and replaced by shift operators.
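Again reconstructing from the prose (a hedged reading, not the paper's exact typography), Equation (4) replaces the DC floor division by a right shift, and Equations (5) and (6) replace the floating-point AC division by a rounded quotient:

$$\phi_k(x,y)=\omega_k(x,y) \gg 3 \qquad (4)$$
$$\phi_k(x,y)=\operatorname{round}\!\left(\frac{\omega_k(x,y)}{2\delta}\right)\ \ \text{(I frames)} \qquad (5)$$
$$\phi_k(x,y)=\operatorname{round}\!\left(\frac{\omega_k(x,y)-\delta/2}{2\delta}\right)\ \ \text{(P frames)} \qquad (6)$$

Under this reading, the rounded value differs from the exact quotient by at most 0.5, which matches the deviation discussed next.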
The proposed Equations (5) and (6) use rounding instead of floating-point division, which makes the proposed results differ from the traditional quantized values. The difference between the proposed and traditional quantization results is in the range [0:0.5]. However, the hardware implementation of the proposed functions significantly reduces the number of resources used for computation, as displayed in Table 1. Moreover, because the proposed hardware performs the quantization function on 16-bit values, the $\phi_k(x, y)$ value domain is [−1639:1639]. Therefore, a difference between the proposed and traditional quantized values in the range [0:0.5] is acceptable, and the proposed equations are preferable when implementing SAT-video coding as edge computing. Figure 2 shows the hardware implementation of the proposed quantization function, an FPGA pipelining design in which each column represents one stage and each row represents one pipeline in the hardware. The signal-in/out arrow, rectangle, and left/right trapezoid represent the input/output of the pipeline implementation, a calculation in one cycle, and the split/combine of signal data, respectively. The rectangular circle represents the division module.
The transformed coefficient $\omega_k(x, y)$ is the input to the quantization hardware architecture. The quantization design consists of three parallel executions, denoted x, y, and z. Process x deals with the domain of positive or negative values; its most critical operation is the comparison operator that checks whether the value is positive or negative, and this result decides the sign operator's calculation in process z. Process y contains two registers. The first is a finite-state machine (FSM) [20], which controls the I or P frame stages. The second determines the divisor, i.e., the divisor calculation in Equations (5) and (6). The main process in this design (process z) contains the registers that calculate the absolute value and the dividend of Equations (4), (5), and (6). The output of this process is the quantized value $\phi_k(x, y)$. However, the division module is the most complicated stage in the quantization function architecture. Therefore, Figure 3 presents the proposed division module implementing Equations (5) and (6), represented by the rectangular circle in Figure 2.
The proposed division module, illustrated in Figure 3, is hardware friendly because it uses integer-to-integer division instead of integer-to-single-precision division. The hardware implementation is as follows:
• Step 1: Store the dividend in the remainder buffer.
• Step 2: Subtract the divisor buffer from the remainder buffer, then import the remainder value into the controller.
• Step 3: Comparison: if the remainder ≥ 0, the controller passes the remainder value back into the remainder buffer, then shifts the quotient buffer to the left and sets the LSB to 1. If the remainder < 0, the controller restores the previous remainder value into the remainder buffer, then shifts the quotient buffer to the left and sets the LSB to 0.
• Step 4: The proposed hardware implementation (the red square in Figure 3) iterates the divider operation 17 times to obtain a 17-bit quotient, which has one fraction bit in the LSB.
To implement the quantization function of [19] in Step 4, the divider must iterate 24 times to obtain a 24-bit quotient and then a single-precision floating-point value, following the IEEE 754 single-precision floating-point multiplier design [21]; the output of that division module is stored in 32 bits. In contrast, the proposed Equations (5) and (6) use rounding instead of floating-point division. Therefore, the proposed hardware implementation of the quantization function only iterates the divider 17 times to obtain a 17-bit quotient, and the proposed division output is cached in 16 bits.
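The following is a minimal software model of the shift-subtract (restoring) division flow described in Steps 1-4, not the paper's RTL; it assumes a non-negative dividend (the sign is handled separately in processes x/z of Figure 2), a strictly positive divisor, and a dividend that fits in 16 bits, so 17 iterations cover every quotient bit.

```python
def restoring_divide(dividend: int, divisor: int, q_bits: int = 17) -> int:
    """Shift-subtract (restoring) division following the Step 1-4 flow of Figure 3.

    Returns a q_bits-wide quotient whose LSB is one fractional bit, i.e. an
    integer approximation of 2 * dividend / divisor.
    """
    extended = dividend << 1            # append the single fractional bit position
    remainder, quotient = 0, 0
    for i in range(q_bits - 1, -1, -1):
        remainder = (remainder << 1) | ((extended >> i) & 1)  # bring down next bit
        remainder -= divisor                                   # Step 2: trial subtract
        quotient <<= 1
        if remainder >= 0:
            quotient |= 1               # Step 3: keep the new remainder, set LSB to 1
        else:
            remainder += divisor        # Step 3: restore the previous remainder, LSB stays 0
    return quotient

# Rounding away the fractional bit then recovers the nearest-integer quotient,
# e.g. round(|w| / (2 * delta)) ~= (restoring_divide(abs_w, 2 * delta) + 1) >> 1
```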
The quantization method of [19] implemented with the IEEE 754 single-precision floating-point criterion and the proposed integer-to-integer quantization division module are compared in Table 1. Table 1 also compares the simulated hardware and power resources of the Q/IQ division module of [19] with [21] against the proposed Q/IQ division module, both implemented on a Xilinx Kintex-7 XC7K410T. Hardware resource usage and power consumption are reduced by 71.99% and 55.76%, respectively.
In summary, with a small deviation (≤0.5 per quantized value) but a considerable hardware improvement, the proposed quantization design is selected to implement video encoding on a satellite for edge computing applications.

B. MOTION ESTIMATION AND MOTION COMPENSATION
Video coding standards such as H.264 and High Efficiency Video Coding (HEVC) introduce a variable block-size concept, with sizes from 4 × 4 to 64 × 64, to improve ME accuracy and thereby increase video compression efficiency [22]. However, this concept complicates the encoding process and consumes more hardware resources [23]. Using an 8 × 8 block reduces the compressed video size by at least four times compared with 4 × 4, whereas using a smaller block size increases the number of resources, the compressed data size, and the decompressed video quality. Therefore, the ME function uses an 8 × 8 block size to define the motion vectors (MVs) in the proposed SAT-video coding architecture.
In addition, using 8 × 8 blocks for the DCT and Q is better optimized for hardware than other sizes [24]. To avoid excessive memory and resource usage for splitting blocks, and to optimize all functions during video encoding, all operations, including the DCT, Q, and ME, use an 8 × 8 block-based structure. Moreover, Section II-C proposes an EC method that combines RLE and Huffman coding to achieve suitable compression for 8 × 8 blocks. Therefore, the 8 × 8 macroblock is selected to implement the edge computing-based SAT-video coding for remote sensing. Figure 4 presents the flowchart of the proposed ME function as related to the FPGA implementation. First, block matching is performed between the current and reference frames using the SAD calculation presented in Equation (7), which considers a template block at position (x, y) in the current frame $I_t$ and a candidate block at position (x + u, y + v) in the previous frame $I_{t-1}$.
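The equation image for (7) is not reproduced here; for an 8 × 8 block, the SAD described above takes the standard form (a hedged reconstruction from the prose):

$$SAD(u,v)=\sum_{i=0}^{7}\sum_{j=0}^{7}\left| g_t(x+i,\,y+j) - g_{t-1}(x+u+i,\,y+v+j) \right| \qquad (7)$$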
where $g_t(\cdot)$ and $g_{t-1}(\cdot)$ are pixel values in frames $I_t$ and $I_{t-1}$, respectively. Assuming the computational complexity per block is $O(SAD(u, v)) = O(f(n))$, then, with a video size of 2560 × 2560 (i.e., $(2560/8)^2 = 102{,}400$ blocks) and a search window range (S) of 15 × 15 = 225 candidate positions per block, the complexity of block matching over the whole frame is $O(F(n)) = 23{,}040{,}000\, f(n)$. This paper proposes a SAD calculation and caching architecture to reduce both $O(f(n))$ and $O(F(n))$.
In a comparative study of the three-step search (3SS) and diamond search (DS) [25], 3SS has lower computational complexity and hardware utilization than DS [26]. Additionally, in a comparison of the accuracy of different search algorithms for the ME function on 18 different test videos, the average PSNR values of full search (FS), 3SS, and DS are 32.55, 32.25, and 32.10 dB, respectively [27]. This means that 3SS and DS reach approximately 99.08% and 98.62% of the FS accuracy, respectively. Therefore, 3SS is a better solution than DS for edge computing on satellites with acceptable video quality. The proposed hardware implementation of the ME function uses 3SS with a search window range (S) of 15 × 15. Therefore, the initial step determines the search steps λ = [4, 2, 1]. Figure 5 shows the detailed ME pipeline hardware design, including all the stages and the instructions in each stage.
Step 1: the central point is the position (x, y) of a pixel in $I_t$. Because the proposed hardware implementation uses the search step $\lambda_k$, the candidate block positions in the previous frame $I_{t-1}$ are $(x + \lambda_x, y + \lambda_y)$ with $\lambda_x, \lambda_y \in \{0, -\lambda_k, +\lambda_k\}$. The proposed hardware-related SAD is given by Equation (8).
Furthermore, because $\lambda_x, \lambda_y \in \{0, -\lambda_k, +\lambda_k\}$, the proposed ME function only needs $|\{0, -\lambda_k, +\lambda_k\}|^2 = 9$ search points per step instead of the 15 × 15 = 225 search points of a traditional ME implementation. Therefore, the complexity of the proposed block-matching computation over the whole frame is reduced by a factor of 25 compared with the traditional block matching $O(F(n))$.
Step 2: A hardware parallel accelerator is applied to reduce the per-block computational complexity $f(n)$. Equation (8) is implemented to find $SAD(\lambda_x, \lambda_y)$ at each search point.
Instead of calculating the SAD over full blocks, the proposed ME design processes one column at a time, taking full advantage of the parallel computation of the FPGA to calculate multiple values at the same time, thereby increasing the parallel execution speed without reducing accuracy. Thus, the input to each stage consists of two columns, one from the current frame and one from the reference frame. Figure 6 displays a traditional FPGA implementation of Equation (8), which uses one abs and two sum operations. Two pixel columns, one taken from the reference block and the other from the current block, are the inputs of the SAD hardware design; the output is the SAD value between the reference and current blocks. In this implementation, the sum1 operator sums the absolute pixel differences of the reference and current columns. The sum of the following two columns is then added to the cached sum1 using the sum2 operator. However, with this design, the sum1 calculation cannot be performed in a single cycle with the eight input values of each reference and current column because of a dissemination failure, represented by the red regions in Figure 6(b).
The first solution is to use a smaller SAD block size [28] to avoid the dissemination failure. However, this solution increases the memory required to cache the SAD values and the number of resources used by the whole video encoder. The second solution is to delay the entire pipeline, postponing the sum1 operator to the next cycle. However, extending the processing time to absorb the latency of a single operation is not desirable. Therefore, this paper proposes a SAD design related to the hardware flow, displayed in Figure 7(a).
The data received from the abs operation is split into two groups, each containing a 1 × 4 column of values. The sum1 operator is executed on each group in parallel. Figure 7(b) illustrates the time flow of the proposed SAD pipeline method; the SAD calculation can be done in one cycle.
The proposed SAD design effectively avoids the dissemination failure without delaying the process. Moreover, the proposed pipeline method maintains the calculation time with negligible performance loss.
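A minimal software model of the column-based SAD with split partial sums described above is sketched below; the array shapes and function names are illustrative, and the per-cycle timing of the FPGA pipeline is of course not captured in software.

```python
import numpy as np

def sad_column(cur_col: np.ndarray, ref_col: np.ndarray) -> int:
    """One pipeline stage: |cur - ref| over an 8x1 column, summed as two 1x4 groups.

    Splitting the eight absolute differences into two groups of four mirrors the
    proposed design, where the two sum1 adders run in parallel within one cycle.
    """
    diffs = np.abs(cur_col.astype(np.int16) - ref_col.astype(np.int16))
    upper = int(diffs[:4].sum())   # first sum1 group (rows 0-3)
    lower = int(diffs[4:].sum())   # second sum1 group (rows 4-7)
    return upper + lower

def sad_block(cur_blk: np.ndarray, ref_blk: np.ndarray) -> int:
    """Accumulate the column SADs over an 8x8 block (the sum2 accumulator)."""
    return sum(sad_column(cur_blk[:, c], ref_blk[:, c]) for c in range(8))
```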
Step 3: find $\min_{(\lambda_x, \lambda_y) \in S} SAD(\lambda_x, \lambda_y)$ among the 9 SAD values cached in Step 2. The central point is updated to $(x + \lambda_x, y + \lambda_y)$, and the process returns to Step 1 with k + 1.
Step 4: the motion vector is defined from the offsets selected at each step, where k is the λ index. The ME implementation operates on 16-bit values. However, because λ = [4, 2, 1], $\lambda_x, \lambda_y \in [-7 : 7]$. Therefore, $(\lambda_x, \lambda_y)$ is normalized from a (16-bit, 16-bit) structure to (4-bit, 4-bit), which decreases the MV size by 3.5 times at the same quality and reduces the compressed video size.
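A minimal Python sketch of Steps 1-4 is given below; it assumes interior blocks only (border clipping is omitted), and the function names and the 4-bit packing convention are illustrative rather than taken from the paper's RTL.

```python
import numpy as np

def three_step_search(cur_blk: np.ndarray, ref: np.ndarray, x: int, y: int,
                      steps=(4, 2, 1)):
    """Three-step search around block position (x, y): 9 candidates per step.

    cur_blk is the 8x8 block of the current frame I_t located at (x, y);
    ref is the previous frame I_{t-1}.
    """
    def sad(cx, cy):
        patch = ref[cy:cy + 8, cx:cx + 8].astype(np.int16)
        return int(np.abs(cur_blk.astype(np.int16) - patch).sum())

    best_x, best_y = x, y
    for lam in steps:                          # lambda = 4, then 2, then 1
        candidates = [(best_x + dx, best_y + dy)
                      for dx in (0, -lam, lam) for dy in (0, -lam, lam)]
        best_x, best_y = min(candidates, key=lambda p: sad(*p))
    mvx, mvy = best_x - x, best_y - y          # each component lies in [-7, 7]
    return mvx & 0xF, mvy & 0xF                # packed as two's-complement 4-bit fields
```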
The regular ME function caches the full set of SAD values [29] to implement 3SS. Each search step uses 9 stages (search locations) to find the minimum SAD value. Therefore, the hardware uses 225 cache points and 9 × 3 = 27 stages to implement 3SS on the FPGA. This paper proposes an optimized ME module that caches 27 search points (initial step and Step 1) and finds the minimum SAD using 9 pipeline stages (Steps 2 and 3). The speed of the ME function is significantly improved and the hardware usage is optimized. Table 2 compares the performance of the ME function using a three-step search without the optimized SAD pipeline generation and SAD cache against the proposed implementation. The number of BRAMs increases from 28 to 32 because the proposed solution uses BRAMs to cache the locations of the three-step search points. When designing edge computing for satellites, reductions in LUTs and registers that lower computation and power consumption are crucial. The proposed ME design improves the encoding performance of the regular ME using the three-step search on sequences of size 2560 × 2560. Table 3 presents the utilization comparison between the proposed ME implementation and other architectures. A hardware accelerator for integer-pixel motion estimation [29] is selected for comparison with the proposed ME; it introduces an FPGA-based SAD calculation over the entire search range. A pipelined ME design [30] with eight parallel SAD calculation units is also compared with the proposed method. The real-time DS ME of [22] accelerates the search algorithm by caching search-point values, similar to the proposed SAD caching.
The proposed ME function uses fewer LUTs and registers than the others. Although [29] targets a larger video size, its LUT, register, and BRAM values only cover a ±16 search range with a 16 × 16 SAD block, so the reported resource usage only represents a 256 × 256 video. Furthermore, using a 16 × 16 block size reduces the amount of BRAM but also reduces the decompressed video quality. Therefore, the performance of the proposed hardware is better than that of [29].
Moreover, the video sizes used for testing in [22] and [30] are smaller than that used in the simulation of the proposed function; the total number of pixel values that must be cached and processed is only 1/3 of that of the proposed architecture. Therefore, it is acceptable that the BRAM usage of the proposed architecture is three times that of [30]. Although their video sizes are smaller and optimized for hardware, the LUTs and registers used in [22] and [30] are still much higher than in the proposed design. This proves that the SAD optimization in the proposed implementation is superior and that the ME is minimalist.
Xilinx Kintex consumes less energy than the Virtex series. Therefore, Xilinx Kintex-7 (K7) is selected to implement the edge computing-based SAT-video coding for remote sensing.

C. THE OPTIMIZED ENTROPY CODING
Entropy coding plays a vital role in the last stage of encoding and the first stage of decoding. Run-length encoding (RLE) and Huffman coding are widely used in image compression. The octonary repetition tree (ORT) and physical-next-generation secure computing (PHY-NGSC) [31] are optimization algorithms based on RLE that increase compression performance. The mean PSNR of RLE is roughly the same as that of CAVLC (≤6% difference) and higher than that of CABAC (by 31%) [31], [32]. However, CAVLC and CABAC are not good in terms of compression speed. This paper proposes a compression algorithm based on RLE that reduces the computational complexity and improves the encoding speed.
Huffman coding is widely implemented in both software and VLSI designs [33], [34]. Considering the hardware limitations is most important when designing coding for satellite implementation. Therefore, this study proposes an adaptive length coding that combines run-length coding and Huffman coding, which suits FPGA implementation and lets the two methods complement each other. Figure 8 presents the hardware process of the EC, a FIFO pipeline whose input is a quantized block and whose output is streamed to the satellite memory. The proposed hardware implementation is a pipelined design, and all pipeline blocks run in parallel. Each stage follows the process outlined in Figure 8.
First, a zigzag scan [35] is applied to increase the runs of consecutive zero values in the quantized coefficients, and run-length coding is then used to reduce the length of the quantized coefficients. Second, based on ISO/IEC 10918-1 [36], a standard Huffman lookup table is created and loaded into BRAM as an initialization step. Huffman coding then encodes each value of the run-length coding results using this lookup table, which uses fewer resources than computing the original Huffman coding. Finally, the Huffman-encoded data of each coefficient are concatenated into the encoded frame data. Table 4 compares the hardware performance of the proposed EC with other FPGA EC designs. The methods in [37], [38], and [2] are recent hardware-aware ECs that are calibrated with parallel processes on FPGA chips. The purpose of entropy coding is to map coefficient values to binary codewords. Therefore, Table 4 compares the LUTs used in the different EC designs.
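A minimal software sketch of the zigzag + run-length + table-based Huffman flow is shown below. The RLE symbol format and the huffman_table argument (a dictionary mapping (run, value) symbols to bit strings) are illustrative placeholders; the actual design loads a standard ISO/IEC 10918-1 table into BRAM.

```python
import numpy as np

def zigzag_order(n: int = 8):
    """(row, col) visiting order of an n x n zigzag scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(values):
    """Collapse the scanned coefficients into (zero_run, value) symbols."""
    symbols, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            symbols.append((run, int(v)))
            run = 0
    symbols.append((0, 0))                     # end-of-block marker
    return symbols

def entropy_encode(block: np.ndarray, huffman_table: dict) -> str:
    """Zigzag-scan an 8x8 quantized block, run-length it, then look up codewords."""
    scanned = [int(block[r, c]) for r, c in zigzag_order()]
    return "".join(huffman_table[sym] for sym in run_length(scanned))
```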
Although their video sizes are smaller than in the proposed video simulation, the numbers of LUTs used in [38] and [2] are much larger than in the proposed EC design. Moreover, the larger video input size means that the total number of pixel values entering the EC in [37] is 1.26 times higher than in the proposed design, yet the number of LUTs needed in [37] is 2.16 times higher than in the proposed EC. Additionally, the larger the block size, the more LUTs are needed, and the higher the clock frequency, the more power is consumed [39]. Therefore, the proposed EC design uses fewer resources than [37], [38], and [2]. This indicates that the proposed EC not only has a fast encoding speed but also consumes less power. Therefore, the proposed EC is suitable for edge computing-based SAT-video coding for remote sensing.
In summary, the SAT-video coding comprises a hardware-related quantization function that reduces resource usage and power consumption by up to 71.99% and 55.76%, respectively. Moreover, the amount of resources needed for the proposed SAD pipeline design in the ME function is only 1/3 of that of the non-pipelined design, and the SAT-video encoding uses a simple EC method that only uses 1749 LUTs.

III. EXPERIMENT RESULTS
This section describes the test environment, the number of resources used, and the energy consumed by the proposed SAT-video algorithm. Values for CR and video quality (PSNR) are also presented in this section. The testing environment flowchart is presented in Figure 9.
Experimental testing uses two different systems (one acting as a satellite and the other as a ground station), interconnected through a network link (representing the signal transmission between the satellite and the ground station). First, the satellite system caches the raw video and the video configuration signals (GOP and QP) in memory and sends them to the electrical ground support equipment (EGSE) via PCIe. The first DMA controls the signals received from the PCIe and caches the video data and configuration into the DDR3 on the EGSE board. Then, SAT-video encoding is processed on the EGSE, which adopts a Xilinx Kintex-7 XC7K410T that passes the radiation-tolerance test for a space mission. Verilog, a hardware description language, is used to implement the SAT-video encoding. The second DMA transfers the encoded data to the PCIe port, and the encoded data are returned to the satellite system as a binary file. The binary data are transmitted to the ground station system via the network. After that, the SAT-video decoder, written in Python, decompresses the binary data into decompressed video.
The CR is calculated from the raw video size and the compressed data size output from the PCIe port. After the decoding process, the decompressed video is used to evaluate the PSNR. The evaluation test uses the QP and the group-of-pictures (GOP) setting to control the compressed video size and the decompressed video quality.
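For reference, the CR and PSNR metrics can be computed with the standard definitions below (a minimal sketch assuming 8-bit monochrome frames; the paper does not spell out the formulas, so the exact conventions used are assumed):

```python
import numpy as np

def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """CR as used in the evaluation: raw video size over compressed data size."""
    return raw_bytes / compressed_bytes

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB between the original and decompressed frames (8-bit samples)."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```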
Remotely sensed video coding on a satellite is a new research topic; therefore, uncompressed satellite videos of the right size for the test are not available, and all test videos undergo a preprocessing step to make them suitable for testing. The first frame of each test video is presented in Figure 10, and Table 5 provides additional information on the different test videos. Four video sequences are used to evaluate the hardware resources used, the power consumption, and the CR/PSNR values. The videos include three remote sensing videos captured by satellite and one aerial video captured by a UAV, each of which is 2560 × 2560 in size and has 12 frames. Figure 11 presents the hardware resources used by each sub-function of the video encoding process. The experimental results are recorded on the EGSE board and follow the testing process outlined in Figure 9. TFB, IR, and FC in Table 6 and Figure 11 denote the accessing coding block, image reconstruction, and input frame controller, respectively.
The ME uses 44% of the LUT resources. The DCT and IDCT functions use approximately 21%. The proposed Q function, optimized for the floating-point problem, only uses 2%. The EC with run-length coding and a Huffman lookup table uses few resources, at 12%. Table 7 compares the resources of the hardware implementation on the Xilinx Kintex-7 (xc7k410) and the Zynq-7000 (xc7z035, which uses Kintex-7 fabric). Based on the data presented in Table 7, the proposed hardware implementation of the video encoder is available on both targets (Xilinx and Zynq). Table 8 shows the percentage of resource usage relative to the total resources available on the EGSE board with the XC7K410T; the entire proposed video coding algorithm uses only a small fraction of these resources. Therefore, the proposed SAT-video coding is a low-computation algorithm, and this hardware design is thus a good model for remote sensing video coding on satellites.
The four test sequences in Table 5 are used to evaluate the coding efficiency. Intra coding uses resources but consumes minimal energy, whereas inter coding uses the maximum energy needed for the encoding process. Therefore, this paper considers intra-coding (GOP = 1) and inter-coding (GOP = 12, i.e., one I frame followed by 11 P frames) to evaluate the hardware performance and coding efficiency. Figure 12 presents the CR and PSNR evaluation of the proposed SAT-video coding on the four test sequences, with QP configurations in the range of 1 to 29. Figure 12 shows the CR, PSNR, and RD curves of sequences S1, S2, S3, and S4, with each row representing one test sequence. The results show that the CR and PSNR value ranges are 2.88 to 33.84 and 28.63 to 41.69 dB, respectively. The proposed SAT-video coding method uses the fewest resources to match the satellite hardware and power budget. Furthermore, the highest CR value can be used to reduce the amount of data transmitted from the satellite to the ground station. In addition, the PSNR value is in an acceptable range (28 dB to 42 dB) for CR values of 2 to 10. The CR and PSNR recorded on the four test sequences are suitable for remote sensing video coding.
Figure 13 compares the CR and PSNR of the proposed SAT-video coding with H.264 and HEVC on test sequence S1, which is a raw remote sensing video. In intra-coding (GOP 1), the proposed algorithm's CR value is better than that of H.264 and HEVC. In inter-coding (GOP 6 and GOP 12), the CR is only better in the test cases where δ < 16. Because the primary research target is to provide a satellite-related video coding algorithm, the ME is optimized in terms of hardware equations and processing. For inter-coding, the CR value depends on both the entropy-coded data and the MV values. Especially for GOP configurations with more P frames than I frames, increasing the δ value decreases the residual, so the CR value depends mainly on the MV data. Thus, the CR values for inter-coding are lower than those of HEVC and H.264 when δ is larger and when the proportion of P frames in the GOP is higher.
Figure 14 compares two decompressed videos with different levels of degradation; Figures 14(a) and (b) are decompressed from video compressed with QP = 2/GOP = 1 and QP = 16/GOP = 12, respectively.
To design edge computing on satellites, the latency of signal processing must be optimized. Therefore, the edge computing-based SAT-video coding targets the fastest possible encoding speed, providing real-time video at the Earth station. Figure 15 displays the running time of the encoding process for one GOP consisting of one I frame and one P frame; the running time is the sum of the active I-frame and P-frame periods.
where w is the frame width. According to the flow process shown in Figure 15, the frame activities of the I and P frames are defined by Equations (13) and (14), respectively.
Finally, a GOP cycle (g) is defined by Equation (15).
where n is the number of P frames in a GOP and ρ is the number of pixels in a frame (ρ = 2560 × 2560 = 6,553,600). In VLSI design in general, and in video coding implementations on FPGAs in particular, the clock frequency is inversely proportional to the gate count but proportional to the power consumption when the same method and algorithm are deployed [41]. Therefore, the proposed SAT-video coding uses a working clock frequency of 125 MHz. Table 9 shows the GOP cycle count and the running time for each GOP at a working clock frequency of 125 MHz. Two GOP cases are used to record the CR/PSNR evaluations. The running time is defined by Equation (16), and the maximum encoding frame rate in frames per second (fps), τ, is given by Equation (17),
where n is the total number of frames in a GOP. The hardware designs of the proposed architecture and other video coding architectures are compared in Table 10. The implementations in [1] and [2] have slightly lower coding efficiency than the HEVC standard (−0.16 dB PSNR and −3%, respectively), and [3] has the same coding efficiency as HEVC. The coding efficiency of [1], [2], and [3] is better than that of the proposed SAT-video coding; however, the hardware they require is much larger than that of the proposed SAT-video coding implementation. Studies [1] and [3] use test video sizes 1.35 times larger, but the numbers of LUTs used are 6 and 20 times higher, respectively. Moreover, the number of DSPs used in the proposed design is the lowest among the methods investigated. The architecture proposed in [2] is designed for a smaller video size than the proposed architecture, yet its hardware consumption is larger than that of the proposed video coding implementation.
The power consumption increases almost linearly with the clock frequency [39]. The proposed SAT-video coding is designed to work at the same frequency as [3] and at a lower frequency than the designs in [1] and [2]. Therefore, the proposed hardware implementation and [3] have lower power consumption than the other designs. The proposed SAT-video coding hardware only supports a maximum coding speed of 18.95 fps on the Xilinx Kintex-7 410T, which is lower than [1], [2], and [3]. However, this design serves to deploy edge computing on satellites, and thus the energy consumption and resource usage are more important. Moreover, a speed of 18.95 fps is sufficient for real-time satellite remote sensing tasks.
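As a quick sanity check consistent with Equations (16) and (17), the reported throughput can be turned into a per-pixel cycle budget; the arithmetic below uses only values stated in the text.

```python
# Cycle budget implied by the reported figures: 18.95 fps at a 125 MHz clock
# corresponds to roughly one clock cycle per pixel of a 2560 x 2560 frame.
clock_hz = 125_000_000
fps = 18.95
pixels_per_frame = 2560 * 2560                  # rho = 6,553,600

cycles_per_frame = clock_hz / fps               # ~6.60e6 cycles
cycles_per_pixel = cycles_per_frame / pixels_per_frame
print(f"{cycles_per_frame:.3e} cycles/frame, {cycles_per_pixel:.2f} cycles/pixel")
# -> about 6.596e+06 cycles/frame and ~1.01 cycles/pixel
```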
Moreover, this research uses the Xilinx Power Estimator to estimate the power consumption of the proposed video encoder. The total on-chip power consumption, equal to the sum of the device static power and the design power, amounts to 2.892 W, of which GTX, dynamic, and device static power account for 0.182 W (6%), 2.462 W (85%), and 0.248 W (9%), respectively. The proposed video encoder hardware design, presented in Figure 1, only uses 4.91% of the LUTs, 4.99% of the registers, 27.17% of the BRAMs, and 1.62% of the DSPs. Therefore, the power consumption of the video encoding IP core is estimated at 0.0894 W, which is low enough for satellite missions.
The proposed SAT-video coding is an optimized video coding and hardware design for real-time encoding on a satellite, with the priority targets of using fewer resources and consuming less power. Based on the comparison results displayed in Table 10, the proposed design has the lowest LUT, register, and DSP usage and the lowest clock frequency among the compared designs, and thus the proposed hardware consumes the least energy of all the designs. The fps recorded for the proposed design is not the highest; nonetheless, the satellite video retains a wide variety of applications, especially in identifying moving objects on the Earth's surface.

IV. CONCLUSION AND DISCUSSION
This study designs an edge computing-based SAT-video coding for remote sensing. The proposed hardware-related SAT-video coding algorithm has a fast encoding speed (up to 18.95 fps) and low energy consumption (0.0894 W) for 2560 × 2560 video at 125 MHz, which suits a satellite's hardware. The proposed SAT-video coding achieves a CR of up to 33.8, which reduces the data transmission bandwidth to the Earth. Moreover, the quality of the decompressed video is evaluated for remote sensing missions, with a PSNR in the range of [28.63:41.69] dB. The proposed SAT-video coding algorithm is implemented on modest hardware, suitable for the limits of micro/minisatellites. The CR value is within the allowable range of the downlink from the satellite to the ground station. In addition, the PSNR values and decompressed video perform well for remote sensing tasks.

CYNTHIA S. J. LIU has been working on systems engineering, satellite data processing, and analysis for more than ten years at NSPO. She is currently the Director of the Satellite Image Division, National Space Organization (NSPO), Taiwan. Her expertise is mainly in remote sensing instrument (RSI) systems engineering and satellite image processing, especially data injection, geometric/radiometric correction, and image restoration techniques. Her recent interests include FORMOSAT-5 image application and promotion, as well as preparation for the next-generation FORMOSAT-8 imaging system development.
HSIN-CHIA LIN received the Ph.D. degree in space science from the National Central University, Chungli, in 2012. Since 1992, he has been working in satellite development with the National Space Organization (NSPO), Taiwan. His research interests include the design of the command and data handling and the remote sensing instruments on FORMOSAT satellites. Currently, he is the CubeSat Project Manager with the National Space Organization.