A USRP-Based Testbed for Wideband Ranging and Positioning Signal Acquisition

For validation and demonstration of high accuracy ranging and positioning algorithms and systems, a wideband radio signal generation and acquisition testbed, tightly synchronized in time and frequency, is needed. The development of such a testbed requires solutions to several challenges. Tight time and frequency synchronization, derived from a centrally distributed time-frequency reference signal, needs to be maintained in the hardware of the transmitter and receiver nodes, and wideband signal acquisition requires sustainable data throughput between the receiver and host PC as well as data storage at GB level. This article presents a testbed for wideband radio signal acquisition, for validation and demonstration of high accuracy ranging and positioning. It consists of multiple Ettus X310 universal software radio peripherals (USRPs) and supports high accuracy (<100 ps) time-deterministic, sustainable signal transmission and acquisition, with a bandwidth up to 320 MHz (in dual channel mode) and frequencies up to 6 GHz. Generation and processing of wideband arbitrary signal waveforms is done offline. To realize these features, radio frequency on chip (RFNoC) compatible HDL units were developed for integration in the X310 SDR platform. Wideband transmission and signal acquisition at a lower duty cycle is applied to reduce the data offloading throughput to the host’s personal computer (PC). Benchmarking of the platform was performed to demonstrate sustainable long duration dual channel acquisition. Indoor range measurements with the synchronous operation of the testbed show a decimeter-level accuracy.

and indoor, where people live, work, and travel. In such areas, GNSS performance is degraded as satellite signals are blocked by buildings and other objects, and GNSS receivers may get "confused" by multipath reflections of the ranging signals. Besides this, GNSS is also vulnerable to unintentional or intentional interference (spoofing and jamming) [1].
Novel ranging and location-based services, as used for navigation, sensing, automated-driving, smartphone localization, and augmented reality, demand for increased accuracies at decimeter and subdecimeter level, and also need to operate in GNSS denied areas. Several terrestrial indoor and outdoor ranging systems are available on the market and novel algorithms, techniques, and platforms are developed, to achieve higher accuracies [2]- [5]. Typically, ranging is done based on estimating the time duration between transmission of the signal at the transmitter and arrival of this signal at the receiver. In a dense multipath radio channel, i.e., with a lot of reflections, the receiver may lock on a reflected signal instead of the desired direct path or line-of-sight (LoS) signal. In addition, it may not be able to distinguish the direct path signal from close-in reflections with a slightly longer delay, which then will cause an offset in the estimated timeof-arrival (ToA) of the signal. Since the precision of ToA estimation [6] and the separability of the multipath reflections is inversely proportional to the signal bandwidth, to achieve a high ranging and positioning accuracy wideband ranging signals are needed. Apart from a high range resolution, transmitters and receivers should also be accurately time-aligned, and in case phase-based ranging is applied, also accurate radio frequency synchronization is needed. This can be obtained by connecting the transmitters to an accurate time-frequency reference signal, like a 1 pulse per second (1PPS) timing signal and 10-MHz frequency reference. An accurate time alignment of the transmitters and receivers is important, since even a small error of, e.g., 1 ns already causes a range error of 0.3 m. For the development of high accuracy ranging and position algorithms, a flexible validation and demonstration testbed setup, to collect experimental data within real-life environments, and to demonstrate proof of concept, is highly desirable. The internal errors of such a system should be (substantially) smaller than those aimed at by the algorithm or system to be evaluated.
For the project at hand, the following system requirements are aimed for:-time synchronization between different devices better than 100 ps,-time and frequency references originating This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ from the same source,-wideband signal transmission with a bandwidth > 100 MHz, simultaneous transmission in multiple bands,-tuning band: 1-5 GHz,-sustainable offloading of acquired signals,-standalone operation of the anchor stations (transmitters).
To tackle the subject of wideband data transmission and acquisition, a system may be developed for a specific application. In [7], a data acquisition platform based on sub-Nyquist sampling is designed to test wideband multistandard receivers. An IR-UWB transmission and acquisition platform with the periodic transmission of pulses and equivalent time sampling signal acquisition is shown in [8].
Software defined radio (SDR) systems are flexible programmable hardware devices that can be used for prototyping of very diverse radio technologies. SDR devices are available from low complexity and low cost to very advanced and expensive equipment [9], [10]. These devices have a DSP unit for signal processing operations and separate ADC/DAC units and RF frontends for transmission and reception of radio signals. More advanced SDRs also contain an FPGA for implementing custom on-device signal processing algorithms. An SDR can be configured and controlled through a connected (PC). Data processing can be performed in the SDR, or the data can be offloaded to the PC (by a high speed connection) for further offline processing.
Examples of SDR systems for prototyping, validation, and demonstration related to, ranging and positioning can be found in [11]- [13]. For instance, an SDR-based TDoA indoor localization system using WiFi signals is presented in [11]. Multichannel-based ranging techniques for narrowband wide area networks are demonstrated in [12]. In [13], an SDR architecture for navigation using CDMA signals is presented. Many of these works, while using the flexibility of the SDRs, are very application specific. The whole platform is generally built to validate or demonstrate specific ranging or positioning techniques.
In this article, we discuss the development and testing of a more general SDR-based generic radio signal transmission and acquisition testbed, using high-end Ettus X310 Universal Software Radio Peripheral (USRP). This device consists of two radio frequency (RF) channels with a bandwidth of 160 MHz, which can be simultaneously used in either transmit or receive mode (2× 200-MSPS sampling rate), and a maximum combined bandwidth of 320 MHz. The most challenging requirements of the development process are: 1) the critical timing constraints, which have to be maintained from reference input through the hardware to the input-output signal time stamping and 2) sustainable signal acquisition at very high sampling rates and the related high data rate throughput and storage.
The main contributions of this work are the design, development, and validation of three radio frequency on chip (RFNoC) compatible logic units that allow for sustainable, time deterministic, synchronous transmission, and both synchronous and asynchronous acquisition of dual-band signals, sampled at a total rate of 400 MSPS. The developed blocks have been integrated through RFNoC within the X310 USRPs and are GNU Radio compatible. Indoor ranging experiments with 160-MHz wideband signals using this testbed show a decimeter level accuracy. Furthermore, it is demonstrated that the platform can be used for sustainable dual channel wideband signal acquisition.
The remainder of this article is organized as follows. Section II presents the signal model for ranging based on time delay estimation, and discusses the need for wideband signals. The challenges of wideband signal acquisition are presented in Section III, along with the required functionalities and specifications of the testbed. In Section IV, the X310 USRP and the development context are introduced. In Section V, the design and development of the logic units are described in detail. Benchmarking results of the testbed coarse synchronization and the long duration dual channel acquisition are presented in Section VI. In Section VII, experimental indoor ranging results, obtained with the test platform, are presented. Finally, in Section VIII conclusions are drawn and an outlook on future works is given.

A. Signal Model
For ranging based on time delay estimation, a baseband signal model is used. The transmitted bandpass signal at a frequency f c for the baseband signal s(t) is given by x(t) = Re{s(t)e j 2π f c t }, in which Re{·} indicates the real part of a complex value. We assume an ideal channel impulse response (CIR) for a static single path channel, given by with a is the attenuation of the signal due to propagation losses and τ is the propagation time between transmitter and receiver. The received baseband signal is then given by where * indicates convolution, θ 0 is the phase offset between transmitter and receiver at t = 0, f D is the frequency difference between the received signal and the receiver frontend, i.e., due to a frequency offset between transmitter and receiver or due to Doppler shift, and n(t) is noise with a zero-mean Gaussian distribution and variance σ 2 n . For a sampled system, the sampled received signal can be represented as where k represents the discrete time variable and T s is the sample time.

B. Time Delay-Based Ranging: Accuracy and Bandwidth
For ranging, the variable of interest is the time delay τ . From the estimated valueτ of τ , the estimated distanced between transmitter and receiver is found asd = c ×τ , where c = 3 × 10 8 m/s is the speed of light. The precision that can be obtained when performing ranging based on unbiased time delay estimation can be bounded by the Cramer-Rao Lower Bound (CRLB), which defines a lower bound on the variance of the estimation error σ 2 τ = (τ − τ ) 2 as (4) and is determined by the signal-to-noise ratio (SNR) and mean square bandwidth or Gabor bandwidth F 2 of the signal [6] defined as where S( f ) is the Fourier transform of s(t). Equation (5) shows that the Gabor bandwidth is related to the signal bandwidth. In a single path LoS channel, the variance of the range estimate becomes: σ 2 d = c 2 σ 2 τ . So, in order to reduce the range estimation error variance, a larger Gabor bandwidth is needed and/or the SNR has to be increased. As we saw in Section I, the time resolution, which determines how well multipath components that arrive with short time differences can be separated, also is inversely proportional to the signal bandwidth. So the use of wideband signals is essential for accurate ToA estimation and ranging in a multipath channel. The CRLB for such a scenario also depends on the channel characteristic and on the ranging algorithms used to perform the estimation, which is outside the scope of this article.

III. WIDEBAND RANGING SIGNAL ACQUISITION
Acquiring wideband radio signals for prototyping of ranging algorithms can be a hurdle due to the high amount of generated data. For sparse channels, sub-Nyquist sampling and compressive sensing techniques have been proposed that can operate at sample rates less than the Nyquist rate, and therefore partially overcome this issue [14], [15]. Such techniques can only be applied when sufficient a priori knowledge of the channel is available. Therefore, in the following, we only consider full rate wideband signal acquisition.
Bearing in mind the computational cost and data throughput of such full rate acquisition, one may alleviate the problem by implementing and testing all the algorithms directly on-device with real-time signals acquired by the RF hardware system in the testbed in concern. This online processing option could nevertheless turn out to be very time-consuming and not without implementation hardship, particularly due to FPGA and ASIC hardware development constraints. Moreover, such implementation is generally very inappropriate in the early stages of development, when algorithms and techniques are not reliably validated.
In this situation, the common practice is to first implement and test the techniques on a computer, in scientific software or libraries (MATLAB, Python/Numpy) with: 1) models of signals and then 2) signals acquired from real-life environments. This allows to have a faster initial assessment of the performance. Then, if needed, in a later stage hardware implementation for further validation can be done.
In this work, we are thus targeting full rate acquisition and transfer to PC of wideband signals for offline prototyping of ranging algorithms using an SDR-based testbed.  Table I presents characteristics of five SDR devices capable of dual-band wideband signal acquisition. The maximum bandwidth is varying from 320 to 1200 MHz, and sampling rates are varying from 400 to 1480 MSPS.
Let us take the X310, used in our testbed, as an example to study the computational requirements and acquisition issues.
This X310 allows for a maximum sample rate of 200-MSPS per channel, where one sample occupies 4 bytes (16 bits-I, 16 bits-Q). The equivalent data throughput to the host-PC is 800 MB/s for the one-channel configuration, and 1.6 GB/s if the X310 is operated with two channels (Table II). To handle this throughput, a PC with two high-end 10-Gb/s Ethernet cards is required. A total of 2.5 GB/s can be theoretically handled using such a configuration. A 1.6-GB/s data throughput also requires the use of NVMe-type SSD disks that can support a theoretical writing speed at GB/s level. It should be noted, however, that the common host-PC runs on a nonreal time operating system (OS) that may introduce latencies and delays, with process scheduling that reduces the effective throughput. Hence, even if the hard drive and the network link support 1.6 GB/s, a continuous acquisition without data losses can hardly be guaranteed. Besides throughput considerations, 10 min of data acquisition at 1.6 GB/s will generate a data file size of 960 GB (or 5.76 TB for 1 h). This renders long-duration experiments and further post processing very unpractical.

A. Bursts Transmission and Acquisition Scheme for Reduced Throughput Operations
The high amount of data transfer and the required transfer rate needed for continuous acquisition of wideband ranging signals can be largely relaxed when a wideband but low duty cycle ranging signal is used, i.e., periodic burst transmissions. Such a ranging scheme will allow data transfer from the receiving USRP to the host PC at a reasonable and sustainable throughput. Fig. 1 shows a low duty cycle-based setup for ranging, where a first X310 acts as a transmitter, and a second X310 acts as a receiver, which forward the ranging samples to a host-PC, through a 10 GE link. Let T b be the burst duration and T p the transmission period, then the resulting duty cycle is α = (T b /T p ). A too large duty cycle may cause an unsustainable throughput. We have specified a default transmission period of T p = 1 ms and a duration T b = 100 μs for a duty cycle of α = 0.1. In this way, 1000 range estimates can be obtained per second. A car driving at 30 m per second will move by 3 cm during a T p = 1 ms period. At a rate of 200 MSPS, 20 000 samples are transmitted in a single burst.
For two channels operating at full sampling speed, the throughput is reduced to 160 MB/s, a value well below the maximum ten GE Ethernet speed and the writing speed of a performant SSD. Moreover, one minute of data recording then results in a file size of 9.6 GB, see Table II. When an even lower duty cycle is acceptable, the throughput and file size will accordingly decrease.
The testbed is designed to support generic format signals ( Fig. 1) that meet the frontend bandwidth limits. As the X310 is the device in use, the transmitted signal can have a maximum contiguous bandwidth of 320 MHz when the two 160-MHz channels are combined. Ranging signals can vary from phase shift-keying (PSK) signals to more advanced signal designs such as orthogonal frequency-division multiplexing (OFDM). Signal samples are provided to the transmitter in I − Q-format in a raw file, and the received samples are available in I − Q-format for post processing.

B. Testbed Synchronization Requirements and Solutions
Synchronization is a critical issue in ranging. As wideband ranging signals are used to attain decimeter or even centimeterlevel accuracy, a small timing mismatch, occurring within the transmission or acquisition chain, will significantly degrade the ranging performance. We have opted for synchronization based on the use of an external (1PPS/PPS) time reference and a 10-MHz frequency reference. This allows to align the transmitting and the receiving windows in time for proper ranging signals recording at the receiver side, Fig. 1. The 1PPS and 10 MHz may be derived from various sources.
When the testbed receiver and transmitter(s) are relatively close together, the synchronization source can be a clock distributor which electrical 1PPS and 10-MHz outputs are conveyed via coaxial cables to the SDRs. When the testbed devices are more distant, GPS-based synchronization could seem to be a solution. However, GPS timing accuracy is in the order of 10 ns and therefore not suitable for decimeter level precision. High accuracy wireless-based synchronization may provide sub-nanosecond synchronization [17], [18] and could be used to synchronize the testbed devices.
White Rabbit (WR)-based synchronization over optical fiber networks is also a promising solution to distribute accurate synchronization reference signals over large areas. Time-frequency reference distribution, over tens of kilometers with sub-nanosecond synchronization accuracy, has been demonstrated [19]. High timing accuracy, at the 100-ps level, has already been demonstrated with WR [20], [21]. A 100 ps of timing error will generate 3 cm of ranging bias which may be acceptable for a decimeter level ranging or positioning system [22]. In the proposed testbed, time-frequency reference signals are distributed over a small optical fiber link using WR nodes.
In a multiple transmitter scenario, as shown in Fig. 2(a), accurate synchronization allows several X310 USRPs of the testbed to start transmitting ranging signal samples at predefined time instants. Moreover, the testbed will support timedivision-multiplexing (TDM), as shown in Fig. 2(b), where different X310 devices can transmit within the same burst period, or successively within different timed bursts [ Fig. 2(c)].
The receiver of the testbed can operate in synchronous mode, sharing the same time and frequency reference signals with the transmitter(s), or it can operate in a standalone asynchronous mode, which is more suitable for a mobile scenario. In asynchronous mode, a matched filter-based technique is used to detect the arrival time of the incoming ranging bursts and time-align the acquisition windows to the transmission windows.
The synchronous operating mode hardly causes any time delay impairments; however, synchronized frontends may still present phase offsets or a subsample bin level difference in transmitting/receiving time. For asynchronous mode operation, however, the impact of time delay impairments is much more significant. Moreover, a frequency offset by the free running receiver oscillator is also introduced. The matched filter-based synchronization is not totally comparable to a synchronization based on an external 1PPS/10-MHz signals, being less accurate.

C. Platform Specifications
The full specifications and functionalities of the testbed are summarized as follows: 1) multiple transmitters/receivers support, for ranging and positioning experiments; 2) time-frequency synchronization between the devices of the testbed. Synchronization based on a 10 MHz/1PPS external time-frequency reference signal. The previous features are generally common to SDR systems and may be already available or very easy to implement. The development we present is mainly focused on the following: 1) deterministic timing from transmitter(s) to the receiver, in particular in the custom units in hardware; 2) dual channel, full bandwidth (160 MHz per channel) and at full sample rate (200 MSPS per channel) sustainable periodic generic signal transmission, and acquisition; 3) hardware-based transmission and acquisition control and specialized time control units in hardware, in order to limit PC-SDR interaction. 4) support of synchronous and asynchronous operating modes for the receiver. The latter allows to move the receiver freely from the transmitter(s); 5) sustainable offload throughput and reduced experimental data file size (low duty cycle ranging signals); 6) time-division multiplexed transmissions for positioning applications; 7) configurability of transmission and recording packets sizes and periods.

A. X310 USRP System
The Ettus X310 USRP has two individually configurable RF channels, each of which can be used either as a transmitter or a receiver, operating at a maximum sample rate of 200 MSPS. Each RF channel has an effective bandwidth of 160 MHz. By combining the signals from two frontends, a total bandwidth of 320 MHz can be covered, resulting in a total sample rate of 400 MSPS. The central frequency of each RF frontend can be tuned from 10 MHz to 6 GHz.
Several X310 units may be associated within a synchronized network to form a flexible array. The Ettus X310 USRP is compatible with the GNU Radio and the universal hardware (UHD)/RFNoC software environments. This compatibility allows to develop and integrate C++/Python digital signal processing units. Moreover, the X310 features an embedded Xilinx Kintex7 XC7K410T FPGA. This allows for the implementation of on-device digital signal processing units. The main advantage of an FPGA implementation is that it allows for much faster and time deterministic on-device processing operations. Hardware acceleration tasks can thus be implemented in the Kintex7.

B. Dual-Channel Transmission and Acquisition With the X310: Current Options
Due to the USRP's flexible software/hardware environment, very diverse radio/signal processing features have been already implemented and are publicly available. In the following, we put forward the context of the developed solutions as well as alternative solutions for dual-channel transmission and acquisition with the X310 devices.
1) At the Transmitter Side: Using the USRP to transmit synchronized generic signals is not straightforward due to the high sample rate and throughput, and the required accuracy. Several solutions may, however, be considered. 2) At the Receiver Side: At the receiver side, and still considering dual channel acquisition at full sample rate (1.6-GB/s throughput), using DPDK is still a possible solution. However, assuming that the acquisition is without issues due to OS scheduling and latencies, this solution is rather inefficient as 10 min of data recording will fill a 1 TB hard drive. Moreover, if the transmitted signals are low duty cycle bursts, which is the likely approach, the solution is inefficient as the signal is being continuously acquired while only a small part of its samples is relevant.
Another option allowed by the X310 is to save the received samples in the external 1-GB DRAM. However, 1 GB only allows for 0.625 s of dual channel recording. Up to now, we did not come across a solution developed, tested, and benchmarked allowing for a longer duration, 400-MSPS dual channel synchronous or asynchronous sustainable data acquisition using the X310 USRP platform.

V. USRP SYSTEM BLOCK DEVELOPMENT
In this section, we present the specific blocks developed to implement the testbed. The development software environment is first introduced before focusing on the actual architecture of design and implementation.

A. GNU Radio, UHD, RFNoC
GNU Radio is a free and open-source software development and control toolkit that provides signal processing blocks for SDR systems [27]. These blocks are written in C++ or Python. GNU radio is compatible with a large number of SDR systems available on the market. GNU Radio blocks can also be used with the X310 USRP through the USRP Hardware Driver (UHD) layer. The UHD framework provides drivers, libraries, and basic terminal commands to directly operate USRPs from a host PC, or do so via GNU Radio. Moreover, it is also possible to implement hardware DSP units in the USRP Kintex7 FPGA. While software DSP blocks can be implemented through the GNU Radio framework, hardware DSP blocks, also called computation engines (CEs), are implemented on the USRP through the radio frequency network on chip (RFNoC) development framework [28], [29].
RFNoC allows data streaming between DSP blocks running on hardware, but also between hardware and host PC blocks (through the UHD). The RFNoC framework allows the implementation of up to ten logic CEs in the X310 Kintex7 [30].

B. Terminology
In the sequel, we specifically define the following.

C. Development of the Synchronous Receiver
The receiver flowgraph in GNU Radio is shown in Fig. 3. It contains the frontend, a file-sink block that points to the data file path and the rxBursts-CE, shown on top, which is handling the periodic transfer of received bursts to the host PC. The rxBursts-CE continuously receives samples from the RF frontend but only forward the time-aligned samples to the host-PC (Fig. 1). The default acquisition period is 1 ms. In the acquisition process, the CE will first move the time-aligned received samples to a 128-KB FPGA BRAM memory. When the BRAM memory is filled with the time-aligned samples, the CE acknowledges the PC of data availability. When ready, ranging samples contained in the BRAM are then forwarded to the host PC and saved to the data file. If, for instance a α = 0.1 (adjustable) duty cycle is considered, the recording duration is 100 μs (equal to 20 000 samples per channel), and the host PC has 900 μs to fully offload the 20 000 ranging samples in BRAM. The cycle is restarted each 1 ms. Here it is assumed that the transmission and the receiving windows are timealigned. Asynchronous operation, when the transmitter and the receiver are operating on independent oscillators, is considered later in this article.
1) Architecture: A detailed block diagram of the rxBursts-CE is shown in Fig. 4. The architecture includes the default RFNoC layers, such as the NoC Shell, the AXI wrapper, and the configuration registers, and shows the core logic architecture for burst acquisition using the 128-KB BRAM memory. The logic can be divided into six parts: two finite state machine (FSM) units, a header unit, a timing unit, a multiplexing unit, and the 128-KB BRAM memory where the samples to be forwarded to PC are temporarily stored. A 128 KB of memory can hold in total 32 768 samples, i.e., a maximum burst length of 163.8 μs.
FSM1 is the main state machine. It controls the storing to BRAM and also triggers the transfer of samples to the PC. The BRAM interface (I/F) allows to add metadata such as headers, timing information, and even addresses to the data forwarded to the host PC.
For time coordination, a key sub-block is the time unit which provides the following 1) the 1-ms reference; 2) the 5-ns counter reference; 3) time synchronization between the 1PPS and 1-ms signals and other control signals in the logic; 4) FSM2 controls the sample-by-sample transfer of the BRAM data to the host PC. Once the time-aligned samples are fully transferred to PC, the state machine returns to its idle state and waits for the next transfer trigger from FSM1 to restart the offload process.
2) Time Synchronization Inside the FPGA: To ensure time alignment, the 1-ms reference is synchronized to the frontend time counter (vitatime counter) and to the 1PPS external reference. Thus, several USRPs will have all their time references synchronized and their processes time-aligned.
3) Packets Offloading Period: The received packets are, by default, transferred to the PC every 1 ms. Two configuration registers (idx/max idx), are available to increase this period by a natural integer value L. In that case the offload process will be restarted every L× 1 ms instead of every 1 ms, thus further reducing the duty-cycle and the average throughput and data file size.

4) RFNoC Crossbar Backpressuring:
The RFNoC crossbar is a "smart" bus that links different CEs, relying on the VITA Radio Transport (VRT [31]), and Condensed Hierarchical Datagram for RFNoC (CHDR, [32]) transport protocols and the AXI communication interface. VRT defines a radio transport-layer protocol that facilitates signal and context data (timing) sharing between or within radio frequency equipment. CHDR is a protocol that defines the fundamental unit of data transfer in an RFNoC network.
The crossbar controls the sample by sample transfer between different RFNoC CEs. An important consideration is that the RFNoC crossbar can interrupt a stream of samples between a transmitting CE and a receiving CE for several clock periods when the crossbar is adjusting to cope with the high data throughput. This phenomenon is also known as backpressuring.
However, such an interruption will cause a shift in the samples' timestamps. Therefore, we added logic in the design to monitor when these streaming interrupts happen. For each acquisition period, the start of the data transfer from the RF frontends and the number of clock periods of interruption is inserted as metadata in the received packet. This information can then be used offline to correct for any mismatch, in particular when ranging is based on ToA techniques which require all hardware delays to be fixed.

5) Received Packet Format:
A received ranging packet contains 30 samples of metadata consisting of: ten samples of header, seven samples of timing and packet context, and 13 samples which are currently not filled in, see Fig. 5.

D. Development of the Asynchronous Receiver
The rxBursts-CE only enables burst acquisition in a synchronized ranging/communication scenario, i.e., in case the transmitter and receiver are synchronized to the same reference.
In a realistic situation, there is no time-frequency synchronization between the transmitter and the receiver, and the transmission and receiving windows are not deterministically aligned and are continuously sliding with respect to each other. Having the option to work in an asynchronous configuration allows for a more flexible testbed as the receiver can be mobile. Therefore, we developed the arxBursts-CE, which is based on the rxBursts-CE. The main difference is that the asynchronous CE has a Time-Unit, associated with a 512 sample length correlation-based coarse synchronizer, to align the receive window with the incoming ranging packets. Once the GNU-Radio flowgraph is launched, the CE is continuously performing a Schmidl&Cox-based correlation on the samples provided by the RF frontend [33], [34]. After ten consecutively detected synchronization frames with consistent timing, the CE will lock on the last detection time and launch the recordto-BRAM and offload-process. This detection time is also tracked and updated to keep the receive windows aligned with the transmitted bursts. The situation can be compared to the synchronous mode where the record to BRAM and the transfer-to-PC process is triggered by the 1-ms time reference pulse. An asynchronous receiver uses a GNU Radio flow graph similar to the one presented in Fig. 3, only the synchronous CE, the rxBursts, is replaced by the asynchronous one, the arxBursts.

E. Development of the Transmitter
The transmitter flowgraph in GNU Radio is illustrated in Fig. 6. It contains a file-source block which represents the samples to be streamed, the txBursts-CE which schedules and controls the periodic transmission of the ranging signal and the RFNoC Radio which represents the RF frontend.
The transmitter USRP operates as follows. The generic ranging samples are received from the host PC (raw file source) and moved into a block RAM (BRAM) memory encapsulated by the txBursts-CE, in the FPGA. These samples are then periodically streamed out through the RF frontend at a default interval of 1 ms. The ranging burst has an adjustable duration of, by default, 100 μs (α = 0.1, 20 000 samples), and zero-valued samples are streamed for the rest of the 1-ms transmission period. This is done by using a multiplexing mechanism which first connects the output of the txBursts-CE to the output of the BRAM memory. Then when all the samples in the BRAM have been transmitted, the output of  the CE is connected to a still zero-value sample through the multiplexing mechanism. Thus zero-value samples will be transmitted until the next transmission period. This ensures an uninterrupted data stream between the CE and the RF frontend in order to avoid timing issues due to ON and OFF switching between streaming periods. The txBursts-CE forward ranging packets to the RF frontend with hardware generated timestamps relative to the 1PPS and the frontend time counter (vitatime counter). This allows transmission time synchronization with respect to the external reference and to other devices in the setup. Fig. 7 shows the transmitted signal and the 1PPS reference signal, as observed by an oscilloscope. A delay of about 8 μs between the two signals can be seen, which is caused by a constant hardware latency in the transmission chain [35] which can be calibrated out in the off-line processing afterward.
1) Architecture: The txBursts-CE has a similar architecture as the rxBursts-CE (Fig. 4). The main difference between the two blocks lies in its functionalities. The rxBursts-CE periodically receives signal samples and transfers this data to the host PC, while the txBursts-CE receives the signal samples once and periodically transfers the samples to the transmitting RF frontend. Hence, the FSM of the two CE is slightly different.

2) Time Division Multiplexing:
To associate several synchronized transmitting frontends, channels or devices, code division multiplexing (CDM), or TDM within the same transmission period can be implemented by signal design, see Fig. 2(b). We have chosen to also implement a TDM transmission scheme where 1-ms transmission slots are successively allocated to different transmitters, as shown in Fig. 2 (c). By default, sample transmission is retriggered every 1 ms. When TDM is used, the transmission of the samples is retriggered every K ×1 ms, where K is defining the total number of transmitters. For each transmitter, the idx (transmitter index) parameter, indicating the actual transmitter index in the TDM chain, and the max idx parameter, which corresponds to K , are configured before running the flowgraph. Referring to Section V-C3, L has to be a multiple of K .

VI. CHARACTERIZATION
In the following, we characterize the implemented testbed by means of some benchmark tests.

A. Packet Detection Performance
The performance of the asynchronous receiver detector is assessed by means of simulation using the receiver operating characteristics (ROC) metric [36] which links the probabilities of detection (PD) and false alarm (PFA) for different SNR and threshold values. The full detection stage was implemented in MATLAB and the computations were performed in integer format, as implemented in the hardware.
The received signal is based on the model as presented in (3). The frequency and phase offsets that occur in the asynchronous mode, are embedded in the received baseband signal variables f D and θ 0 , respectively. In the simulations, the frequency offset is randomly chosen between 0 and 10 kHz, which corresponds to values observed in practice. For the assessment, 1000 simulations are run. For each simulation, a random noise signal, frequency offset, and phase offset are generated. The ROC characteristic is presented in Fig. 8(a). Synchronization performance can be optimized by carefully adjusting the threshold value. As indicated before, for initial synchronization, the detection stage of the CE waits for ten consecutive positive detection events with consistent timing before providing the actual trigger to start the arxBursts-CE signals' recording and transfer process. This avoids locking on false positives in the initial phase. For instance, with SNR = 15 dB and P D = 0.8, P FA = 0.5, the probability of having ten consecutive false positives spaced by the 1-ms bursts period is P 10 F A ≈ 0.001, the probability of correctly locking to ten consecutive bursts is P 10 D ≈ 0.1.

B. Dual Channel Acquisition Benchmarking
We now present transfer and storage benchmarks of the system with dual channel acquisition, where a single X310 samples incoming signals from its two channels and forward the data to a connected PC through a 10-Gb/s optical Ethernet link. Every T p = 1 ms, a burst of T b = 125 μs is transmitted per channel, resulting in a total data throughput between USRP and PC of 200 MB/s, as each packet per channel holds 25 000 samples (100 kB). The main interest is to monitor the number of packet drops over a substantial number of acquired packets (>1e6 packets on each channel) to assess the sustainability of long duration acquisition.
The host-PC is an 8-core, 3.6-GHz Intel Xeon W-2123 Dell Precision 5820, having 16 GB of RAM and running on Ubuntu 18.04. Packets forwarded by the X310 are saved on a Samsung 970 pro SSD drive that has been benchmarked to support a writing speed higher as 2.5 GB/s. The latter is well over the needed 200-MB/s throughput. Nevertheless, having an Ethernet connection and an SSD writing speed that can handle 1.25 and 2.5 GB/s, respectively, does not guarantee reliable acquisition. Indeed the OS is standard and not a real-time system tailored for high throughput data acquisition. The system was not particularly tuned, but the maximum transmission unit (MTU) was set to 9000 to allow for Jumbo Ethernet packets and the maximum socket receive buffer was set to its maximum of 2 GB. On the X310 side, an important parameter to configure is the rxBursts-CE RFNoC spp (samples per packet) value. This was set to 7800, close to the maximum value which is 8000. Higher spp values result in a higher throughput that can be delivered by the CE. Other Ethernet connections were disconnected while running the test to avoid interrupts from the network. Table III presents the benchmark results for five different acquisition sequences of different duration, varying from 10 to 30 min. Along with the acquisition duration, we show in the table the total amount of the received data, the number of received and lost packets, and the number of overrun errors. An overrun error is sent by the X310 to the terminal when the USRP is providing more data than the PC can handle. Overruns indicate that samples are likely to be dropped between the X310 and the PC. Looking at the Table III, we can see that packet drops were only observed during the second recording session where a total of 80 packets are lost or corrupted out of 2 040 000 received packets. Fig. 8(b) shows the flushing of the received data from RAM to SSD storage during acquisition. Over the two minutes of observation, we observe that the flushing is not deterministic, but there is a pattern of peak flushing every 11 s where an amount of data up to 1.5 GB is written to SSD. Since the whole process is controlled by the OS, packets drop and corruption are to be expected if other processes are running during acquisition.  Overall, from the benchmarking presented in Table III, it can be concluded that long duration acquisition can be successfully performed in a sustainable way.

VII. TOA RANGING EXPERIMENTS
The platform has been tested and validated through indoor ToA ranging experiments in a corridor located at the TU Delft campus. More experimental results using this testbed can be found in [37] and [38]. In [37], a ranging system based on the sparse selection of narrow signal bands is presented and validated using the testbed. This work aims at increasing the bandwidth occupancy efficiency using a CRLB constrained convex optimization to select the frequency bands. In [38], a positioning system using carrier phase measurements is validated using the testbed.
The setup of the current experiment is shown in Figs. 9 and 10(a) and consists of two X310 USRPs and a timeand-frequency reference source and distributor. The distributor The antennas are commercially available ceiling mountable Taoglas wideband antennas (bandwidth: 700 MHz-6 GHz, gain 3 dBi). The TX antenna points to the floor and is set up higher than the RX antenna, which points upward to the ceiling. The carrier frequency f c = 3.96 GHz and the EIRP transmit power is about 7 dBm.
The measurement geometry is presented in Fig. 10(b). The corridor has a width of 2 m and a height of 3 m. The TX location is given in black. The receive antenna RX is moved along 32 measurements points in the corridor (indicated in blue and orange), from short range close to the transmitter, up to a distance of 23 m. A land-surveying Total Station is used to establish the ground truth distances at mm-level accuracy.
The positions of the first cluster of measurement points, close to the transmitters, are separated by less than 5 cm. A second series of measurement points, separated by about 2 m, are taken up to a distance of 23 m with respect to the TX antenna. From each measurement point the following information is extracted: signal attenuation, CIR, and the estimated distance based on ToA.

A. Channel Impairments and Multipath Components
Initial results from an RF cable-based 160-MHz artificial channel have shown that the system can achieve cm-level accuracy, using ToA and TDoA range estimates [39]. In the latter case, the channel is a single path channel without reflections and the SNR was as high as 50 dB. As indicated in Table IV, centimeter and millimeter level accuracy were, respectively, obtained in ToA-and TDoA-based ranging.
In indoor ranging the situation is less ideal because signal attenuation, impairments due to the antennas, reflections against walls and obstacles which cause multipath propagation, significantly degrade the ranging performance, even in a LoS situation. Fig. 11(a) shows the propagation loss as a function of distance. The two sets of measurement locations close (about 2 m) and far (>2 m) from the transmitter can be observed. An attenuation of about 20 dB is observed from the reference location [ Fig. 10(b), in red, 1.8 m from the TX] to the furthest measurement location. Over a 23 m distance, less attenuation is observed compared to the free space path loss. This is caused by the waveguide effect due to the propagation of the signal in the corridor. Fig. 11(b) and (c) shows extracted CIR at the rx 1 and rx 2 positions [see Fig 10(b)]. In Fig. 11(b), we can distinguish the LoS component (first peak) from the delayed multipath components (MPCs). In Fig. 11(c), at 19 m from the transmitter, the response shows two strong peaks. The first peak is due to the LoS component followed by a cluster of MPCs. The second strongest peak occurs 40 ns later and is followed by a second cluster of MPCs. Here we can also see the clear advantage of having a large signal bandwidth, which allows for separation of MPCs with small delay differences. Compared to a standard bandwidth, such as 10 MHz, which allows for a 100-ns resolution only. the LoS component would not be distinguishable from the following close-in reflections. Fig. 11(b) and (c) shows also two CIRs obtained by a synchronous and asynchronous receiver (in black and in dashed blue, respectively). The measured CIRs for both cases are very similar but CIRs measured by the asynchronous receiver have been realigned in time to facilitate easier comparison. Fig. 11(d) shows the CIR between the TX and the receiver located at a RX2. For the same receiver location, the dashed blue curve shows a CIR in which the LoS component is not the strongest one, where the transmit antenna was placed at TXb.

B. Ranging Techniques
From the transmit time and the estimated ToA of the received signals, the propagation delay between TX and RX is estimated, and used to compute an estimate of the distance between the TX and RX antenna withd = c ·τ . The ranging performance can now be assessed by comparing the range estimates to the ground truth values measured by the Total Station. In the following, we present the signal processing to obtain the ToA.
1) ToA-Based Ranging Using Cross Correlation: From the transmitted and received sequences, s 0 [k] and r [k], respectively, we extract the discrete time CIR using the following equation: where The sample index of the maximum of the magnitude of the cross correlation is used to determine the range estimate. In addition, we apply an oversampling (zero-padding) factor of F ovs = 100 to achieve a 50-ps bin resolution. Since the LoS or cross correlation peak is usually located in between two samples, oversampling allows to obtain the ToA estimate closer to its actual value. At Here, l is the sample index of the maximum, T s = 5 ns is the sampling period and N is the number of samples in the CIR, and * , used as a superscript, indicates the conjugate operator. The range estimate d, in meter, is now given bŷ where c = 3 · 10 8 m/s is the speed of light and the " +1" compensates for the d ref = where with and the estimated range d is found with (8).
3) ToA-Based Ranging Using Peak Search: Finally, we use a peak search to compute the range estimate. Since the first peak is not necessarily the highest peak, the first peak of each l m − l ref (13) and the estimated range d is found with (8).

C. Ranging Results
To assess the ranging accuracy in the corridor, the ToA-based range estimates are compared to the ground truth values as measured by the Total Station. Quantification of the accuracy is performed using the average error which characterizes the mean offset between the estimates and the ground truth values, which asymptotically will correspond to the range bias. Moreover, the spread in the estimates is characterized by the standard deviation of the error as well as by the average of the error magnitude.
In Fig. 12(a), the range estimates using cross correlation (in blue) and the ground truth values (in black) are plotted. A good agreement between the two curves is observed. Fig. 12(b) shows the range error ε, between the estimated and the ground truth distances. In most cases, the ranging error is at the decimeter level. The outlier of 2 m, shown in the histogram [ Fig. 12(c)] corresponds to the position 22. The first set of 17 measured positions are at a distance less than 2 m from the transmitter TX, while the second set of 15 measurements is taken at distances from 2 to 23 m.
Taken over all 32 locations, the average of the error magnitude is 36.3 cm and the standard deviation is 53.9 cm (see Table V). For the first set of positions, the average of the error magnitude is 21.8 cm and the standard deviation is 17.8 cm.
The second set of positions shows the average error magnitude is 52.7 cm and the standard deviation is 66.3 cm.   The ranging performance is better for the positions at a short distance from the TX. Ranging accuracy is mainly related to the channel behavior at the measurement points. Near the transmit antenna the extracted CIRs are close to the reference CIR h ref [k ] and the LoS path is much stronger than reflected paths, while further away from the transmitter, ranging is less immune to close-in MPCs arriving at the receiver within a few sampling periods.

D. Comparison of Cross Correlation, LS-Fitting, and Peak-Search
The range error statistics for cross correlation, LS-fitting and peak-search are shown in Tables V-VII, respectively. Comparing these results shows that the differences between the different ToA estimation techniques is small and they all achieve decimeter level accuracy, but the peak-search algorithm shows a lower standard deviation than cross correlation and LS-fitting, which show about equal performance. The peak-search algorithm, however, is sensitive to the selected

E. Effect of the Ranging Bandwidth
Finally, the effect of bandwidth on the ranging accuracy is investigated. From the initial 160-MHz channel frequency responses, we extract limited bandwidth baseband channel responses and perform ranging using the cross correlation technique as previously presented. The following bandwidths are compared: 160, 80, 40, 20, and 10 MHz. The results are shown in Table VIII.
With decreasing bandwidth from the standard deviation increases by almost a factor of 8 (from 53.9 to 473.8 cm), as predicted by the CRLB (see Section II-B) while the average of the error magnitude increases by a factor of 7 (from 36.3 to 284.2 cm).

VIII. CONCLUSION
This article presents a testbed for wideband radio signal acquisition, for validation and demonstration of high accuracy ranging and positioning. For this platform, the Ettus USRP X310 was used, which can either transmit or receive with a bandwidth of 160 MHz, simultaneously on two channels. By associating these two channels, a contiguous bandwidth of 320 MHz, or a much larger virtual bandwidth can be achieved, at a total sampling rate of 400 MSPS. At the receiver side, we particularly target data acquisition of dual channel ranging signals for further offline processing. The receiving device will record and forward samples to a connected host-PC. However, the very high sample rate results in a very high throughput, hardly sustainable even for high-end computers, a high probability of packet loss due to nondeterministic OS scheduling, and a huge data file size, rendering long duration experiments very unpractical.
To overcome this, we propose a burst mode transmission and acquisition platform. The USRP hardware units and developments that meet the acquisition requirements are discussed. The proposed platform is fully implemented and allows for accurate time-deterministic experiments that can be run for longer duration (more than 1 h, depending on the available memory). The system can be used to perform positioning experiments as several USRPs can operate synchronously. The platform also supports asynchronous operation where all transmitters are synchronized, but the receiver runs on an independent clock, like a regular mobile user.
The synchronization performance in noncoherent mode has been benchmarked as well as reliable data acquisition and data transfer in dual channel operation. Indoor ranging experiments, based on ToA, were performed using a PRN sequence with a bandwidth of 160 MHz as a ranging signal. The ranging accuracy was assessed by comparing the range estimates for three detection schemes: cross correlation, LS-fitting, and peak-search, with the ground truth values obtained by a land-surveying Total Station. The results show a decimeter level accuracy, where the peak-search algorithm (with a range error standard deviation of σ ε = 39 cm) outperformed cross correlation and LS-fitting.
Thanks to the wide bandwidth and related high time resolution, the testbed can be used to perform research on ranging and positioning in dense multipath environments. Accurate time-frequency references are needed for synchronization. The platform, in this article, has been developed within the scope of the SuperGPS project and uses WR timing nodes connected to an optical network to provide and distribute a time-frequency reference with 100-ps level uncertainty.
With accurate time-frequency references available, signal designs and ranging algorithms toward centimeter or even millimeter-level accuracy can be assessed. Such techniques could rely on multiband signals. The platform is also suitable to perform CIR measurements at multiple signal bands simultaneously. With small changes in the developed hardware, this setup can also support two-way ranging.