Neutron-Irradiation Testing of FPGA-Embedded Hadron Fluence Sensors

Hadron fluence monitors based on static random access memories (SRAMs) are being used at CERN and have been proposed for proton therapy facilities. Some of the limitations of the state of the art are related to the usage of separate components for sensing upsets and for reading them out, as these increase the power consumption, the board complexity, and its size. Moreover, in some cases, due to radiation-tolerance requirements, the readout logic (ROL) is fixed, and it cannot be updated once the system has been implemented. In this work, we show how to overcome the mentioned limitations by using an SRAM-based field programmable gate array (FPGA) for implementing both the sensitive element [the configuration SRAM (CRAM)] and the ROL (the firmware in the fabric). In fact, we describe the implementation of a compact, reprogrammable, low-power, actively self-reading hadron fluence sensor realized by means of a Xilinx Artix-7 FPGA. Moreover, we present the customized radiation-hardening-by-design (RHBD) techniques adopted for the ROL. We irradiated two sensor prototypes at the Jožef Stefan Institute's TRIGA Mark II research reactor, under different neutron spectra and flux conditions. We discuss our results, which include the measurements of the radiation tolerance, the single event upset (SEU) cross section of the CRAM, the sensitivity to thermal neutrons, and the failure cross section of the ROL.

count ($N_{\mathrm{upsets}}$), through the equation

$$\phi = \frac{N_{\mathrm{upsets}}}{\sigma N_{\mathrm{bits}}} \qquad (1)$$

where $\sigma$ is the device cross section per bit and $N_{\mathrm{bits}}$ is the memory size in bits. The cross section depends on the device, technology, power supply voltage, type of hadrons, and their kinetic energy and must be measured by means of irradiation tests [1]. SRAM-based hadron fluence sensors are in use at CERN [2], [3], [4] and have been proposed for proton therapy applications [5].
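As a minimal sketch of how (1) is applied, the following Python function converts an upset count into a fluence. The function name and the numerical values in the example are illustrative assumptions, not measurements from this work.

```python
def fluence_from_upsets(n_upsets: int, sigma_per_bit_cm2: float, n_bits: int) -> float:
    """Return the fluence (cm^-2) from Eq. (1): phi = N_upsets / (sigma * N_bits)."""
    if sigma_per_bit_cm2 <= 0 or n_bits <= 0:
        raise ValueError("cross section and memory size must be positive")
    return n_upsets / (sigma_per_bit_cm2 * n_bits)


# Illustrative (assumed) numbers: 1000 upsets counted in a 50-Mb memory
# with a per-bit cross section of 1e-15 cm^2.
example_fluence = fluence_from_upsets(1000, 1e-15, 50_000_000)
print(f"{example_fluence:.2e} cm^-2")
```

The product of the per-bit cross section and the memory size acts as the calibration constant of the sensor: the larger it is, the fewer particles per unit area are needed to register one upset.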
In order to read the response from SRAMs, radiation-tolerant readout electronics are needed. Over the last few years, flash-based field programmable gate arrays (FPGAs) have been considered for SRAM readout at CERN [6] and actually used in proton therapy [7]. A shortcoming of state-of-the-art solutions is that they require separate components for sensing upsets and for reading them out, increasing the power consumption, the board complexity, and its size. Moreover, if high total ionizing dose (TID) tolerances are needed, flash-based FPGAs cannot be used [8].
SRAM-based FPGAs [9] are programmable logic devices used for real-time data processing. The functionality of the device is determined by the content of a configuration SRAM (CRAM). The CRAM can reach 100 Mb in size and can be accessed by the programmable logic through dedicated ports. Many devices have a TID tolerance over a few kGy [10], and they are hardened at transistor level against radiation-induced single-event latch-ups. However, upsets in the CRAM [11] may alter the programed elements, including routing, thus disrupting the operation of the logic implemented in the fabric [12].
The contribution of this work to the state of the art is threefold. First, we show an implementation of a novel thermal neutron and high-energy (>20 MeV) hadron fluence sensor based on an SRAM-based FPGA used both as the sensitive element (the CRAM) and as the readout logic (the programed fabric). Second, we describe the radiation-hardening-by-design (RHBD) techniques we used for protecting the readout logic (ROL) at firmware level, including architectural, place-and-route, and configuration aspects, and at printed circuit board (PCB) level. Finally, we present the radiation-tolerance results of the sensor board (SB) and its firmware after neutron irradiation at a research reactor.
The rest of this article is organized as follows. In Section II, we describe the fluence SB with details about the components used. In Section III, we present the ROL architecture and implementation. In Section IV, we show the sensor radiation-tolerance test results from irradiation at a nuclear reactor. In Section V, we discuss our findings and compare them to results from the peer-reviewed literature. In Section VI, we draw our conclusions.

II. FLUENCE SB
The SB we designed [Fig. 1 (top)] is based on a Xilinx Artix-7 200T FPGA and has the following features: 1) single integrated circuit used as sensor and readout; 2) reprogrammable ROL and digital serial interface to back-end systems; 3) compact (6.5 × 6.3 cm) PCB; 4) low power consumption (≈0.7 W); and 5) usage of commercial off-the-shelf (COTS) components only. The PCB size has been kept to a minimum so that the sensor can also be used in space-constrained locations. In order to enhance the overall radiation tolerance, we minimized the usage of active components, which include only the FPGA, a 100-MHz crystal oscillator (SiTime SIT8008), a differential receiver (Analog Devices ADN4691), 1:3 clock buffers (Texas Instruments LMK1C1103), and Analog Devices low-dropout (LDO) voltage regulators. One LT3070 LDO uses a 1.5-V input to generate the 0.95-V regulated power supply for the FPGA, while two LT1963 use a 3.8-V input to generate, respectively, the 1.8- and 3.3-V regulated voltages for the FPGA and the rest of the active components. It is also possible to bypass the on-board regulators and provide the voltages directly via a dedicated connector supporting a four-wire scheme for sensing the voltage at the SB. This feature makes it possible to perform voltage scans during cross-section characterization tests and to operate the board even in harsh radiation environments, where the LDOs might fail. However, direct powering requires a remote-sensing low-voltage power supply and more complex cabling than powering via regulators. All input-output (IO) signals to the FPGA are routed over tripled PCB traces, and, except for clocks, they are single-ended and unbuffered in order to minimize the active component count and maximize the overall board reliability [Fig. 1 (bottom)].
We have chosen the Artix-7 FPGA family for our sensor, since it offers a good compromise between price and configuration memory size, and from published TID gamma ray tests, it is known that devices of this family operate up to 5.5 kGy without functional issues or hard failures [13]. The configuration memory size of the 200T is 77.8 Mb arranged in 24 060 frames (18 300 for the fabric + 5760 for block RAMs), each containing 3232 bits.
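The quoted memory size follows directly from the frame counts. A short arithmetic check, using only the figures stated above (the variable names are ours), can be sketched as:

```python
FABRIC_FRAMES = 18_300   # configuration frames for the fabric
BRAM_FRAMES = 5_760      # configuration frames for block RAMs
BITS_PER_FRAME = 3_232   # bits in each frame

total_frames = FABRIC_FRAMES + BRAM_FRAMES
total_bits = total_frames * BITS_PER_FRAME
fabric_bits = FABRIC_FRAMES * BITS_PER_FRAME

print(total_frames)                  # number of frames in the device
print(round(total_bits / 1e6, 1))    # total configuration memory, in Mb
print(round(fabric_bits / 1e6, 1))   # fabric-only portion, in Mb
```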
As for the LT1963 and LT3070 regulators, other works [14] report their radiation tolerance to be, respectively, up to 3.4- and 2.0-kGy gamma ray TID and $7.4 \times 10^{12}$ and $8.0 \times 10^{12}$ n$_{\mathrm{eq}}$ cm$^{-2}$ 1-MeV-equivalent neutron fluence. The LT1963 is also reported to tolerate up to 1-kGy TID in [15]. We did not have radiation-tolerance information about the other components on the board, which we tested together under neutron irradiation.

III. READOUT LOGIC
The implementation of the ROL required us to tackle the important challenge of the impact of SEUs on the functionality. In fact, in this peculiar application of the FPGA, we faced two contradictory requirements. On the one hand, the CRAM must be as sensitive as possible to SEUs to enhance the operation as sensor. On the other hand, the firmware in the FPGA fabric must be robust against SEUs and transients. In fact, we devised a dedicated system for correcting upsets in the CRAM, i.e., a configuration scrubber, based on redundant configuration and usage of multiple configuration access ports. By redundant configuration, we refer to the fact that the content of the configuration memory locations, i.e., the frames, must be available in several copies [16]. We achieve it by properly copying each programed frame into two unused frames, realizing a tripled configuration, according to the method disclosed in [17].
Our ROL is based on the scrubber described in [18], whose architecture we briefly summarize here. The system runs at 100 MHz and is triple modular redundant. It is built around the Xilinx picoBlaze-6 soft microprocessor with custom peripherals we devised, including block RAMs and IO logic. The main functionality is to periodically scrub the configuration memory of the FPGA by majority voting redundant configuration frames and keeping unprogrammed frames in their expected state. In order not to interfere with access to the used block RAMs, only fabric frames are read back and scrubbed. It is possible to send commands and receive output from the scrubber via a universal asynchronous receiver/transmitter (UART) or one of the device's boundary scan primitives (BSCAN) accessible via the Joint Test Action Group (JTAG) port. The BSCAN support has been added to enable sensor control over a single signal cable, i.e., the JTAG cable. The architecture includes a number of features to enhance reliability, including, for instance, periodic resets of the main modules and majority-voting-based scrubbing of RAMs, whose description can be found in the abovementioned paper.
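The majority voting of redundant frames can be illustrated in software. The following Python sketch assumes that each frame is handled as a 404-byte buffer (3232 bits) and that three copies are available; the function names are hypothetical, and the sketch is not the picoBlaze firmware itself.

```python
FRAME_BYTES = 3232 // 8  # one configuration frame of 3232 bits


def majority_vote(copy_a: bytes, copy_b: bytes, copy_c: bytes) -> bytes:
    """Bitwise 2-out-of-3 vote across three copies of the same frame."""
    if not (len(copy_a) == len(copy_b) == len(copy_c)):
        raise ValueError("frame copies must have equal length")
    return bytes((a & b) | (a & c) | (b & c)
                 for a, b, c in zip(copy_a, copy_b, copy_c))


def count_bitflips(frame: bytes, reference: bytes) -> int:
    """Number of bits in which a frame differs from the voted reference."""
    return sum(bin(x ^ y).count("1") for x, y in zip(frame, reference))


# Example with an arbitrary (assumed) frame content and one injected upset.
golden = bytes([0xA5]) * FRAME_BYTES
upset = bytearray(golden)
upset[10] ^= 0x04                      # flip a single bit in one copy
voted = majority_vote(golden, bytes(upset), golden)
print(voted == golden, count_bitflips(bytes(upset), voted))
```

A single upset in any one copy is outvoted by the other two, and the difference between the corrupted copy and the voted result gives the number of bitflips to log for that frame.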
In the new design for this work (Fig. 2), we ported the scrubber to the Artix-7 family, and we added support for tripled PCB traces and for multiple configuration access ports. In fact, the ROL includes output majority voters, which are also tripled to avoid single points of failure. For each majority voter's output triplet (TR0, TR1, and TR2), three dedicated minority voters are used to disable output drivers (OBUFTs) to avoid clashes in case of SEUs. The firmware supports both the JTAG and the internal configuration access port (ICAP) to read back and correct the device configuration. As per device specifications [19], the configuration access via ICAP is faster (up to 3200 Mb/s) than via JTAG (up to 66 Mb/s), but the ICAP is tied to specific hardware primitives, which cannot be tripled. The access to the JTAG port of the device from the fabric is possible by means of tripled loopback traces on the PCB. The self-access to the JTAG port represents a reliable alternative to the ICAP for recovery situations. In fact, in normal operation, the ROL accesses the configuration via the ICAP, but in case of failure, it switches to the JTAG port on the fly. The abovementioned periodic resets of the ROL have been programed to happen every 300 processed frames, in case at least one frame has been corrected. As soon as the microprocessor resets, the program tries to switch back to ICAP to speed up the scrubbing. Before switching the configuration access port, the program checks its correct operation by attempting to read the device identification code.
Fig. 3. Implementation view of the readout logic. Left: layout from Vivado graphical user interface (top) and close-up on blocks (bottom). Right: redundancy generated in the configuration memory. Each green pixel represents a 32 bits × 32 frames cluster in which at least 1 bit is set, i.e., a used cluster; each white pixel represents a cluster with all bits cleared, i.e., an unused cluster.
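The interplay of majority and minority voters on an output triplet can be sketched behaviorally. The text does not give the exact voter equations, so the Python model below assumes the scheme commonly used in triple modular redundancy: a driver is disabled when its own domain disagrees with the majority of the three. Function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """2-out-of-3 majority of three single-bit values."""
    return (a & b) | (a & c) | (b & c)


def minority(own: int, other1: int, other2: int) -> int:
    """Return 1 when 'own' disagrees with the majority of the three domains."""
    return own ^ majority(own, other1, other2)


def drive_triplet(tr0: int, tr1: int, tr2: int):
    """For each domain, return (value, driver_enabled) as seen by its output buffer."""
    domains = [(tr0, tr1, tr2), (tr1, tr0, tr2), (tr2, tr0, tr1)]
    return [(own, minority(own, o1, o2) == 0) for own, o1, o2 in domains]


# Domain 2 is upset: its driver is disabled, so it cannot fight the other two
# on the shared (tripled) PCB trace.
print(drive_triplet(1, 1, 0))
```

Because the three traces are joined on the PCB, disabling the dissenting driver leaves the two agreeing drivers to set the line level without contention.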
The ROL can be clocked by the local oscillator on the SB, an external clock, or by a digitally-controlled oscillator (DCO) [20] in the FPGA fabric.
To make the layout of the ROL contribute to its reliability, we constrained the redundant modules to be placed and routed in distinct geometrical areas [Fig. 3 (left)], leveraging the "pblock" functionality of the Vivado design tool. We then used our custom scripts for generating redundant configuration frames [Fig. 3 (right)]. The resource utilization of the ROL (Table I) is minimal; in fact, it occupies just 2.5% of the available slices and block RAMs and 7.4% of the fabric configuration frames. The low resource occupation positively impacts the power consumption and the scrubbing period, as fewer programed frames to be protected translate to a lower [...] (Fig. 4). The CRC makes it possible to immediately spot subtle data transmission errors and conveniently discard erroneous output bursts from the ROL in case of failure.

IV. IRRADIATION TEST RESULTS
We devised a setup (Fig. 5) to perform bench testing and irradiation testing of the sensor. The sensor is connected to a custom interface board (IB) over standard CAT7 S/FTP cables and is powered by a Keysight N6705A power analyzer. The IB feeds a 100-MHz clock signal to the sensor and provides access to its UART and JTAG ports over Ethernet. We prepared the cabling for powering the sensor either directly or via the on-board LDO regulators, and we logged the power consumption of the board in either case. A dedicated data acquisition personal computer (DAQPC) communicates via Ethernet with the power analyzer and the IB to perform FPGA configuration/readback, sensor control, power management, and measurement. A portable oscilloscope (PicoScope) was used to log the output voltages of the LDO regulators during irradiation. Cable lengths make it possible to place all the instrumentation at up to 10 m from the sensor, in order to be out of the radiation field, while a remote terminal can be in a control room, with no practical limit on its distance from the instrumentation.
1 Specifically, we used the CRC16 based on the $x^{16}$ [...]
We irradiated two identical fluence sensor prototypes at the TRIGA Mark II reactor [21] of the Jožef Stefan Institute (JSI, Ljubljana, Slovenia). The reactor is optimized for training in reactor operation and technology, research with neutrons, and isotope production. It is water-cooled and can operate at powers up to 250 kW. There are several irradiation channels offering different combinations of thermal (<0.625 eV), epithermal (0.625 eV to $10^{5}$ eV), and fast (>$10^{5}$ eV) neutron fluxes, and there is also a dry chamber for irradiating larger samples. For the irradiation tests presented in this work, we used the dry chamber and the TOK2 triangular channel, and the pertaining fluxes are reported in Table II. The dry chamber also includes a fission plate, which can be installed to increase the fast neutron component. To estimate the neutron fluences during our tests, we used the data provided by the facility [22], [23].
A typical test run consisted of the following steps: 1) power on the SB and begin current logging of all power supply output channels; 2) configure the sensor FPGA with ROL firmware and enable readout; 3) wait for an ROL failure to occur or 1 h to elapse, and, meanwhile, log upset details; 4) read back the sensor configuration and verify it against the initial configuration; 5) power off the SB.
During step 3), the FPGA configuration is also periodically refreshed by means of partial reconfiguration from TCL scripts running on the DAQPC, so as to realize a hybrid, i.e., internal and external, scrubbing. This ensures that, even in case of failures related to upsets that the ROL cannot recover, the operation can be resumed by means of the external scrubbing. The external scrubbing concerns only the programed frames, less than 0.5% of the total, and takes nearly 3 s. The external refresh is performed every fifth ROL scrub cycle. Upset details are logged both from the UART and from the TCL scripts, and then combined off-line during data analysis.
Fig. 6. CRAM upsets detected by means of hybrid scrubbing versus thermal neutron fluence in a typical irradiation run. Blue experimental points represent raw data, while orange points represent data filtered by rejecting events with a number of bitflips per frame beyond five. The best fit line of the filtered data is dashed.
Since shutting down and restarting the reactor require a few minutes, we decided to keep the reactor always on, even between runs, to optimize the usage of the assigned irradiation time.

A. Dry Chamber Tests
A first sensor has been irradiated in the dry chamber with total (thermal, epithermal, and fast) neutron fluxes ranging between $1.3 \times 10^{8}$ and $2.1 \times 10^{8}$ cm$^{-2}$ s$^{-1}$ for a total fluence of $1.0 \times 10^{13}$ cm$^{-2}$. We partitioned the irradiation in 38 runs for a total irradiation time of 29 h and 43 min. With the exception of a few initial calibration runs, we operated the reactor at full power. For each run, we combined the upset logs and neutron flux measurements to plot the CRAM upsets versus the neutron fluence, as, for example, shown in Fig. 6. In some cases, we measured some false readings from the ROL, where the upset count quickly increases due to a large number of fake SEUs with multiple bitflips in the same configuration frame, typically 6 and beyond. This behavior might be explained by some form of single event functional interrupts (SEFIs), due to the ROL or the configuration access ports. The SEFIs might generate incorrect readings, which are aliased as bitflips. These events have been removed from the actual upset count off-line by imposing a cut on the number of bitflips per frame. The cut restores the expected linear behavior of the upset trend.
We determined the optimal value for the cut by examining the distribution of the number of bitflips per frame. In fact, from Fig. 7, it is apparent that beyond five bitflips per frame the counts oscillate, instead of monotonically decreasing as one would expect. It is worth noticing that cutting at five bitflips per frame retains 98.1% of the events, so there is a negligible loss in detection efficiency. From a computational standpoint, the cut procedure is very simple, and it will be implemented online in the ROL itself for future tests and applications. Table III summarizes the details of the run conditions and the overall results. We performed our runs with and without the fission plate to test the sensor with different fractions of thermal and fast neutron fluxes. In fact, these differential measurements made it possible to note that the CRAM upset count ($N_{\mathrm{upsets}}$) scales approximately with the thermal neutron fluence ($\phi$); i.e., $\sigma N_{\mathrm{bits}}$ does not change significantly between the two conditions. Moreover, the SEU rate and the thermal neutron flux decrease, respectively, by 4% and 6%, when the fission plate is installed. On the other hand, the SEU rate does not seem to correlate significantly with the fast and epithermal fluxes, which increase, respectively, by a factor of 5.6 and 1.2.
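The off-line cut and the subsequent linear fit can be sketched as follows. This is only an illustration under stated assumptions: each logged event is represented as a hypothetical (fluence, bitflips-in-frame) pair, every retained event counts as one upset, the fit is a least-squares line through the origin, and the data are synthetic rather than taken from our runs.

```python
def apply_bitflip_cut(events, max_bitflips_per_frame=5):
    """Keep only events with at most the given number of bitflips in a frame."""
    return [(f, n) for f, n in events if n <= max_bitflips_per_frame]


def cumulative_upsets(events):
    """Return fluences and the running count of upset events, sorted by fluence."""
    ordered = sorted(events)
    fluences = [f for f, _ in ordered]
    counts = list(range(1, len(ordered) + 1))
    return fluences, counts


def slope_through_origin(xs, ys):
    """Least-squares slope of y = k * x; here k estimates sigma * N_bits."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic log: two fake events with 9 and 12 bitflips in one frame.
log = [(1e9, 1), (2e9, 1), (3e9, 9), (3e9, 12), (3e9, 1), (4e9, 2)]
kept = apply_bitflip_cut(log)
xs, ys = cumulative_upsets(kept)
print(len(kept), slope_through_origin(xs, ys))
```

With the cut applied, the cumulative count grows linearly with fluence again, and the slope gives the calibration product used in (1).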
Considering the results averaged on all runs, under the simplifying hypothesis that configuration upsets are related to thermal neutrons only, $\sigma N_{\mathrm{bits}}$ was measured to be $6.9 \times 10^{-8}$ cm$^{2}$ and the sensitivity to be $1.4 \times 10^{7}$ n cm$^{-2}$.
The failure cross section of our ROL was measured to be $5.1 \times 10^{-12}$ cm$^{2}$, corresponding to a mean number of detectable upsets before failure of $1.4 \times 10^{4}$ (thermal neutron fluence of $2.0 \times 10^{11}$ cm$^{-2}$). We tested the sensor by clocking the ROL from the IB and by means of the internal DCO, but the number of failure events was not sufficient to assess differences in reliability between the clocking modes.
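The figures above are mutually consistent if one assumes that the sensitivity is the reciprocal of the measured calibration product and that the mean fluence to failure is the reciprocal of the failure cross section. The short Python check below makes this assumption explicit; the variable names are ours.

```python
SIGMA_N_BITS_CM2 = 6.9e-8    # measured sigma * N_bits for thermal neutrons
SIGMA_FAILURE_CM2 = 5.1e-12  # measured ROL failure cross section

# Fluence needed, on average, to register one upset (assumed meaning of "sensitivity").
sensitivity = 1.0 / SIGMA_N_BITS_CM2

# Mean thermal neutron fluence before an ROL failure.
fluence_to_failure = 1.0 / SIGMA_FAILURE_CM2

# Mean number of upsets detected before an ROL failure.
upsets_before_failure = SIGMA_N_BITS_CM2 / SIGMA_FAILURE_CM2

print(f"{sensitivity:.1e} n cm^-2")
print(f"{fluence_to_failure:.1e} cm^-2")
print(f"{upsets_before_failure:.1e} upsets")
```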
We performed a failure mode analysis on the 27 unrecoverable soft failure events of the ROL. In these failures, not even the external scrubbing from the DAQPC could restore the correct operation. We arranged the events in four modes according to the reason for failure, determined by analyzing logs of the ROL UART and of the TCL scripts running on the DAQPC (Fig. 8). Most of the failures (52%) were related to some form of SEFI in the configuration access ports logic, which caused data to be read back incorrectly. In some events (26%), the ROL failed while still driving the JTAG pins via the loopback, which made readback and scrubbing from the DAQPC impossible. Other failures included the ROL becoming unresponsive to commands via the UART (7%) and a variety of other failures (15%). However, it is important to report that, for all the soft failures, the sensor operation has always been recoverable by means of a power cycle.
Fig. 9. Bitmap of the FPGA configuration memory with superimposition of upsets measured during a typical irradiation run. Specifically, in the shown run, we measured $1.4 \times 10^{4}$ upsets for a $2.2 \times 10^{11}$ cm$^{-2}$ thermal neutron fluence. Each red pixel represents a cluster of 32 bits × 32 frames in which at least one upset has been detected. Some rectangular areas, marked in gray, are masked in readback, and they do not contribute to upset detection.
No failures of the active components have been observed, except for the SIT8008 oscillator chip, which permanently failed at a total neutron fluence of $1.0 \times 10^{11}$ cm$^{-2}$. This failure did not impact our tests, as the SB firmware supports multiple clock sources. After the oscillator chip failure, we switched to clocking from the IB.
By means of the details provided by the ROL, we built a bitmap of the upsets of the FPGA configuration (Fig. 9) for each run. We observed a uniform distribution of upsets over the CRAM, with the exception of five rectangular areas, which are masked for readback by the device. The CRAM cells pertaining to these areas do not contribute to upset detection, and they lower the effective $N_{\mathrm{bits}}$ by nearly 16% (9.5 Mb). By taking into account this effect, we can estimate the thermal neutron cross section per bit to be $1.2 \times 10^{-15}$ cm$^{2}$ b$^{-1}$, which is in good agreement with the results from [24]. During the irradiation, the voltage output from the LDO regulators remained constant.

B. TOK2 Tests
For the second sensor, we performed a high-flux test in the TOK2 channel. The TOK2 channel section is an isosceles triangle, with 78-mm sides and 83-mm base. Due to the small size of the channel, we had to run cables to the sensor only from the JTAG/UART side, so for this test, we could only power the sensor through the on-board regulators without logging the output voltages. We powered the reactor at 463 W corresponding to a flux of $6.6 \times 10^{9}$ n$_{\mathrm{eq}}$ cm$^{-2}$ s$^{-1}$ on the sensor, tuned to have an upset rate of nearly 300 SEU/s, i.e., nearly 50 times higher than what had been measured in the dry chamber. We performed the run in "accumulation" mode, where the readout logic was disabled, i.e., not clocked, and we continuously read back upsets in the FPGA, without correcting them, over JTAG via the Xilinx iMPACT tool. We could read back the FPGA up to a fluence of $3.4 \times 10^{12}$ n$_{\mathrm{eq}}$ cm$^{-2}$. Beyond this fluence, we observed erroneous readings of the device temperature and internal voltages (VCCINT and VCCAUX) via the iMPACT tool (Fig. 10). However, even before failure, we observed that the upset count was incorrect on some reads (at nearly $1.0 \times 10^{12}$ and $1.3 \times 10^{12}$ n$_{\mathrm{eq}}$ cm$^{-2}$), as, in fact, the trend was not monotonic. After the failure, we tried to power cycle the sensor to perform a new run, but while the temperature reading went back to a correct value, the internal voltages did not. Fourteen months after irradiation, we measured the regulator output voltages and noticed they were all shifted to higher voltages with respect to a reference, i.e., not irradiated, sensor (Table IV). By powering the sensor via the direct input connector, we verified that the FPGA was fully functional and able to run the ROL firmware correctly.

V. DISCUSSION
We designed our sensor to meet typical requirements for usage at high-energy physics experiments, and our results show the tolerated neutron fluence to be higher than $10^{12}$ n$_{\mathrm{eq}}$ cm$^{-2}$. As a frame of reference, the requirement for on-detector electronics at the Belle II experiment is $10^{11}$ n$_{\mathrm{eq}}$ cm$^{-2}$ per year [25], which for our sensor would translate to more than ten years of operation. Moreover, with respect to the SRAM-based monitors described in [2] and [7], our sensor is more compact, being based on a single integrated circuit for sensing and readout. In addition, our solution provides a higher TID tolerance (2 kGy versus 250 Gy [6]) because of the usage of an SRAM-based rather than a flash-based FPGA. For the same reason, our ROL may fail due to SEUs in the FPGA configuration, which instead is not an issue in [6] and [7]. However, the choice of components, the hardening of the PCB, and the hardening of the firmware lowered the failure rate of our logic to a tolerable level. The fake SEU burst issues we have encountered are a common drawback among SRAM-based hadron sensors, and in fact, similar issues are also reported by Danzeca et al. [2] and Ytre-Hauge et al. [7]. The $\sigma N_{\mathrm{bits}}$ for thermal neutrons for our sensors is, respectively, nearly 18.7 and 3.5 times higher than the same figure reported by Danzeca et al. [2] ($3.7 \times 10^{-9}$ cm$^{2}$) and by Ytre-Hauge et al. [7] ($2.0 \times 10^{-8}$ cm$^{2}$). This is partly due to the larger memory capacity of the device we used (65.4 versus 8 and 16 Mb) and partly due to its higher sensitivity to thermal neutrons.
It is worth mentioning that many other solutions for measuring hadron fluences exist. Among others, these include chemical vapor deposition diamonds [26] and $^{6}$LiF thermoluminescence detectors (TLDs) [27]. Unlike our sensor, these are also sensitive to photons, which is undesirable when the goal is to selectively measure hadrons. Moreover, TLDs are passive devices, must be read with dedicated instrumentation after irradiation, and are unsuitable for online readout. Gas-filled detectors, such as $^{3}$He [28] or BF$_{3}$ [29], can be employed for neutrons, but due to the gas volume and moderating materials, they are normally significantly larger than our solution.
Due to their operating principle, SRAM-based sensors, in general, and our sensor, in particular, are very well-suited for radiation monitoring aimed at protection of digital electronics, where SEUs are often a concern.
VI. CONCLUSION
The discussed RHBD techniques adopted for the SB, including triple modular redundancy, redundant clocks, inputs and outputs, PCB trace redundancy, reliability-driven placement and routing constraints, and redundant configuration, made it possible to implement a very robust system against SEUs in the CRAM. These techniques are indeed of general interest for the implementation of FPGA-based instrumentation in radiation areas.
The ROL failure cross section was measured to be $5.1 \times 10^{-12}$ cm$^{2}$, translating to $1.4 \times 10^{4}$ detected upsets before failure on average. All the ROL failures observed were recoverable by means of a power cycle. The tests in different irradiation channels showed that the SB can be operated at fluences of $10^{12}$ n$_{\mathrm{eq}}$ cm$^{-2}$ and beyond. We spotted some fake count issues, which hinder the response of the sensor, but we also provided a solution to mitigate them. The clock oscillator on the SB failed; however, this did not compromise the functionality of the sensor because of the redundant clocking scheme we had foreseen.
The promising results suggest that our sensor has potential applications for radiation monitoring at high energy physics experiments, at particle accelerators, and at proton therapy irradiation facilities.