Real-Time Classification of Radiation Pulses With Piled-Up Recovery Using an FPGA-Based Artificial Neural Network

Artificial neural networks (ANNs) have shown several benefits over the traditional classification methods for radiation detector data, such as greater accuracy and the ability to classify neutron-photon combinations from piled-up events. These capabilities are of particular interest in applications involving intense radiation environments where large instantaneous detector count rates can lead to many piled-up events and subsequent information loss. The recovery of individual radiation detector pulses from piled-up data can improve the efficiency of classification systems, making them more attractive for field applications. This work extends the use of ANN systems with piled-up recovery to the real-time domain, with a focus on the hardware implementation. The ANN system is implemented on a Virtex-5 XC5VSX95T FPGA which collects pulse data at 250MHz and then processes and classifies pulses in the pipelined ANN at lower frequencies. The system’s performance is demonstrated by classifying pulses in real-time in a variety of scenarios including passive background, passive plutonium-beryllium (PuBe), and active PuBe. The results show that the system can provide accurate classifications in real-time while displaying the results clearly to the user. The system is shown to be capable of classifying pulses at a maximum rate of $1.11\times 10^{6}$ pulses per second, with a maximum latency of $7.7~\mu\text{s}$, and an overall accuracy of 98.2%.


I. INTRODUCTION
The detection and characterization of special nuclear materials (SNM) is an important technical challenge in nuclear security and nonproliferation that requires high-accuracy measurements as well as high-speed data processing capabilities [1], [2], [3], [4]. These measurements are difficult to achieve due to a combination of high count rates, low signal-to-noise ratios (SNR), and materials being heavily shielded [1]. Applications where these difficulties are common include spent-fuel measurement, repository acceptance, nuclear waste measurement, and active interrogation [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Sharif.
Although each of these scenarios presents its own set of difficulties, a common goal for all of them is accurate real-time classification. While achieving this goal has been hindered by the use of traditional classification methods and CPU-based processors, recent work has helped to overcome these challenges and led to the innovation of new detection and classification methods.
A common challenge for conventional classifiers in these applications is that of dual particle sensitivity in environments where one radiation type dominates the other [6]. The traditional method of classifying pulses known as charge integration (CI) must discard piled-up pulses and rely on a particle discrimination line for pulse shape discrimination (PSD) of single neutrons and photons [7], [8]. CI takes the ratio of the tail integral and total integral of pulses to determine their classification, as shown in Fig. 1. Piled-up pulses cause the ratio of the tail integral to total integral to increase which can lead to misclassifications. To overcome the limitations of the CI method, researchers have developed and demonstrated systems implementing neural networks (NN) and machine learning (ML) techniques for pulse classification [9], [20], [21]. These techniques have been shown to accurately classify single neutrons and photons. Additionally, [10], [15], and [19] have demonstrated piled-up recovery using these methods. While ML techniques present their own challenges with topology selection and training, the benefits of increased performance outweigh these costs. In this work, an artificial neural network (ANN) system is implemented on a Virtex-5 FPGA. Our contribution demonstrates an ANN system with pulse classification and piled-up recovery operating in real-time. FPGAs can perform data collection and classifications in real-time through programming that is exclusive to this task, so there are no wasted resources. Previous NN systems capable of piled-up recovery collected data in real-time but performed classifications later using various programming languages [10], [15], [19]. Similarly, previous systems implemented on FPGAs were capable of classifying pulses in real-time but could not perform pile-up recovery using neural networks [20], [21], [22]. 
Reference [23] showed a method of reconstructing pulses from piled-ups on an FPGA by using a digital method of dynamic integration, but it did not provide further classification for the recovered pulses. The work described here provides an all-in-one solution for real-time pulse classification with piled-up recovery.
This paper begins with a summary of the methods being used by state-of-the-art classifiers. This is followed by an overview of the ANN system being used for this work. Next, the implementation of the algorithms into hardware is described in the digital architecture section. Some of the difficulties that were encountered with the hardware implementation are subsequently discussed along with the optimizations and solutions that were used to overcome them. The next section focuses on the resources used for the ANN hardware implementation. A comparison is then made to the performance of other FPGA-based neutron/gamma classifiers. The system is then experimentally tested to demonstrate real-time functionality. We conclude with a discussion on some of the limitations of this design and possible future work.

II. STATE-OF-THE-ART CLASSIFIERS DEVELOPMENT
Research into PSD has been ongoing for many years, and thus many algorithms have been developed and are still in use today [17]. While CI is the most common method, other methods include the risetime algorithm and frequency gradient analysis (FGA). The risetime algorithm relies on neutron pulses having a longer risetime than gamma pulses for identification, while the FGA compares signals in the frequency domain, where the frequency content of neutron pulses drops faster than that of gamma pulses [7], [17]. The FGA has the added benefit of being somewhat insensitive to noise [17]. Further development of new methods has correlated with the improvement of analog-to-digital converters (ADC) and digital electronics [12]. Methods that capitalize on the performance of modern ADCs include correlation and curve-fitting algorithms [12].
More recently, as ML methods have matured, various neural network systems have been used for classification. Some of the algorithms being used include k-means clustering, Gaussian mixture models (GMM), ANNs, convolutional neural networks (CNN), and recurrent neural networks (RNN) [24]. An in-depth discussion of these algorithms is outside the scope of this work, but some of the general differences between them will be discussed. One difference between these algorithms is whether their training is supervised or unsupervised. ANNs, for example, are supervised and therefore require a labeled dataset to accurately train the network [24]. Getting a properly labeled dataset can be difficult, as there are no pure neutron sources [20]. For this reason, some prefer to use unsupervised methods such as k-means clustering [24], [25], [26]. Another difference between algorithms is how input data is processed. For example, RNNs use information from their previous state, which makes them more suitable for sequential data but comes at the cost of added complexity [15], [26].
Many works have now demonstrated how these ML algorithms can be used to get more accurate classifications than conventional algorithms and often in a reduced amount of time [9], [20], [21]. These benefits make these methods attractive for use in the field in real-time applications [27]. This has led to FPGA implementations of these algorithms [20], [21], [22]. Other works have focused on increasing the classification efficiency further by recovering piled-ups [10], [15], [19]. Reference [23] presented a method of reconstructing pulses from piled-ups on an FPGA without ML, but it did not classify the pulses after reconstructing them. The contribution for this work is to implement an ANN capable of piled-up recovery on an FPGA, thus getting the benefits of increased classification efficiency and real-time analysis.

III. ANN APPROACH TO PULSE CLASSIFICATION
The ML algorithm that has been selected for this work's implementation is the ANN. Research has demonstrated that ANNs can outperform traditional methods for the classification of nuclear materials [19]. Furthermore, recent work showed how ANNs could be applied to classifying pulses in photon active interrogation (AI) scenarios by recovering information from piled-up events [19]. This section will describe the ANN system being used in this work. The ANN architecture is based on the work presented in [19] and the block diagram is shown in Fig. 2.
Fig. 2 shows the full data processing architecture, which is separated into three main categories: data preparation, neural network processing, and cleanser processing. Data preparation is the first step after collecting a pulse sample. Here each sample is transformed from the ADC values output by the digitizer into a format usable by the NNs. This transformation is performed through a series of processing steps where the pulses are checked for thresholds, aligned, and then normalized. Additional features to help with classification are then added to each pulse by calculating their segmented maximums. The NN processing blocks then send the pre-processed pulses and features to the NN layers and activation functions to be classified into the categories shown in the classification result blocks. The ANN system is capable of classifying both single gammas (G) and neutrons (N) as well as combinations of them in various orders (GG, GN, NG, NN) separated by different amounts of time (Close, Split, Cut). There are also classifications for pulses that cannot be recovered (Too Close, Poor SNR, Others). The last type of processing block is the piled-up cleanser, which provides an extra check on the pulses by making sure they can be accurately classified by the subsequent layers. For the piled-up cleanser, this check is a threshold on the second peak in a piled-up event.

IV. DIGITAL ARCHITECTURE
The main challenge of an FPGA implementation is determining how to implement the ANN system and its algorithms in hardware. ANNs are one of the most powerful ML algorithms but also one of the most computationally challenging [24].
Making the system suitable for implementation in an FPGA requires careful planning around timing, hardware resources, and dataflow throughout the system. In the next sections, we will describe the hardware implementation of each of the processing blocks.

A. DATA COLLECTION
The first part of the ANN system is the data collection loop (DCL) which is shown in Fig. 3. The DCL receives a continuous stream of data from the I/O at 250MHz and converts each signed 14-bit input to an unsigned 16-bit (U16) value. Each value is then checked to see whether it crosses a minimum trigger; values below the trigger threshold are considered background noise and are not to be saved. Once a sample crosses the trigger value, 50 samples (a 200ns pulse window) are passed to the pre-processing loop (PPL) through the DCL_PPL FIFO. The samples that are passed to the PPL start with the sample that occurred six cycles before the trigger, which is read from a series of shift registers. This approach ensures that the beginning of a pulse is included in the recorded pulse window. The trigger also signals the passing of the baseline sum (BL_Sum) to the PPL. The baseline sum is the sum of the first three values of the pulse and is used for baseline correction of the signal. As the DCL operates at the maximum frequency of the FPGA I/O, it required precise pipelining to meet the timing requirements. This was done by separating it into three smaller sections: collecting data, checking triggers, and writing data to FIFOs. Each of these smaller steps can be completed within a single clock cycle.
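The triggering scheme described above can be illustrated in software. The following is a minimal Python sketch, not the LabVIEW FPGA implementation: the function name `collect_pulses`, the constant names, and the trigger polarity (a sample at or above the threshold fires the trigger) are assumptions made for illustration. A `deque` stands in for the shift registers that hold the pre-trigger samples.

```python
from collections import deque

PRE_TRIGGER = 6        # samples kept from before the trigger (from the text)
WINDOW = 50            # samples per pulse window (200 ns at 250 MHz)
BASELINE_SAMPLES = 3   # the baseline sum uses the first three window samples


def collect_pulses(stream, trigger):
    """Yield (window, baseline_sum) for each triggered pulse in `stream`.

    Assumption: a sample >= `trigger` starts a pulse window. The real
    design runs as a three-stage pipeline at 250 MHz; this loop only
    mirrors the data flow.
    """
    history = deque(maxlen=PRE_TRIGGER)  # stands in for the shift registers
    window = []
    for sample in stream:
        if window:  # currently recording a pulse window
            window.append(sample)
            if len(window) == WINDOW:
                yield window, sum(window[:BASELINE_SAMPLES])
                window = []
                history.clear()
            continue
        if len(history) == PRE_TRIGGER and sample >= trigger:
            # Start the window with the sample six cycles before the trigger.
            window = list(history) + [sample]
        history.append(sample)


stream = [100] * 10 + [500, 900, 700, 400] + [100] * 60
pulses = list(collect_pulses(stream, trigger=300))
```

In this sketch the trigger sample always lands at index 6 of the window, so the first three samples used for the baseline sum precede the pulse.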

B. PRE-PROCESSING
Once a pulse sample has been collected, it is passed to the PPL shown in Fig. 4. When the PPL receives the first sample of a pulse, its first step is to average the baseline sum calculated previously. The 50 samples are then converted from digitized 14-bit values to volts one sample at a time. This is done by first subtracting the baseline from each sample and then inverting the result. The resulting value is then multiplied by $2/2^{14}$ (maximum 2 volts with 14 bits of resolution) to convert to volts. The pre-processed values are then passed to the max threshold loop (MTL) through the PPL_MTL FIFO. The full pre-processing calculation is summarized below in Eq. 1.
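Written out from the steps described above, the conversion takes the following form. The symbols $x_i$ (raw digitized sample), $B$ (averaged baseline), and $V_i$ (sample in volts) are labels chosen here for illustration and are not necessarily the authors' notation; the divisor of 3 follows from the baseline sum covering the first three samples.

$$
B = \frac{\text{BL\_Sum}}{3}, \qquad V_i = -\left(x_i - B\right)\cdot\frac{2}{2^{14}}, \qquad i = 1,\dots,50
$$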
78076 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

D. ALIGNMENT
The ANN system performance is very sensitive to the alignment of the pulses being classified. This is because all the pulses in the ANN training dataset used the same alignment. Fig. 6 shows how misaligned pulses are aligned before performing pulse classification. For this ANN system, a pulse is considered aligned when its 6th sample is the first sample greater than or equal to 20% of the peak value of the first pulse in a pulse window. To achieve the desired alignment, we duplicate and remove samples on either end of a pulse window to effectively shift the pulse. So, if a pulse is currently aligned at the 5th sample instead of the 6th, we duplicate the first element in the pulse window and remove the last element. This shifts the alignment point one sample to the right while minimizing information lost from the original pulse. Pulses can be shifted left in the same manner. The amount the pulse can be shifted is primarily limited by the pipelined nature of the system, as left shifts require an extra clock cycle to be completed. Even without the hardware limitations, there is still a limit to the amount a pulse can be shifted without causing significant information loss of the original signal.
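The duplicate-and-remove shift can be sketched in a few lines of Python. This is an illustration only: the function names and the `max_shift` limit are hypothetical (the text gives no numeric shift limit), and the peak is taken as the maximum of the whole window, which matches "the first pulse" only when that pulse is the largest one present.

```python
ALIGN_INDEX = 5   # 0-based index of the 6th sample
FRACTION = 0.20   # alignment level as a fraction of the peak


def alignment_point(window):
    """Return the index of the first sample >= 20% of the peak.

    Assumes a positive-going, pre-processed pulse (peak > 0).
    """
    level = FRACTION * max(window)
    for i, value in enumerate(window):
        if value >= level:
            return i
    return None


def align(window, max_shift=3):
    """Shift the pulse so the alignment point lands on the 6th sample.

    `max_shift` is a hypothetical limit; pulses needing a larger shift
    are moved as far as allowed and remain misaligned.
    """
    shift = ALIGN_INDEX - alignment_point(window)
    shift = max(-max_shift, min(max_shift, shift))
    if shift > 0:
        # Right shift: duplicate the first element, drop from the end.
        return [window[0]] * shift + window[:-shift]
    if shift < 0:
        # Left shift: drop from the start, duplicate the last element.
        k = -shift
        return window[k:] + [window[-1]] * k
    return list(window)


early = [0.0] * 4 + [0.5, 1.0, 0.8, 0.6] + [0.0] * 42   # aligned at 5th sample
late = [0.0] * 7 + [0.5, 1.0] + [0.0] * 41              # aligned at 8th sample
aligned_early = align(early)
aligned_late = align(late)
```

The window length is preserved in both directions, so the downstream blocks always receive 50 samples.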

E. NORMALIZATION & FEATURES
After alignment is complete, the last step before the NNs can begin classification is to normalize the data and calculate the pulse features. The two normalizations used in the ANN system are Euclidean and area normalizations. Similarly, the two features used in the ANN system are Euclidean and area segmented maximums. Table 1 summarizes the normalizations and features needed by each NN. The method for computing the Euclidean and area norms is shown in Fig. 7. The first step is to get the sum and sum of squares of the pre-processed values. The next step is to divide each sample by these sums to get the normalized values, but division is slow and takes a lot of resources when implemented on hardware. Instead, we do a single division operation to convert our divisors to factors. These factors are then multiplied with each element to get the area normalized (ANL) and Euclidean normalized (ENL) values.
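The divide-once, multiply-many idea can be shown with a short Python sketch. It uses floating point rather than the fixed-point arithmetic of the FPGA, the function name `normalize` is hypothetical, and it assumes the Euclidean factor is the reciprocal of the square root of the sum of squares (the usual definition of a Euclidean norm; the text does not spell this out).

```python
import math


def normalize(samples):
    """Return (area-normalized, Euclidean-normalized) copies of `samples`."""
    total = sum(samples)                       # running sum
    sum_sq = sum(v * v for v in samples)       # running sum of squares

    # One division per norm converts each divisor into a factor ...
    area_factor = 1.0 / total
    eucl_factor = 1.0 / math.sqrt(sum_sq)      # assumed Euclidean form

    # ... so every sample afterwards needs only a multiplication.
    anl = [v * area_factor for v in samples]
    enl = [v * eucl_factor for v in samples]
    return anl, enl


anl, enl = normalize([3.0, 4.0])
```

For a 50-sample pulse this replaces 100 divisions with 2 divisions and 100 multiplications, which is the trade the text describes.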
Once the samples have been normalized, they are passed to the segmented maximum loop (SML) to find the features as shown in Fig. 8. This is done by determining the maximum values of each 10-sample segment of the normalized datasets. As the segmented maximums are being calculated, each normalized sample is also passed to its next processing block.
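As a rough software analogue of the segmented maximum calculation, the sketch below keeps a running maximum and emits it after every tenth sample, so that each sample is seen exactly once as it streams through. The function name is hypothetical, and treating the segments as non-overlapping blocks of 10 is an assumption based on the wording "each 10-sample segment".

```python
SEGMENT = 10  # samples per segment (from the text)


def segmented_maximums(stream):
    """Return the maximum of each consecutive 10-sample segment."""
    features = []
    running = None
    for count, value in enumerate(stream, start=1):
        running = value if running is None else max(running, value)
        if count % SEGMENT == 0:     # segment complete: emit and reset
            features.append(running)
            running = None
    return features


features = segmented_maximums(range(50))
```

A 50-sample window therefore yields five features per normalization.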

F. NEURAL NETWORKS & DECISION LOOPS
The data points are now ready to be input into the NNs. Each NN is composed of one to two hidden layers and activation functions followed by an output layer and activation function. The structure of each NN is summarized in Table 2. Because the general structure of each NN is similar, we will focus on the Classify Top NN depicted in Fig. 9 for the rest of the discussion on the hardware implementation of these NNs. The first step is to input one element at a time into the hidden layer where each element undergoes a multiplication and accumulation (MAC) operation before having a bias added at the end. For the Classify Top NN, the MAC operation for the features is done in parallel to the rest of the normalized samples so that it can complete in 50 cycles and stay pipelined with the rest of the system. The outputs of the two MAC blocks are then summed together before going through the ReLU activation function one element at a time. Each element is then passed into the output layer which performs another MAC operation with the ReLU elements and output layer weights. The index of the maximum value of the NN then determines the classification of the pulse.
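The dataflow of a single network can be summarized with a small Python sketch. It is an illustration under several assumptions: all names are hypothetical, the weights are toy values rather than trained ones, arithmetic is floating point, and the output activation is left out because only the index of the largest output is used here. The two hidden-layer MAC blocks (samples and features) are computed separately and then summed with the bias, mirroring the parallel structure described above.

```python
def relu(x):
    return x if x > 0 else 0.0


def mac(inputs, weights):
    """Multiply-accumulate: weights[j][i] links input i to neuron j."""
    outputs = []
    for row in weights:
        acc = 0.0
        for x, w in zip(inputs, row):
            acc += x * w
        outputs.append(acc)
    return outputs


def classify(samples, features, w_samples, w_features, b_hidden, w_out, b_out):
    """Return the index of the largest output-layer value."""
    part_samples = mac(samples, w_samples)      # MAC block for the samples
    part_features = mac(features, w_features)   # parallel MAC block for features
    hidden = [relu(s + f + b)
              for s, f, b in zip(part_samples, part_features, b_hidden)]
    scores = [acc + b for acc, b in zip(mac(hidden, w_out), b_out)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: 2 samples, 1 feature, 2 hidden neurons, 3 output classes.
label_index = classify(
    samples=[1.0, 2.0],
    features=[0.5],
    w_samples=[[1.0, 0.0], [0.0, -1.0]],
    w_features=[[2.0], [0.0]],
    b_hidden=[0.0, 1.0],
    w_out=[[0.5, 0.0], [1.0, 0.0], [-1.0, 0.0]],
    b_out=[0.0, 0.0, 0.0],
)
```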
The classification is then passed to the NN's decision loop which determines whether a final classification has been reached or if further processing is needed. If a final classification has been reached, such as "Other" or "Close GG", then the raw data will be passed through DMA FIFOs to the host computer for writing to an appropriate file. If the pulse still needs further classification, such as a "Single" or "Piled-Up" pulse, then the normalized values and features are passed to the next appropriate NN. This process is repeated until a final classification is reached.
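The decision loops effectively form a routing table between networks. The sketch below shows that control flow in Python; the network names, the routing table, and the stub classifiers are all hypothetical placeholders, and only the labels "Single", "Piled-Up", and "Close GG" are taken from the text.

```python
def run_cascade(pulse, networks, routes, start):
    """Follow intermediate labels from network to network.

    `networks` maps a network name to a callable returning a label.
    `routes` maps an intermediate label to the next network name; any
    label missing from `routes` is treated as a final classification.
    """
    name, path = start, []
    for _ in range(len(networks)):
        label = networks[name](pulse)
        path.append(label)
        if label not in routes:
            return label, path
        name = routes[label]
    raise RuntimeError("routing table did not reach a final classification")


# Stub classifiers standing in for trained networks (illustrative only).
networks = {
    "top": lambda p: "Single" if max(p) < 2 else "Piled-Up",
    "single": lambda p: "N" if p[-1] > 0 else "G",
    "piled": lambda p: "Close GG",
}
routes = {"Single": "single", "Piled-Up": "piled"}

single_result = run_cascade([1.0, 0.5], networks, routes, start="top")
piled_result = run_cascade([3.0, 0.0], networks, routes, start="top")
```

The length of the returned path corresponds to the number of networks a pulse passes through, which is one reason different classifications have different latencies.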

V. OPTIMIZATION OF HARDWARE IMPLEMENTATION
There were many challenges encountered throughout the ANN design process that required consideration and optimization. One challenge was passing the original digitized data throughout the system. While the FPGA needs pre-processed pulses for performing classifications, it is important to save the data in its original form. This allows a user to perform further analysis offline without interference from the FPGA's processing. The original digitized values are passed throughout the ANN system but have been excluded from the digital architecture figures to help readability.
Another challenge for the hardware implementation was specifying data sizes. After the pre-processing loop, the values are no longer unsigned 16-bit but rather fixed-point (FXP) values. Throughout the remainder of the ANN system, the FXP word lengths and integer word lengths are adjusted to ensure a balance between resolution and resources used. This process of maintaining high accuracy with the minimum number of bits is known as "quantization" in ML and is often used when setting the weights of a neural network [28]. This is a necessary practice due to the limited amount of resources available on the hardware being used. For the weights used in this work, the fixed-point bit widths were chosen to try to maximize the resolution but still allow a range of weights to be provided to the system by a user. Therefore, a user should be able to retrain the ANN system for different setups and recompile the FPGA using new weights without much difficulty.
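The trade-off between word length and integer word length can be explored with a small Python model of fixed-point quantization. This is a sketch under stated assumptions: the function name is hypothetical, the integer word length is taken to include the sign bit for signed values, out-of-range values saturate, and Python's `round` (round-half-to-even) stands in for whatever rounding mode the hardware uses. The bit widths in the example are arbitrary and are not those used in this work.

```python
def to_fixed_point(value, word_length, integer_length, signed=True):
    """Quantize `value` to a fixed-point grid and return it as a float.

    word_length:    total number of bits
    integer_length: integer bits (assumed to include the sign bit)
    """
    frac_bits = word_length - integer_length
    scale = 1 << frac_bits                      # 2 ** frac_bits
    raw = round(value * scale)                  # nearest representable code

    if signed:
        lo, hi = -(1 << (word_length - 1)), (1 << (word_length - 1)) - 1
    else:
        lo, hi = 0, (1 << word_length) - 1
    raw = max(lo, min(hi, raw))                 # saturate instead of wrapping
    return raw / scale


w_small = to_fixed_point(0.3, word_length=8, integer_length=2)
w_high = to_fixed_point(5.0, word_length=8, integer_length=2)
w_low = to_fixed_point(-5.0, word_length=8, integer_length=2)
```

With 8 bits and 2 integer bits, the step size is $2^{-6}$ and the representable range is $[-2, 2 - 2^{-6}]$; adding integer bits widens the range of weights a user can load at the cost of resolution.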
More significant challenges came from implementing the neural networks themselves on hardware. One challenge was the choice of activation functions. Using log sigmoid or tansig transfer functions like [19] did not seem feasible, as these transfer functions require performing a division and exponential operation on each sample. In hardware, the division and exponential operations take many clock cycles to complete, with the total number of clock cycles being proportional to the number of bits of the input data [29]. This causes a bottleneck in the processing unless the resolution of the data is significantly decreased. Instead, the rectified linear (ReLU) and saturating linear (SatLin) activation functions were used. The operations of the ReLU and SatLin functions are given in Eq. 2 and Eq. 3.
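In their conventional forms, the two functions can be written as follows. The saturation limits of 0 and 1 for SatLin are the standard ones (as in MATLAB's satlin transfer function) and are assumed here rather than taken from the text.

$$
\mathrm{ReLU}(x) = \max(0, x)
$$

$$
\mathrm{SatLin}(x) =
\begin{cases}
0, & x < 0 \\
x, & 0 \le x \le 1 \\
1, & x > 1
\end{cases}
$$

Both reduce to comparisons and a selection, with no division or exponential.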
These activation functions take only a single cycle to complete and require very few hardware resources. While the log sigmoid and tansig activation functions are often considered to give more accurate classifications, there did not seem to be any significant decrease in accuracy when using the ReLU and SatLin activation functions.

A. HARDWARE & RESOURCES
The ANN system is implemented on a Virtex-5 XC5VSX95T FPGA housed in a NI FlexRIO device. This FlexRIO device uses an NI 5761 FlexRIO adapter module to sample the output of a trans-stilbene organic scintillation detector at 250MS/s. The detector has a dynamic range of up to two volts and a resolution of 14-bits. The FlexRIO device communicates with an NI PXIe-1071 chassis which serves as the host computer. The system was developed and compiled in LabVIEW FPGA. The LabVIEW FPGA compiler utilizes Xilinx for compilation after generating intermediate files and provides a summary of the resources used on the FPGA. The final resource utilization of the system is listed below in Table 3.

B. SIMULATED PERFORMANCE
The hardware implementation was first tested in simulation to verify its functionality and to determine performance metrics such as maximum pulse count rate, system latency, and classification accuracy. The maximum count rate was determined by inputting pulses at increasing frequencies into the system until the classifications began to fail. Using this method, the maximum count rate was determined to be $1.11\times 10^{6}$ pulses per second. Increasing the count rate beyond this point causes the FIFO between the 250MHz data collection loop and the 40MHz pre-processing loop to overflow and thus data points are lost and classifications begin to fail. Next, the maximum latency of the system was determined by measuring the longest path through the system before a classification was returned. For this ANN system, the longest path was for pulses classified as "Close" and it takes 7.7µs to complete. Other classifications can be completed in a shorter amount of time, with the fastest classifications ("Other" and "Poor SNR") only taking 4.66µs to complete. The classification accuracy was then determined by generating a confusion matrix for the ANN system as seen in Fig. 10. The confusion matrix shows that the system has an overall accuracy of 98.2% and an averaged accuracy of 99.75% for single gamma/neutron pulses. These accuracy measurements include the effects of the ANN system's quantization. Before quantization, the overall accuracy was 99.3%.
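For readers less familiar with confusion matrices, the sketch below shows how an overall accuracy and a class-averaged accuracy are typically derived from one. The counts are toy values chosen for illustration and are not the data behind Fig. 10; the function names and the convention that rows are true classes are assumptions.

```python
def overall_accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_accuracy(confusion):
    """Fraction of each true class (row) that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Toy 2-class matrix (rows: true gamma, true neutron), illustrative only.
toy = [
    [98, 2],
    [1, 99],
]
overall = overall_accuracy(toy)
per_class = per_class_accuracy(toy)
averaged = sum(per_class) / len(per_class)
```

Running the same calculation on the floating-point and quantized versions of a network gives the accuracy cost of quantization as the difference between the two overall values.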

C. COMPARISON TO OTHER FPGA CLASSIFIERS
The performance of recent FPGA-based gamma/neutron classifiers is compared to this work in Table 4. The key takeaway is that this work demonstrates an FPGA implementation with competitive performance while being the only FPGA implementation capable of recovering gamma/neutron classifications from piled-up data in real-time. Reference [21] used NNs for piled-up classification but was limited to classifying piled-up pulses as either "Close" or "Split" and was unable to provide further classification for these pulses. Reference [22] achieved a high count rate and low latency but used CI, which is prone to decreased accuracy at lower energy levels, although the overall accuracy of their implementation was not provided. Reference [23] used reconstructed piled-ups for energy and timing calculations, but did not classify the pulses as gammas or neutrons.

VII. TESTING REAL-TIME DATA PROCESSING ON FPGA
This section is focused on experimentally demonstrating real-time piled-up recovery and pulse classification, as the main performance metrics of the system were already discussed in the previous section. Readers are referred to [19] for more in-depth ANN testing aimed at establishing the baseline performance of the system and comparing it to the CI method. This experiment was performed by continuously collecting data while transitioning between three scenarios: passive background, passive plutonium-beryllium (PuBe), and active PuBe. The first scenario, passive background, refers to measurements where only the naturally occurring background radiation is present. The next scenario is passive PuBe, where a PuBe source is placed near the detector's unshielded side approximately 30cm away. The final scenario is active PuBe, which refers to measuring the same PuBe source but with an active background. To generate an active background, a Varian M9 electron linear accelerator (linac) was used [30]. This linac is capable of producing a very intense and energetic photon environment. The detector is placed 2m off-axis from the linac beamline to minimize the additional radioactivity induced in the PuBe source by the bremsstrahlung bombardment. The setup for this experiment is shown in Fig. 11.
The data collection process is set up and monitored from a LabVIEW front panel, as shown in Figs. 12 and 13. Here the user specifies how long the test should run, how large the averaging window should be, how frequently average data points are collected, and various trigger settings related to how pulses should be collected. Once these options have been set and data collection has begun, there are a few ways the user can monitor the data in real-time. The first is numerical counts of the average pulses/second of each classification and the percentages of single pulses, recovered piled-up pulses, and unrecovered piled-up pulses. There is also a pie chart for quick visualizations of the pulse types being classified, and line charts showing how classifications change over time. All these visuals can be easily adjusted or changed by a user thanks to LabVIEW's user-friendly interface.
The example plots shown in Fig. 13 come from the testing of the three scenarios. The first portion of the charts shows the count rates are near zero during passive background. Once PuBe is introduced, the count rates for both the gamma and neutron pulses rise to similar levels. When the linac turns on and the active PuBe scenario begins, we see a large jump in gammas and a small jump in neutrons. The large jump in gammas is expected from the photon beam generated by the linac and the small jump in neutrons comes from the photonuclear reaction in the linac collimation and shielding around the linac. The active PuBe scenario is when the piled-up recovery is used to the greatest effect, as approximately 6.5% of pulses are recovered from piled-up pulses during this scenario. There are still some piled-up pulses that cannot be recovered as can be seen by the red slice on the pie chart. Piled-up pulses are unrecoverable if there are more than two pulses in a pulse window, two pulses are too close together, or the second pulse in a piled-up is too small to be classified.

VIII. CONCLUSION AND FUTURE WORK
This work demonstrated how an ANN system can be implemented on hardware to provide real-time pulse classification with piled-up recovery. Previous works demonstrated that ANNs can provide greater classification accuracy and recover more information than the traditional method of CI for PSD. By now extending the use of ANNs capable of piled-up recovery to the real-time domain, this method has been shown to be viable for many real-world applications. This work has competitive performance even when compared to similar FPGA classifiers that do not recover piled-up pulses, with an overall classification accuracy of 98.2%, a maximum latency of 7.7µs, and a maximum pulse count rate of $1.11\times 10^{6}$ pulses/second. The 1.1-percentage-point accuracy loss due to quantization (from 99.3% to 98.2%) can be reduced in future works with advancements in hardware allowing the use of larger bit widths.
The FPGA implementation tested its real-time classification capabilities by classifying pulses in three experiment scenarios: passive background, passive PuBe, and active PuBe. The system was able to provide accurate count rates during each of these tests in real-time. Furthermore, the real-time classification allows a user to rapidly notice changes to the pulse count rates thanks to the continuously updating graphics.
The applications of real-time processing with ML in nuclear physics will continue to grow as computing and detecting hardware advance along with the field of ML. Future work could include real-time direction detection [31], further classification of the source, supplementing missing data, visualizations, and many other applications [32]. There could also be work done to reduce the dependence of classification accuracy on the transmission medium [33]. Additional work could be aimed at increasing the performance of unsupervised learning methods so that there is less reliance on labeled training data [25], [26]. Each of these methods is already being investigated, and performance will continue to improve.