Bone Ablation Depth Estimation From Er:YAG Laser-Generated Acoustic Waves

Using a laser for cutting bones instead of the traditional saws improves a patient’s healing process. Additionally, the laser has the potential to reduce the collateral damage to the surrounding tissue if appropriately applied. This can be achieved by building additional sensing elements besides the laser itself into an endoscope. To this end, we use a microsecond pulsed Erbium-doped Yttrium Aluminium Garnet (Er:YAG) laser to cut bones. During ablation, each pulse emits an acoustic shock wave that is captured by an air-coupled transducer. In our research, we use the data from these acoustic waves to predict the depth of the cut during the ablation process. We use a Neural Network (NN) to estimate the depth, where we use one or multiple consecutive measurements of acoustic waves. The NN outperforms the base-line method that assumes a constant ablation rate with each pulse to predict the depth. The results are evaluated and compared against the ground-truth depth measurements from Optical Coherence Tomography (OCT) images that measure the depth in real-time during the ablation process.


I. INTRODUCTION
Reducing the trauma of a patient during surgical procedures is paramount in improving the post-surgical standard of living. Consequently, minimally invasive alternatives to common interventions are a highly researched topic [1], [2]. One line of inquiry is the replacement of mechanical tools with laser-based ablation [3], [4], [5], [6], [7].
When the tissue is exposed to microsecond pulsed Er:YAG laser light, the water in the tissue heats up, vaporizes, and the expansion causes micro-explosions that ablate a small part of the tissue [8]. This process emits an acoustic wave that is captured by a transducer [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Ahsan Khandoker .
In contrast to classical mechanical cutting tools, laser ablation does not provide direct haptic feedback on the progress of the cut to the surgeon. Furthermore, the laser system occludes the cutting location and impedes visual inspection. With the development of new tools that assist the surgeon in monitoring the depth of the cut, damaging sensitive tissue can be avoided.
A classical method to measure the cutting depth would be using an OCT [10], [11], [12], [13]. However, it can be challenging to integrate an OCT in combination with an Er:YAG laser for minimally invasive surgery. To this end, we propose an acoustic depth measurement technique that uses the acoustic wave created during the ablation process to determine the depth of the cut.
When the location of the ablation source is known, the depth of the cut can be estimated via various approaches. One VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ approach is to triangulate the source [14], [15], [16]. However, this approach requires multiple transducers to determine the source location. A different approach uses acoustic waves in a 2D simulation to detect the source position using just two transducers [17]. The main limitation of this approach is that the exact acoustic wave generator (form and the frequency composition of the source) has to be known. In another 2D simulation, the source position and its form were reconstructed in an unknown surrounding [18]. However, many transducers are required for the reconstruction, making this approach unsuitable for minimally invasive surgery. There are several ways to determine the distance between an acoustic source and the transducer. One option is to estimate the time-of-flight (ToF). For example, in [19], the authors used auto-correlation between the transmitted and echoed signal. In [20], they exploited the phase shift between transmitting and measured signals. However, in our application, the source signal shape is unknown, and the signal may even change for different depths. In [21] and [22], the authors showed that the ToF correlates with the signal's distance and decay when ablating tissue with an Er:YAG laser and in [23], they used ToF to estimate the depth of the ablation with a Neodymium-doped Yttrium Aluminium Garnet (Nd:YAG) laser. However, the ToF option has a significant drawback. It requires the distance between the transducer and the bone surface to be constant, and the medium velocity of the ablated tissue must be known. As this can not be guaranteed in our envisioned application of robotic-guided laser osteotomy, we focus on different approaches to estimate the depth of the laser cut.
Since the use of Neural Networks [24] is well established in medical imaging [25], [26], [27], [28], speech, and signal processing [29], [30], [31], [32], we aim to estimate the ablation depth by interpreting the signal from one single air-coupled transducer with a neural network. We compare two approaches: The CA is used as a base-line approach, where we assume that the ablation rate per pulse is constant, and therefore, the depth is proportional to the number of ablation shots. The second approach uses one or multiple consecutive measurements of the acoustic waves during the ablation process. These acoustic waves are then used as input for an NN to predict the depth of the laser cut.
The goal of this work is to analyze the acoustic wave and to prove that there is depth information in the acoustic wave produced during the ablation process of the bone using the Er:YAG laser. In addition, we can show that one transducer is sufficient to measure the depth of the laser cut, simplifying the complexity of the setup in future work.

II. MATERIAL AND SETUP
An Er:YAG laser (Syneron Candela, litetouch LI-FG0001A) with an energy of 153 mJ, a wavelength of 2940 nm, with a repetition rate of 10 Hz, is used for ablating the bone. The ablation process emits an acoustic wave. A CaF 2 mirror diverts a small percentage of the laser beam to a PbSe photodiode (PbSe Fixed Gain Detector, PDA20H, 1500 − 4800 nm), to trigger the acoustic measurement. The wideband transducer, 1 with a frequency range of 100 − 1000 kHz and a resonant frequency at 650 kHz, measures acoustic signals with a sample rate of 7.8125 MHz. It is placed at a distance of approximately 5 cm to the bone surface. The setup is displayed on the left of Fig. 1. We note that due to the limited acquisition rate of our setup, and the high repetition rate of the laser, we can only measure the acoustic wave of every second laser pulse.
A custom-made dichroic filter reflects the wavelength of the Er:YAG laser and transmits the OCT's wavelength, therefore, integrating the Er:YAG laser into the OCT system in a co-axial configuration. Consequently, we can monitor the ablation depth with a long-range Bessel-like beam OCT system [33] in real-time. The OCT has an imaging half-range of 22.21 mm in air and a field-of-view of 4.2 mm. The OCT system uses a swept-source laser (Insight Photonic Solution, Inc., Lafayette, Co, USA), with a scan rate of 104.17 kHz, a central wavelength of 1288.82 nm, and spectral bandwidth of 61.5 nm.
We used 13 cow femur bones as ablation material, bought in a local grocery store. The height of the bone varied between 2.4 cm to 2.9 cm. Muscle, fat, bone marrow, and tendons were carefully removed from the hard bone. To avoid dehydration of the bone, we submerged the samples in water between experiments.
We conducted our experiment by ablating a maximum of nine holes in each bone, each reaching a depth of up to 3.5 mm. The ablation process was stopped when we noticed that the cutting depth stagnated, i.e. if the bone started carbonizing. Bone carbonizes when insufficient water is in the tissue; hence, no micro-explosions remove the tissue, and the laser energy accumulates heating up the tissue. An exemplary bone is shown on the right of Fig. 1.
Since the frame rate of the OCT is 173.6167 Hz and the repetition rate of the laser is 10 Hz, we have a fixed number of OCT frames between the laser pulses. Therefore, we can align the OCT frames to the corresponding acoustic waves emitted during bone ablation. To generate a ground truth depth for each measured ablation wave, we locate the first pulse of the ablation in the OCT image stream. Then we labeled the 2443 OCT images by marking the bone's edge and the cut's end (see Fig. 2). We deduced the depth of the cut using the pixel resolution of the OCT of 10.86 µm.

III. METHOD
To assess the performance of our approaches, we divided our data into three mutually disjoint subgroups: training data, validation data, and testing data. Each bone was only part of one of these groups. Five-fold cross-validation was conducted to demonstrate the performance of the approach.
We investigated two approaches: (1) the first approach CA assumes that the ablation rate of the laser is constant, and 1 PHYSICAL ACOUSTICS WSα SNAK28 6.9.2022, physicalacoustics.com/content/literature/sensors/Model_WSa.pdf The laser beam gets split at mirror 1, and a small percentage of the laser beam is redirected to the photodiode. The photodiode triggers the acquisition of the acoustic wave with the transducer (distance to the bone: 5 cm at a 45 • angle to the ablation). Further, the laser light is diverted with mirror 2 (a custom-made dichroic filter that reflects the wavelength of the Er:YAG laser and transmits the OCT's wavelength) onto the bone for the ablation. At the same time, the OCT measures the depth of the ablation. (Right) an exemplary bone with nine holes after the ablation process. therefore when counting the number of shots, it can predict the depth of the laser cut. The ablation rate is estimated by the median value of the ablation rate of the training and validation data, and its performance is evaluated on the testing data.
(2) In the second approach, we use a NN to predict the depth using the acoustic ablation waves.

A. NEURAL NETWORK (NN)
As the distance between the surface of the bone and the transducer can vary, all approaches using the ToF to estimate the cutting depth are of limited use. Furthermore, we wish to investigate if the depth information of a cut is embedded in other parts of the acoustic signal (excluding the ToF); hence, we removed this information by cropping the signal as follows: First, we identified the maximum absolute value of the first 500 sample points (the time window of 64 µs). Then we used 1.5 times this value as a noise threshold to remove the first part of the signal that only contained noise and no signal from the ablation (see Fig. 3, top left): We removed all sample points before the absolute value of the acoustic wave that first exceeded this threshold (see Fig. 3, top right, red dot) and solely used the window after that point (see Fig. 3, bottom left). In the final step, we normalized the area of interest of the acoustic wave w with: wherew is the mean value and σ w is the standard deviation of the preprocessed acoustic wave w * (see Fig. 3, bottom right).
To reduce over-optimization during training, we applied non-linear scaling of the wave amplitudes as data augmentation: Each sample point of the acoustic wave w was scaled by the formulaŵ with λ a uniformly distributed random scaling factor. As an input for the NN, we tried three different variants; namely, one, five, or ten consecutively measured acoustic signals as the input to the NN 1 , NN 5 , and NN 10 , respectively. The Output of the NNs was a single value corresponding to the depth of the laser cut in [mm] from the latest measurement from the input. We trained the network using the Mean Square Error Loss (MSELoss): is the output of the NN and y = (y 1 , . . . , y N B ) is the label, that we labeled manually using the OCT images (see Section II and Fig. 2), and N B is the batch size.

B. HYPERPARAMETER SEARCH
We use a hyperparameter search [34], [35] approach with five consecutive acoustic waves to find an initial network with the following search constraints, as visualized in Table 1.
The input size, which correspond to the number of sample points, varies between 2000, 3000, . . . , 7000. The numbers of convolutional and fully connected layers vary between 1 and 9. The parameters of the convolution layers can get the following values: Each output channel can allocate a value of 2 n , where n varies between n = 1, . . . , 8,, the kernel size varies between 2 and 5, and the stride is 1 or 2. In addition, the maxpool kernel is 2 or 3 with a stride of 1 or 2. Batch normalization is applied randomly before any layer, and dropout is randomly applied to the fully connected layers with a dropout rate between 0 and 1. The number of neurons of the fully connected layer is min(2 n , out CNN ), where n varies between n = 3, . . . , 11 and out CNN is the number of neurons after the flattened output of the last convolutional layer. We use the Adam optimizer with a learning rate of 10 −n , where n varies between n = 2, . . . , 5. The maximum value λ of (2), varies between 0, 0.1, . . . , 1.

C. IMPLEMENTATION DETAILS
We implemented the network using the PyTorch 2 [36] framework and trained the networks on an NVIDIA Tesla V100 DGXS 16 GB. As a result of the hyperparameter search, a well-performing architecture is shown in Fig. 4, which uses 2000 sample points as the input size, namely a time window of 256 µs, for the NN. It has 5 convolutional layers followed 2 May 2022, pytorch.org

IV. RESULTS
We evaluated the accuracy of our models through five-fold cross-validation, which included splitting the data into five mutually disjoint subsets and omitting one subset during training for unseen forward passes during testing. We assumed a constant ablation rate in the first approach CA. The median error over all the five-fold cross-validation sets was 0.13 mm and the distance B between the 25th percentile and the 75th percentile was 0.163 mm, as shown in Table 2. In Fig. 5, we visualize a box plot that shows the distribution of the deviating distance between the ground truth value and the output. As shown in Table 2, we see that the average box length was 0.271 mm with an average median value of −0.024 mm. At the interval between I = [3.25, 3.5]mm the median value was −0.076 mm with a box length of 0.396 mm. We note that the number of shots was counted from the start of the ablation process to determine the depth of the cut.
In the second approach, we used one, five, or ten consecutively measured acoustic waves as the input for the NN. We trained the network on the training data and used the best-performing network on the validation data to test the performance of the testing data. In Table 2, we give a detailed description of the results. The median error was 0.174 mm, 0.130 mm, and 0.092 mm for NN 1 , NN 5 , and NN 10 , respectively. In the box plots of Fig. 5, we visualize the difference

V. DISCUSSION
We compared two approaches: CA, which assumes a linear estimation of the depth and a NN with either one (NN 1 ), five (NN 5 ), or ten (NN 10 ) consecutively measured acoustic waves as an input. In the box plots in Fig. 5, we observed that CA has fewer outliers than the NNs. The median value is close to 0 for all intervals, and even within the interval I (Distance I ), the median value was close to 0. The NNs had more outliers, and the median value between the distance of the output and the ground truth depth shifts into the negative with increasing depth. Hence, it underestimated the depth of the cut, especially in the interval I, as can be seen in Table 2 and Fig. 5. However, the box length B was smaller for the NN than the CA for the interval I. We further observed that CA outperforms the NN 1 and had a similar performance to NN 5 . The best performing network was NN 10 .
To further investigate the underestimation of the depth, we retrained the Network NN 5 with data that reaches a maximum depth of 2.8 mm (NN 5 s ). We observe in Fig. 5 that both NN 5 and NN 5 s had a significantly larger error in the reported interval I and [2.75, 3]mm, respectively. This is also reflected in Table 2, where the median value at Distance I were −0.332 mm and −0.278 mm, with a box length of 0.275 mm and 0.117 mm, respectively. Therefore, we assume that underestimating the depth at the end is due to the lack of training data.

TABLE 2.
In the top row, the error, in the centre row, the mean distance over all intervals between the estimation and the label from the box-plots in Fig. 5, and in the bottom row, the distance of the last interval from the box-plots Distance I are described. We show the median, 25th percentile (PCTL), 75th PCTL value, and the distance B, which is the difference between the 25th and 75th PCTL, of all the testing data from the cross-validation. CA is the method where we assume a constant ablation rate. NN 1 , NN 5 , and NN 10 are the approach with the NN that uses one, five, or ten consecutive acoustic measurements. NN 5 s and NN 5 h use 5 consecutive acoustic signals as input. NN 5 s data do not exceed 2.8 mm depth, and therefore Distance I represents the interval 2.75 mm − 3 mm, while all the others represent the interval of I = [3.25, 3.5]mm. NN 5 h was trained with some label augmentation.
To this end, we retrained the network with label augmentation. Specifically, we added a random value r to the depth during the training of the network for all depths exceeding 3 mm. The value r is the absolute random number from the normal distribution N (0, 0.5), with the mean value of 0 mm and the standard deviation of 0.5 mm. We augmented the label as described above, leading to overestimating the depth during the network training. In Fig. 5 we can see that this strategy counteracts the general underestimation of the depth of the network at the last intervals. This is also reflected in Table 2, where the median value of Distance I was reduced from −0.332 mm to −0.127 mm. However, the box length B was increased to the value 0.275 mm to 0.415 mm. Therefore, augmenting the data improves the median accuracy but produces a higher output fluctuation.

VI. CONCLUSION
The experiments show that assuming a constant ablation rate already leads to good depth estimations of the cut by only counting the number of shots. This assumption, however, is limited to shallow cuts and does not hold for deep bone ablations that need a cooling system [37]. These cuts can reach a depth of up to 3 cm. Moreover, the number of shots must be maintained for a valid estimation. This is not stateless, meaning it needs all the information since the beginning of the ablation to estimate the depth. Therefore, it is not fail-safe since it may cause loss of depth information in case the number of the previous ablation gets lost.
It is essential for medical devices to continue working, even when a power failure occurs and all prior information is lost. Hence, we opt for a stateless method to ensure a fail-safe device. In this regard, the proposed approach with the NN is stateless (almost no prior information of previous ablations is needed) and, therefore, advantageous as it uses one or multiple consecutive acoustic waves as an input to predict the depth of the cut and does not need all information from the beginning of the ablation. Therefore, it is fail-safe and can predict the depth after only a few laser pulses. The NN approach has comparable accuracy but slightly more outliers due to its statelessness and sensitivity. The performance improved with an increasing number of consecutive acoustic waves used as input. Too many consecutive acoustic waves are disadvantageous because multiple acoustic waves are needed to estimate the depth accurately, increasing the risk of cutting the hole too deep and damaging sensitive tissue behind the bone.
In this work, we demonstrate the possibility of predicting the depth of a laser-ablated hole by analyzing the acoustic shock waves captured by a single transducer. The results encourage further investigations into the depth estimation during the laser ablation of tissue using acoustic waves. Our experiments were performed in a dry environment, and for future work, we plan depth estimation during a similar setup in wet conditions. Irrigation during laser ablation allows deeper cuts yet presents further challenges in combination with OCT systems. One of the challenges facing the irrigation system is that water accumulates in the hole, distorting depth measurement. In addition, the debris and water droplets pollute the OCT's protective window, reducing the image's contrast. An important factor may be the heterogeneity and the age of the bone influence the ablation process and the prediction of the depth using acoustic waves, which needs to be investigated. MASSIMILIANO (NYUAD). He is also an Associated Faculty with the Department of Biomedical Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, USA. He has authored over 90 peer-reviewed articles, book chapters, books, and patents. His main research interests include the development of smart devices for medical therapy, diagnostics, and monitoring using novel optical technologies, which include smart laser surgery, optical coherence tomography (OCT), photoacoustics, biomedical spectroscopy, AI-aided optical diagnostics and imaging, optical-based smart wearable biosensors, and miniaturized systems. image-guided therapy, robotics-guided laser osteotomy, and virtual reality. As a Principal Investigator, he has finished many projects in these areas and published over 250 papers, patents, and book chapters. He is also the founder of three spin-off companies and licensed his patents and software to medical device companies. VOLUME 10, 2022