Optimizing Terahertz Waveform Selection of a Pharmaceutical Film Coating Process Using Recurrent Network

In-line terahertz pulsed imaging has been utilized to measure the film coating thickness of individual tablets during the coating process in a production-scale pan coater. A criteria-based waveform selection algorithm (WSA) was developed to select terahertz signals reflected from the surface of coating tablets and determine the coating thickness. Since the WSA uses many criteria thresholds to select terahertz waveforms of sufficiently high quality, it could reject some potential candidate tablet waveforms that are close but do not reach the threshold boundary. On the premise of the availability of large datasets, we aim to improve the efficiency of WSA with machine learning. This article presents a recurrent neural network approach to optimize waveform selection. In comparison with the conventional method of WSA, our approach allows more than double the number of waveforms to be selected while maintaining excellent agreement with offline thickness measurements. Moreover, the processing time of waveform selection decreases so that it can be applied for real-time coating monitoring in the pharmaceutical industry, which leads to more advancements in quality control for pharmaceutical film coating.

P HARMACEUTICAL film coating processes are widely used to ensure color uniformity, light protection and taste masking of the dosage forms. Functional coatings can be used to mask the taste or smell of a product, to protect the active pharmaceutical ingredient (API) against the acidic environment of the stomach or the gastric mucosa against an aggressive API, and to prolong API release. Coating thickness and integrity are particularly important for functional coatings where a minimum thickness is required to ensure gastro-resistance of a dosage form or to achieve a targeted release profile/ rate. Active coatings contain an API, the amount of which is directly correlated to coating thickness. Several non-destructive sensing techniques have been demonstrated to quantify film coating thickness such as Raman spectroscopy [1], near-infrared spectroscopy [2] and more recently, optical coherence tomography [3], [4]. Terahertz pulsed imaging (TPI) was introduced approximately 15 years ago and has attracted interest in the pharmaceutical industry as a fast, nondestructive modality for quantifying film coatings on pharmaceutical tablets. Extensive reviews on the subject matter have been conducted [5]- [8]. In short, incident terahertz pulses penetrate through the coatings where a portion of the terahertz pulse reflects back to the detector at each coating interface or abrupt change in refractive index. Film coating thickness is then determined from the separation of measured reflection peaks in the processed signal, given by d = cΔt/(2n) , where d is the coating thickness, c is the speed of light, Δt is the peak separation time and n is the refractive index of the coating material. In an effort to better understand the pharmaceutical coating process, in-line TPI has been demonstrated to measure tablet coating thickness directly during the coating operation with a fast acquisition rate (up to 120 Hz) thus yielding statistical information on the coating variability of the tablet population inside the coating unit. This has been demonstrated for production [9], [10] and lab-scale [11] processes and consolidated against numerical modeling [12], [13]. In processing the saved data stream of raw waveforms, a criteria-based waveform selection algorithm (WSA) was developed previously [9], which This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ can select waveforms that represent reflections from the coated tablets that are both in the focus of the terahertz beam and normally aligned to the terahertz detector. The WSA is dependent on a combination of thresholds determined using offline data including the amplitude, width, and position of the peaks that represent tablet surface and subsequent coating/core interface. Depending on the threshold values, WSA could reject potential candidate tablet waveforms resulting in limited thickness measurements taken i.e., hits. Opportunities therefore loom to increase the number of tablet waveforms selected by relaxing the parameters [10] without significantly compromising on the accuracy to offline thickness measurements. This would then allow for a more complete representation of the process allowing for systematic investigations to be performed leading to greater process understanding [10].
Significant progress have been made on analyzing terahertz measurements of samples with simple geometries using conventional terahertz time-domain spectroscopy in an ideal laboratory condition [14], [15]. Under such scenarios, an understanding of the underlying mechanisms is often needed e.g., Lorentzian profiles for water vapor removal [16] or a double Debye in resolving molecular water states [17]. For analyzing measurements that are convoluted by additional effects, such as shape irregularity, variations in thickness, attenuations, Fabry-Perot oscillations and the combinations thereof, experimental data-driven approaches using machine learning algorithms have found success in automatic signal recognition without manual and professional intervention. For example, convolutional neural networks (CNNs) [18]- [21] and various shallow classifiers, like support vector machines [22]- [27], k-nearest neighbors [23], [28], [29], random forest [30], and other single-layer neural networks [31]- [34] have been used for image/waveform classification. With the exception of CNN, these common classifiers are usually used in conjunction with feature extraction methods, e.g., principal component analysis. In the context of thickness measurements geometric algebra has also been proposed, though for classifying transmission measurements at thicknesses between 0.5-3 mm in an ideal laboratory setting [35]. Given the complexity of the pharmaceutical tablet pan coating process (spray atomization, thermodynamic drying and tablet mixing) [36] coupled terahertz in-line measurement through perforations of a rotating coating pan travelling at 0.41 m/s, developing an analytical solution is non-trivial. Consequently, mechanistic modeling using discrete element models are used yielding promising results [12], [13] As large volumes of experimental datasets can be generated, the challenge is to investigate whether a data-driven approach using machine learning could also be used to learn the features for real-life classification, representative of a production scale coating environment.

A. Long-Short Term Memory Network
Recurrent neural networks (RNN) form a widely-used class of artificial neural network, which is capable of handling sequential data. Since it was first described in 1982 [37], this class of network has been developed for application in many different fields, such as speech recognition [38] and machine translation [39]. It contains a learning structure where all inputs and outputs are inter-connected [40]. Hence, the input from each unit depends on the previous output. Long-short term memory (LSTM) was developed by [41] to overcome the limitations of RNNs with improved ability to capture long-term dependencies. Unlike standard RNN unit, the memory cell in LSTM replaces the traditional node in the hidden layer. Fig. 1 shows the internal structure of a LSTM unit. It contains memory blocks and memory cells, along with the gate units they contain [42]. The input gate takes charge of the input stream to the memory cell and the output gate takes charge of the output flow to other LSTM units. The forget gate is controlled by a one-layer neural network. The output of the forget gate is a sigmoid activation function that is connected to the previous memory block. The activation of this gate is calculated as where x t is the sequence input, C t−1 is the previous LSTM block memory, h t−1 is the previous block output and V f is the bias vector. P is the weight vectors for the input and σ represents the sigmoid activation function. Since the output of the forget gate is applied to the memory block of previous LSTM unit, the previous memory block will affect the signal of the current LSTM. If the activation output vector values are close to zero, then the previous memory will be erased [40]. The input gate creates new memory determined by a simple neural network with the tanh activation function and the previous memory block. These signals can be expressed as The output gate generates the output signal of the current LSTM unit. These outputs can be expressed as As the most popular type of RNN, LSTM has also been widely applied for classifying sequential data such as electrocardiogram signals [40] and power load forecasts [43]. Since terahertz waveforms are time dependent, LSTM is used to analyze terahertz waveforms. Additional justifications for the choice of LSTM over MLP or CNN can be found in supplementary information where higher sensitivity, precision, and accuracy are observed. Furthermore, CNN requires the waveforms to be converted to images prior to classification, which would require additional overheads thus may not be ideal for waveform selection in real-time [44]. Here, a particular type of LSTM network i.e., bidirectional LSTM (BLSTM) network, which divides regular RNN neuron states into forward and backward for higher selection accuracy [40].

A. In-Line and Offline Dataset Acquisition
Details of the coating experiments used in this work have been covered elsewhere [9]. In short, a production-scale perforated pan tablet coater (Premier 200; Oystar Manesty, Merseyside, U.K.) was used for the coating trials. The diameter of this coater was 1.3 m and the diameter of each circular perforation was 3 mm. TPI (TeraView Ltd., Cambridge, U.K.) was mounted onto the side of the coater where terahertz signals were focused onto the surface of tablets inside the coating pan. Tubular baffles were additionally fitted into the coater pan to facilitate tablet bed mixing. Individual terahertz waveforms were acquired at a rate of 120 Hz throughout a 300-min coating process. The rotational speed of the coater was set as 6 r/min. The tablet geometry was biconvex (10 mm diameter, 370 mg). Two runs under the same process conditions (runs A and B) were performed. The tablet load for these two coating trials was 175 kg. In order to validate the in-line measurement data, several tablets were removed randomly from the coating pan at 60, 120, 180, 240, and 300 min for both runs, and were measured using the offline TPI. It should be noted that even though process conditions were identical, coating variations do exist between runs due to the inherent process complexities [10]. As a matter of fact, weight gain variations of coatings from runs A and B span from 16% to 20% throughout the process thus affecting product quality. In another separate coating trial (run C), tablets underwent a different coating process altogether compared to runs A and B. In particular, a batch of uncoated tablets of the same load was added into the coating process after 140 min of coating [10]. The dataset resulting from run C is used to assess network's ability to predict on a process completely different to what it was trained with. Fig. 2 shows the structure of the BLSTM network. The first layer is a sequence input layer of size 1x69, to accommodate our cropped waveform data. The BLSTM layers learn longterm dependencies between time steps with 50 hidden units. This is followed by a dropout layer with rate 0.5 and a fully connected layer. For the classification output layer, we used the cross entropy error function and the softmax activation function, which is widely used for 1 of K classification [45]. The softmax function normalizes the network outputs to between zero and one. To prevent the network from overfitting, a dropout layer has been added into the network, which prevents the units being too consistent with each other during the training process.

B. Design of Neural Network
The BLSTM network is trained with cross-entropy loss for 20 epochs with the Adam solver, a learning rate of 0.0001 and a mini-batch size of 125.

C. Training Dataset Preparation
Because the in-line measurement dataset contains waveforms from the tablets, surface of the coating pan, perforation edge and noise waveforms, the training dataset is split into two classes: hit class (reflected from tablet) and miss class (reflected from other places) for network training. The primary training dataset is generated from the offline measurement of run A at the process endpoint. Offline waveforms from run A are used to represent the hit class, the training data of the miss class is generated from an empty coater. Fig. 3 shows examples of measured offline terahertz timedomain waveforms of a coated tablet at different coating time points, which shows a main positive peak (at 0.0 mm) corresponding to the air/coating interface and a negative peak for the coating/core interface. To reduce the effect of noise on the network while maintaining the main features of the tablet waveforms, all waveforms were preprocessed. In particular, the front of the main peak and the rear of the negative peak of offline waveforms were discarded. Furthermore, waveforms whose main peak's position outside of ± 0.2 mm, i.e., outside the length where thickness can be accurately measured, and with an amplitude less than 0.05 were discarded from the dataset. The in-line dataset for run B is used for external validation.

D. Training Dataset Augmentation
As there were significant variations in the number of selected tablet waveforms in cross-validation, thus indicating unstable training, the training dataset is increased. In particular, we simulated terahertz waveforms using Fresnel's equations, representative of reflections from a single-layer coated tablet acquired with the sample placed at the focus and normal to the terahertz sensor by a robotic arm [5]. Given the incident terahertz field E i propagating through air, the equation for the reflection assuming plane wave approximation [46] is where d is the coating thickness, k is the wavenumber in coating material, r ab is Fresnel reflection coefficient (0, 1 or 2 in this case corresponds to air, coating and core) that is defined as r ab = n a −n b n a +n b , where n a and n b is the refractive index of material a and b. In order to generate a large amount of tablet waveforms, three parameters: n a (refractive index of coating); n b (refractive index of core); and d (thickness of tablet coating) are used as variables.
The averaged refractive indexes of the coating and core materials are obtained from offline measurements, which are 1.79 and 1.63, respectively. Then, we made n a and n b vary within ± 2% of the averaged values. According to the calculated coating thickness of the offline measurements, the values of d for each tablet waveform are randomly selected from 45-110 μm. Fig. 3 shows the examples of simulated and measured offline tablet waveforms for 2, 3, 4, and 5 h where generally good agreement can be observed. There are some differences between simulation and measurements due to the assumption that the refractive index is constant where in actual fact, it is frequency dependent [5] and settings used in deconvolution and filtering also affect the waveform profile [46]. Although the front part before the main peak (between −0.15 mm and −0.05 mm) is different, it will not affect training process as waveform pre-processing excludes this part. It should be noted that standard deconvolution and filtering settings are used here [46] and where these values deviate away the standard settings, high frequency oscillations and interface peak broadening will be introduced but with minor effect on the subsequent analysis. The distributions of data numbers used in  Table I.

E. Model Refinement
The training results using the augmented training dataset show that the averaged coating thicknesses increase over coating time, however, compared with offline measurements, the network tends to identify waveforms with thick profiles as tablets. This highlights that the offline measurements do not truly represent in-line waveforms. This is in part due to how the measurements are acquired. Such an idealized scenario would be impossible to maintain for measurements taken inside the rotating coating pan. The offline waveforms therefore serve only as a guide to what the in-line waveforms would look like. To accommodate this, we consider transfer learning, which can be employed to refine a trained model to be effective with a different task or data. This model is further trained as the starting point for another model for a second task [47]. To train the network to learn the differences between offline and in-line, transfer learning is used to refine the trained network. The training procedure is divided into two distinct steps: initial model training with simulated offline data based on run A and retraining the same BLSTM network weights using a small amount of in-line data of the run A selected by WSA at a learning rate of 0.0005 and a mini-batch size of 125 for 15 epochs. Table II gives the dataset distribution for the transfer learning. The training of the first step lasts about 30 min and the training time for the second step is in less than 5 min.

F. Postprocessing and Implementation
The offline training takes 30-40 min and the training of the transfer learning step is much faster at 5 min. After applying the trained network to the entire in-line dataset of run B, the coating thickness of the selected tablet waveforms are calculated. Since the minimum resolution of TPI is 30-40 μm [9], the tablet waveforms with coating thickness less than 30 μm are manually removed from the hit class. All the analysis and algorithms developed in this study are implemented in MATLAB (MathWorks, USA).

A. Network Performance
To evaluate the performance of the BLSTM network, we compare the tablet waveforms selected by the BLSTM network with the tablet waveforms selected by the WSA. The confusion matrix for the BLSTM network is given in Table III where results of the WSA are assumed to be correct. This assumption is valid because WSA used stringent thresholds to ensure only high quality waveforms are accepted [10]. We use accuracy, sensitivity and specificity [48] to evaluate the performance of the network, which are defined as follows accuracy = TP + TN TP + TN + FP + FN (7) where true positives (TP) and true negatives (TN) are the number of the hits and miss correctly classified, while false positives (FP) and false negatives (FN) are the number of the incorrectly classified hits and miss, respectively. The classification results can then be evaluated, showing total accuracy is 80.57%, sensitivity is 72.06% and specificity is 81.76%, thus indicating that BLSTM can accurately select the correct tablet waveforms from the in-line dataset.
In order to examine these miss-classified waveforms further, Fig. 4 shows the coating thicknesses of the selected tablet waveforms at 30 s average for comparison against offline measurements where generally a good agreement against the offline TPI, weight gain measurements and WSA is observed. It should be noted that the waveforms selected by WSA and BLSTM are processed in the exact manner to determine the coating thickness [9]. As expected, a steady increase in coating thickness is observed in-line with coating time [9]. Thicknesses from WSA and BLSTM before 80 min are not considered due to the fact that coating thicknesses are below the minimum TPI resolution of 30 μm [9]. At the ending stage of the coating process (250-300 min), selected thicknesses deviate in slope compared with the WSA, but are still consistent with offline measurements. Fig. 5 shows coating thickness distribution from 60 to 300 min, which loosely follows a Gaussian distribution with the mean increasing steadily over coating time. The intertablet coating uniformity, expressed as the coefficient of variation or the relative standard deviation of coating thickness is compared in Fig. 6 where there is a good agreement [10]. Although BLSTM is slightly lower in the early part of the process, Figs. 5 and 6 indicate that the coating uniformity evaluated by two methods is consistent. Compared to the WSA [10], BLSTM is able to select twice the number of tablet waveforms without compromising on the accuracy when compared against the offline measurement. This work therefore shows promise for in-line waveform selection.
To examine the selected waveforms further, Fig. 7 shows some examples of true and false positives. By comparing the amplitude of the second negative peak (coating/core interface) between two true positives and false positives, it can be seen that the waveforms with large second negative peak are selected as tablet by both WSA and BLSTM. However, WSA overlooks waveforms with second peak amplitude falling below certain threshold while BLSTM is less rigid and can discriminate based on additional features learnt from the dataset. Table IV gives the performance between WSA with stringent thresholds, WSA with relaxed thresholds [10] and BLSTM. R 2 here indicates the correlation of coating thickness between offline terahertz measurements made on coated tablets removed at regular intervals during the coating process and two selection methods. The data shows that BLSTM can select the same amounts of tablet waveforms, while still in good agreement with offline thickness measurements. To further assess the applicability of the algorithm for real-time processing, we examine the computational     [10] time of the algorithms. In particular, a single waveform takes 1 ms to be processed in WSA while BLSTM takes 0.1 ms, an order of magnitude faster. This therefore suggests that BLSTM network could potentially be deployed for selecting terahertz waveform in real-time during the coating process.

B. Unforeseen Processes
The close agreement between coating thicknesses of BLSTM selected waveforms and offline measurements suggest BLSTM has learnt variations between coating processes with identical conditions. To further examine the ability to generalise in addition to process variations, we present the trained network with the dataset from run C [10], acquired from a process with conditions entirely artificial and vastly different to that of run A. This in turn resulted in very low classification accuracy (Sensitivity = 44.31%) especially for the already coated tablets shown in Fig. 8. Therefore, whilst the network can operate on similar processes i.e., run B, the network cannot generalize to processes with a complete change in process conditions. This can be improved if the network can be trained using data from coating runs with different process conditions supplemented by additional data of an empty production scale coater to better represent the miss class.

IV. CONCLUSION
In this article, we have demonstrated a BLSTM network for terahertz waveform selection as part of a pharmaceutical tablet coating process. Compared with conventional WSA, BLSTM is able to select the tablet waveforms whose features are less apparent and without compromising on accuracy when compared against the offline thickness data. The processing time is largely improved as well so that it can potentially be used for real-time monitoring of the pharmaceutical film coating process for both quality control and process investigation. However, the network has to be refined by transfer learning due to the lack of training data from an empty production-scale coater. This therefore restricts the generalization ability of the network to processes similar to what it has been trained with. He was the Chair of microstructure engineering with the Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge U.K., and is the Kenneth Denbigh Lecturer and Fellow at Gonville and Caius College. He is currently the Terahertz Application Group, Cambridge focusing on terahertz spectroscopy and imaging for material characterization, nondestructive imaging supported by complementary techniques. His work was recognized with the 2016 Royal Pharmaceutical Society Science Award and he is an Elected Fellow of the Royal Society of Chemistry.
Hungyen Lin (Senior Member, IEEE) received the Ph.D. degree in electronic engineering from the University of Adelaide, Adelaide SA, Australia, in 2012.
He is currently a Lecturer in Electronic Engineering with Lancaster University, Lancaster, U.K. Prior to that, he was a Research Associate with the Terahertz Applications Group, University of Cambridge, Cambridge, U.K. His research interests include developing terahertz and optical technologies and creating industrially relevant applications in pharmaceutical manufacturing, fuel cells and advanced materials.
Dr. Lin is an EPSRC Peer Review College Member and a Fellow of the Higher Education Academy.