Synthetization, Distortion, and Geometric Correction of Isoelectric Focusing Gels for Newborn Screening

Hemoglobinopathies are inherited red blood cell disorders. Newborns are commonly screened for them soon after birth, because some of these disorders can be fatal if they are not detected early and treated properly. One of the most common methods for screening hemoglobinopathies in newborns is isoelectric focusing (IEF). IEF separates different hemoglobins into distinct bands based on their isoelectric point. Hemoglobinopathies can be detected from IEF as bands additional to those of normal hemoglobins, and also from quantitative abnormalities in normal hemoglobins. However, IEF gels commonly contain distortions that hinder interpretation of the gel. In this study, we showcase a method for straightening distorted images of IEF gels. Since no dataset containing images of IEF gels was available, we created a novel synthetic sample generator, Sig2Img. This allowed us to create gel sides with any desired bands and intensities. Artificial distortions were also applied with basic image processing functions to represent the different kinds of distortions commonly found in IEF gels. Fifteen experiments were created to evaluate the method. The experiments showed that the correction method managed to straighten gel sides even with added difficulties, such as within-sample and sample-to-sample distortions and added Gaussian noise. The experiments also showed that the method retained the relative quantities of the bands accurately. The straightening enables easier visual interpretation of the gel when screening for hemoglobinopathies. It also allows accurate quantitation of bands, which was not originally possible from distorted samples. This gives the method clinical significance, as quantitation can be added to the clinical workflow.

sequestration, overwhelming septicemias, and acute chest syndrome bring great danger to the second half of the newborn's first year of life, with a high mortality rate. This is the reason why newborn screening is crucial for detecting these disorders as early as possible, to prevent or at least effectively treat them, for example with blood transfusions.

NBS for hemoglobinopathies such as SCD is well-established, as related prevention programs have been in place for over 40 years. Commonly, the screening is conducted in centralized laboratories. At a hospital, fresh cord blood or heel-pricked dried blood spot (DBS) samples are collected from the newborn, then delivered and tested for abnormalities. For this purpose, technologies such as liquid chromatography (HPLC), tandem mass spectrometry and electrophoresis-based methods such as capillary zone electrophoresis (CZE) are used. Of these methods, IEF remains widely adopted, due to high sensitivity and relatively low cost [15].

IEF, a type of gel electrophoresis, can be used to separate the different hemoglobins of a patient sample. The gel format contains a pH gradient, and once electric current is applied, the hemoglobins migrate within the gel to the specific regions where the pH is equal to their isoelectric points. This enables sensitive and qualitative measurement of abnormal hemoglobins such as HbH, which is undetectable by other biochemical methods such as HPLC and CZE. For NBS, IEF is considered the gold standard method [3].

C. IEF GEL

Similarly to any NBS test conducted in a centralized laboratory, the gel format is made for measuring multiple patient samples during a run. For example, using a 44-well template on a two-sided gel amounts to 88 sample positions in one gel. In this context, one or more positions per side are used for control samples. The AFSC control sample is commonly used due to the Hb pattern it contains [16].

The IEF gel is considered a one-dimensional format, as the separation of hemoglobins occurs on one axis. Fig. 1 showcases the separation of HbA, HbF, HbS and HbC of a control sample, and an unaffected newborn sample with HbF_ac, HbA and HbF.
In the context of NBS of hemoglobinopathies, the IEF workflow typically includes full 1st tier screening gels, followed by a confirmatory gel of the screening positive results. Qualitative measurement of patient samples is commonly done by visually comparing the individual bands to the nearest control samples. The presence of any abnormal bands, or an abnormal ratio of HbA and HbF, would indicate a screening positive result. Digitalization and relative quantitation [17] of gels is commonly not done; one technical reason for this is that the relative quantitation is affected by distortions routinely found in IEF gels.

While gels used in IEF are robust when compared to the competing technologies used for NBS of [...]

VOLUME 10, 2022

Intarapanich et al. in 2015 proposed an image-processing tool for DNA gel electrophoresis that featured sample segmentation, band extraction and sample classification [22]. Their proposed cross-correlation adjustment to correct distortions was reported as effective; however, the results were poor if the samples in the gel had few bands, which is routinely the case in NBS of hemoglobinopathies when compared to DNA analysis, as most samples contain four or fewer bands.

It should be noted that image analysis of two-dimensional electrophoresis gels is a more popular research topic [23], because in DNA and protein research the separation of molecules by isoelectric point and by molecular mass are both considered important. These methods do not, however, translate well to analyzing the one-dimensional IEF gels used in newborn screening, because the patterns of interest are different.

The correction of within-sample and sample-to-sample distortion by our proposed method was tested by defining probable real-world use cases. Full 1st tier screening gels and partially filled confirmatory gel sides were considered. Since NBS can be time-sensitive, running partial gels is common. These gels increase the complexity of the correction problem by having less information that can be utilized by the correction method. Sides full of unaffected newborn samples and sides with SCD samples were considered, as the correction method would have to perform well in the presence of Hb variants. In terms of distortions, Gaussian noise [24] representing a non-ideal imaging setup was also considered, as added background noise will make the detection and quan[...]

FIGURE 2. Example of within-sample and sample-to-sample distortion. In subfigure (a), the within-sample distortion of the image representation is minimal, which produces distinct AFSC Hb peaks in the signal representation. In subfigure (b), the distortion of the horizontal information in the AFSC image representation causes more overlap of bands, which produces non-representative relative quantitation if calculated. Subfigure (c) showcases a gel side containing sample-to-sample distortion. This drifting of the sample area causes problems when highly distorted patient samples are quantitatively compared against the closest control sample.

The example gel and the 72 control sample images extracted from it were not enough for the development and evaluation of a distortion correction method that is supposed to have clinical relevance during NBS. The gel did not contain any patient samples, and the amount of geometric distortion was low.
In order to produce more routine-like data for the study, synthetic gel side images containing control and patient samples were calculated using a novel neural network we call Sig2Img.

Sig2Img is a deep fully-connected feedforward neural network [26], which takes a 1D signal representation of a sample and reconstructs it as a 2D IEF sample image. The network architecture contains an input layer of 384 nodes, two blocks of a hidden layer and a batch normalization layer with 500 nodes each, and an output layer of shape (384, 39). ReLU activations were used in the hidden nodes. The network was fitted with a mean squared error loss [27] and the RMSprop optimizer [28], using a batch size of 256 for 50 epochs. The relatively simple architecture, which was sufficient for our task, highlights the difficulty of the task, as peaks in the 1D signal need to be translated to proper band ellipses in 2D space. Subfigure (a) of Fig. 3 showcases the architecture and the training process.
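To make the layer sizes concrete, the following is a minimal NumPy sketch of a forward pass through a network with the dimensions given above. It is not the authors' implementation: the weight initialization, the ordering of ReLU and batch normalization inside each block, the linear output layer, and the names `init_params`, `batch_norm` and `sig2img_forward` are all assumptions made for illustration, and no training loop is shown.

```python
import numpy as np

SIGNAL_LEN = 384        # input nodes (from the text)
HIDDEN = 500            # nodes per hidden block (from the text)
IMG_SHAPE = (384, 39)   # output image shape (from the text)


def init_params(rng):
    """Random weights for 384 -> 500 -> 500 -> 384*39 (illustrative only)."""
    sizes = [SIGNAL_LEN, HIDDEN, HIDDEN, IMG_SHAPE[0] * IMG_SHAPE[1]]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((w.astype(np.float32), np.zeros(n_out, dtype=np.float32)))
    return params


def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch (no learned scale/shift: assumption)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def sig2img_forward(signals, params):
    """Map a batch of 1D signals (N, 384) to sample images (N, 384, 39)."""
    h = np.asarray(signals, dtype=float)
    for w, b in params[:-1]:
        # Dense -> ReLU -> batch normalization (this ordering is assumed).
        h = batch_norm(np.maximum(h @ w + b, 0.0))
    w, b = params[-1]
    out = h @ w + b  # linear output layer (assumed)
    return out.reshape(-1, *IMG_SHAPE)


def mse(pred, target):
    """Mean squared error, the loss named in the text."""
    return float(np.mean((pred - target) ** 2))
```

Under these assumptions, most of the parameters sit in the last layer, which maps 500 hidden nodes to 384 × 39 = 14976 output pixels.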

AFSC and FAS control sample images gathered from the one gel were augmented in order to enrich the training data used for fitting the Sig2Img model. For each image, 20 variations were created using height shift, width shift, zoom of 10% and horizontal flip. This amounted to 1512 images (the 72 original images plus their variations), of which 80% were used as training data and 20% as validation data. Since the purpose of the Sig2Img model was to capture the underlying signal-to-image function present in our one gel image, its generalizability to other gel images could not be tested due to the unavailability of out-of-sample test data. For each sample, a 1D signal representation was calculated with a mean over every pixel column, min-max normalization and Savitzky-Golay smoothing of the signal [29]. This process is highlighted in subfigure (b) of Fig. 3.

Image processing functions were developed to simulate sample-to-sample and within-sample distortions in the synthetic gel side images. These functions comprise two parts: the generation of a one-dimensional distortion profile curve, and the distortion of the image according to the profile curve.
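The 1D signal representation used for Sig2Img (pixel mean, min-max normalization, Savitzky-Golay smoothing) can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the image is taken to have shape (384, 39) with the separation axis along the rows, the window length and polynomial order are hypothetical values not given in the text, and the edges are handled by simple edge padding.

```python
import numpy as np


def savgol_smooth(y, window=11, polyorder=3):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window.

    window and polyorder are illustrative defaults, not values from the text.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Row 0 of the pseudo-inverse evaluates the fitted polynomial at the
    # window center, which gives the smoothing weights.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")


def image_to_signal(img, profile_axis=1, window=11, polyorder=3):
    """Average pixels across the sample, min-max normalize, then smooth.

    profile_axis=1 assumes rows follow the separation axis (an assumption).
    """
    signal = np.asarray(img, dtype=float).mean(axis=profile_axis)
    lo, hi = signal.min(), signal.max()
    signal = (signal - lo) / (hi - lo) if hi > lo else np.zeros_like(signal)
    return savgol_smooth(signal, window, polyorder)
```

Because smoothing is applied after normalization, as in the order given in the text, the smoothed signal can fall slightly outside the interval [0, 1].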

The sample-to-sample distortion profile generator works as follows. First, three sinusoidal signals are generated with random amplitude, phase and frequency as [...] (2)

Next, pixel coordinates c_x ∈ [0, W − 1] are mapped to the interval x ∈ [0, 1], where W denotes the width of the image in pixels. This mapping allows us to generate the discrete sum curve vector s, whose values correspond to s(x) at the mapped pixel coordinate values. The index corresponding to maximum absolute value [...], where Python-style slicing notation is used. Finally, a linear gradient can be added to the distortion vector d as [...] where b is a constant 0.02.

Within-sample distortion generation works as above, but instead of three sinusoidal signals we used only one, [...] where the amplitude a_max is set to 4 for convex and −4 for concave banana-shaped distortion profiles.
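A rough sketch of such profile curves, and of applying a profile by vertical displacement of pixel columns with linear interpolation, is given below. Since the defining equations are not reproduced here, several details are assumptions rather than the authors' definitions: the random ranges for amplitude and frequency, the form of the linear gradient (`b` times the pixel coordinate), the half-period sine used for the banana shape, the sign convention of the displacement, and all function names. The step involving the index of the maximum absolute value is omitted entirely.

```python
import numpy as np


def sample_to_sample_profile(width, rng, b=0.02, gradient=True):
    """Sum of three random sinusoids over x in [0, 1], plus an optional gradient."""
    x = np.linspace(0.0, 1.0, width)  # pixel coordinates mapped to [0, 1]
    s = np.zeros(width)
    for _ in range(3):
        amplitude = rng.uniform(0.5, 3.0)      # assumed range
        frequency = rng.uniform(0.5, 2.0)      # assumed range (cycles per width)
        phase = rng.uniform(0.0, 2.0 * np.pi)  # assumed range
        s += amplitude * np.sin(2.0 * np.pi * frequency * x + phase)
    if gradient:
        s += b * np.arange(width)  # assumed form of the linear gradient
    return s


def within_sample_profile(width, a_max=4.0):
    """Single sinusoid; a_max = 4 or -4 gives the two banana orientations."""
    x = np.linspace(0.0, 1.0, width)
    return a_max * np.sin(np.pi * x)  # half a period across the sample (assumed)


def displace_columns(img, d, fill=0.0):
    """Shift each pixel column vertically by d[c_x], interpolating linearly."""
    img = np.asarray(img, dtype=float)
    rows = np.arange(img.shape[0], dtype=float)
    out = np.empty_like(img)
    for cx in range(img.shape[1]):
        # Output row r samples the input at row r - d[cx], so positive values
        # move content toward higher row indices (sign convention assumed).
        out[:, cx] = np.interp(rows - d[cx], rows, img[:, cx], left=fill, right=fill)
    return out
```

For example, `displace_columns(sample, within_sample_profile(sample.shape[1]))` would bend the bands of a hypothetical 2D array `sample` into a banana shape.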

These distortion profiles can then be used to distort an input image (which can be a synthetic gel side image or an image of a sample) by displacing its pixel columns vertically by the amount of d(c_x), where non-integer displacements can be implemented using linear interpolation of pixel values. Sample-to-sample distortion is applied between 100 x-coordinates from the start and 100 x-coordinates before the end of the image. Within-sample distortion is applied [...]

[...] In a rare case, if none of the four highest peaks are inside the same band in the next column, then the previous point that was in the band is selected again. Fig. 5 shows an example of this process with a dust particle affecting the tracing of a single band. The first image shows the original image. The second image shows the four highest peaks found in each pixel column, where red represents the pixel with the greatest intensity, and blue, green, and purple represent the second, third, and fourth greatest intensity, respectively. As this example shows, the dust particle has the greatest intensity in a few pixel columns, but the distance comparison finds the second highest peak in those pixel columns to be the closest to the previously selected points, which allows the method to trace a single band successfully, as shown in the last image.

The last procedure before straightening the bands is fitting a 6th degree polynomial function to the chosen points. This results in more accurate tracing of the band, and therefore the outcome is much smoother after shifting the pixel columns up and down, especially for bands that have a very large slope.

Fig. 6 shows the straightening of a single sample. The first image shows the original curved sample, from which a single band has been traced with the method described earlier. Each pixel column is then shifted up or down to the median of the chosen points so that they form a straight horizontal line.
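The tracing, polynomial fitting and straightening steps described above can be sketched as follows. This is a simplified illustration, not the authors' code: the peak definition (simple local maxima), the starting row, the `max_jump` rule used to decide whether a peak still belongs to the same band, and the function names are assumptions. Bands are assumed to be brighter than the background.

```python
import numpy as np


def top_peaks(column, k=4):
    """Rows of the k most intense local maxima in one pixel column."""
    v = np.asarray(column, dtype=float)
    is_peak = (v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:])
    rows = np.flatnonzero(is_peak) + 1
    strongest = np.argsort(v[rows])[::-1][:k]
    return rows[strongest]


def trace_band(img, start_row, max_jump=3):
    """Follow one band column by column using the peak nearest the last point."""
    path = np.empty(img.shape[1])
    prev = float(start_row)
    for cx in range(img.shape[1]):
        peaks = top_peaks(img[:, cx])
        if peaks.size:
            nearest = peaks[np.argmin(np.abs(peaks - prev))]
            # If no candidate is close enough, the previous point is reused.
            if abs(nearest - prev) <= max_jump:
                prev = float(nearest)
        path[cx] = prev
    return path


def straighten_band(img, path, degree=6):
    """Fit a polynomial to the traced points and shift columns to their median."""
    img = np.asarray(img, dtype=float)
    cols = np.arange(img.shape[1])
    rows = np.arange(img.shape[0], dtype=float)
    fitted = np.polynomial.Polynomial.fit(cols, path, degree)(cols)
    target = np.median(path)
    out = np.empty_like(img)
    for cx in cols:
        shift = target - fitted[cx]
        out[:, cx] = np.interp(rows - shift, rows, img[:, cx])
    return out
```

Choosing the nearest of the four strongest peaks, rather than the strongest one, is what lets the trace ignore an isolated bright artifact such as the dust particle in Fig. 5.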

The alignment of samples is done after all the bands have been straightened. The first control sample's HbF band is the reference to which the other samples are aligned. This band is identified by creating a 1D signal of the control sample by calculating the mean value for each row of the image. The HbF band can be identified from this signal by locating the second peak over a certain threshold, since the order of the bands in a control sample is known to be HbA, HbF, HbS, and HbC. This corresponds to step three of the process summarized in Fig. 4.
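As an illustration of this band identification step, the sketch below builds the row-mean signal of a control sample and picks the peak at the position given by the known band order. The threshold value, the min-max normalization applied before thresholding, the scan direction (HbA at the lowest row index) and the function names are assumptions, since the text only mentions "a certain threshold".

```python
import numpy as np

CONTROL_BAND_ORDER = ("HbA", "HbF", "HbS", "HbC")  # order stated in the text


def row_signal(sample_img):
    """Mean of each image row, min-max normalized (normalization assumed)."""
    signal = np.asarray(sample_img, dtype=float).mean(axis=1)
    span = signal.max() - signal.min()
    return (signal - signal.min()) / span if span > 0 else np.zeros_like(signal)


def peaks_over_threshold(signal, threshold):
    """Indices of local maxima whose value exceeds the threshold."""
    s = np.asarray(signal, dtype=float)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:]) & (s[1:-1] > threshold)
    return np.flatnonzero(is_peak) + 1


def find_control_band(sample_img, band="HbF", threshold=0.2):
    """Row of the requested band in a control sample (threshold is illustrative)."""
    peaks = peaks_over_threshold(row_signal(sample_img), threshold)
    index = CONTROL_BAND_ORDER.index(band)
    if len(peaks) <= index:
        raise ValueError(f"expected at least {index + 1} peaks, found {len(peaks)}")
    return int(peaks[index])
```

The same helper, called with `band="HbA"`, would cover the later step in which control samples are aligned by their HbA band.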

After this, the fourth step of the process is done, where the whole sample area is moved so that the HbF band of the first control sample is aligned with a preselected y-coordinate. This ensures that the y-coordinate of this band is definite, and it can be used as a reference location. It also establishes that all the images straightened will have the sample area at the same position.

Then, the 1D signals are created the same way for the rest of the samples, and the highest peaks of all these signals are stored. The highest peak in each sample's signal is used to align the sample with the first control sample's HbF band. In normal newborn samples and in most other newborn samples, the band with the greatest quantity, and therefore the band with the highest peak in the signal, is HbF. These two steps are steps five and six in Fig. 4.
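A minimal sketch of this highest-peak alignment is shown below. It assumes each sample is a separate 2D array with the separation axis along the rows, uses whole-pixel shifts with a constant fill value, and introduces the hypothetical helper names `shift_rows`, `highest_peak_row` and `align_samples`.

```python
import numpy as np


def shift_rows(img, shift, fill=0.0):
    """Move image content by an integer number of rows, padding with fill."""
    img = np.asarray(img, dtype=float)
    out = np.full_like(img, fill)
    h = img.shape[0]
    if abs(shift) >= h:
        return out
    if shift >= 0:
        out[shift:] = img[:h - shift]
    else:
        out[:h + shift] = img[-shift:]
    return out


def highest_peak_row(sample_img):
    """Row of the highest peak in the row-mean signal of a sample."""
    return int(np.argmax(np.asarray(sample_img, dtype=float).mean(axis=1)))


def align_samples(samples, reference_row):
    """Shift each sample so that its highest peak lands on reference_row."""
    return [shift_rows(s, reference_row - highest_peak_row(s)) for s in samples]
```

Here `reference_row` stands for the y-coordinate of the first control sample's HbF band after step four.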

Control samples generally have HbA as the band with the greatest quantity. Therefore, the alignment of control samples is corrected by identifying the HbA band in each control sample, in the same way as the HbF band is identified in the third step of the process, and then aligning each control sample's HbA band with the HbA band of the first control sample, shown as the seventh step in Fig. 4.

FIGURE 5. Example of the tracing of a single band. The added dust particle causes the highest peak determination logic to break. From the first image, the four highest peaks (intensities from highest to lowest are red, blue, green, and purple) are calculated for each pixel column. The particle causes the highest peak to shift for five columns in the second image. The third image shows the successfully selected points after the distance comparison process.

[...] Table 1 also illustrates this, as the biggest mean difference in relative quantitation for these experiments was 0.46% in HbF for experiment 1.1 and the maximum absolute difference was 0.67 in HbF for experiment 3.1. These also show the minimal effect of the Gaussian blur on the quantitation, since there is no other image processing done in these experiments.

[...] Table 1 also supports this argument, with again only slight differences in the relative quantitation.

[...] 8, 9, and 10. These figures show that the method works as intended, even after applying Gaussian noise with a magnitude estimated to be five times greater than the background noise.

It is clear from Table 1 that in each singleton experiment the mean relative quantitation difference is very [...]

[...] HbF_ac in a study by Shiao and Ou [7], which indicates that the differences shown in Table 1 are not significantly greater, and thus not enough to influence the interpretation of the gel. Also, in each experiment the largest difference is [...]

In this study, we proposed novel methods for the genera[...]

[...]rately. This is crucial for IEF gels, since when screening for hemoglobinopathies, the quantities of the bands are as meaningful as the detection of abnormal bands in a sample.

Therefore, it is important that the straightening does not degrade the image result. As Tables 1 and 2 show, the mean relative quantitation difference is below the normal variation of Hb bands in each experiment.

The correction also enables relative quantitation of the bands in situations where it is not otherwise possible. Because the relative band quantities are calculated from the peak areas in the 1D signal representation, the signals created from distorted samples do not represent the bands accurately. However, after correct straightening, the signal representation contains distinct peaks which correspond to the actual bands, and therefore quantitation is made possible.
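To illustrate how relative quantities can be derived from peak areas in a 1D signal, the following sketch splits the signal at the minimum between neighboring peaks and expresses each segment's area as a percentage of the total. The valley-based splitting rule, the plain summation used as the area, and the function name `relative_quantities` are assumptions, as the exact peak-area computation is not specified here.

```python
import numpy as np


def relative_quantities(signal, peak_rows):
    """Percentage of total signal area assigned to each peak.

    The signal is cut at the minimum between neighboring peaks (assumed rule).
    """
    s = np.asarray(signal, dtype=float)
    peaks = np.sort(np.asarray(peak_rows, dtype=int))
    cuts = [0]
    for left, right in zip(peaks[:-1], peaks[1:]):
        cuts.append(int(left + np.argmin(s[left:right + 1])))
    cuts.append(len(s))
    areas = np.array([s[lo:hi].sum() for lo, hi in zip(cuts[:-1], cuts[1:])])
    return 100.0 * areas / areas.sum()
```

When bands overlap in the signal, as they do for distorted samples, the valley between peaks no longer separates the bands cleanly and the computed percentages stop reflecting the true band quantities, which is the problem the straightening addresses.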

The main limitation of this study was the limited image data of real gel material, which reduces our Sig2Img model's ability to produce more diverse and representative synthetic gel samples. Future work would include additional data gathering and model evaluation.

It should be noted that the gel correction method was also tested and found to perform well with real-life screening gels, where quantitation was made possible for highly distorted bands [5]. The results showcased in this publication provide further evidence of this, by systematically testing for common normal and abnormal situations arising from routine hemoglobinopathy screening.

Enabling quantitation in situations where the gel is too distorted for interpretation is what gives the method clinical significance. NBS laboratories using IEF for screening hemoglobinopathies could include relative quantitation in their clinical workflow, which in turn would enable more sophisticated gel analysis such as hemoglobin migration assessment during a gel run.