An Automatic CADI’s Ionogram Scaling Software Tool for Large Ionograms Data Analytics

Scaling ionosonde ionograms to produce accurate readings is traditionally a manual task that requires professional expertise. However, there is high demand for auto-scaling software that can handle a large number of ionograms, avoiding the time and effort of manual scaling as well as human error. An auto-scaling program needs noise-free input, accurate trace identification, and precise segmentation to work. In this paper, Canadian Advanced Digital Ionosonde (CADI) ionograms are processed and auto-scaled using a new model on an open-source (Python) platform. Scaling accuracy is achieved through noise filtering, Convolutional Neural Network (CNN) based trace detection, layer-wise segmentation, and extraction of the ionospheric features. The investigation uses raw ionogram files generated by the CADI system in Hyderabad, India (Lat: 17.47° N, Long: 78.57° E) between 2014 and 2015. The proposed model accepts raw ionograms in *.md4 or *.md2 file formats (individual or hourly integrated). The individual block performance of the proposed auto-scaling software tool is examined with several classes of ionograms, and the overall performance is evaluated with a large set of ionograms obtained during adverse space weather conditions (16th to 18th March 2015). The Univap Digital Ionosonde Data Analysis (UDIDA) software tool was used for manual scaling, and its results are compared with those of the proposed scaling software. For fmin and h'f, respectively, the proposed model has a mean absolute error (MAE) of 0.36 MHz and 11.72 km, and a root mean square error (RMSE) of 0.7 MHz and 22.36 km.


I. INTRODUCTION
Digital ionosondes are high-frequency (HF), high-power ionosphere probing devices. An ionosonde transmits a series of modulated pulses at vertical incidence and records the reflected echoes, which represent the ionospheric features, in the form of ionograms. Traces in the ionogram reveal the features of the ionosphere in terms of frequency and height components. The ionospheric features can be extracted from the ionogram by manual scaling or by auto-scaling software tools. Manual scaling results are accurate, but only when performed with the right expertise, and manually scaling various classes of ionograms is a time-consuming and tedious job. In contrast, ionogram auto-scaling is faster than manual scaling and can be accurate, but it tends to fail in complex cases such as ordinary and extraordinary traces, E and F layer spread phenomena, and incomplete ionogram formation [1].
Pezzopane and Scotto (2002) developed the Autoscala software [2], Huang and Reinisch (1983) proposed and developed the ARTIST software [3], Ding Zonghua (2010) proposed the Cadiscale software [4], and Pillat et al. (2013) proposed the Univap Digital Ionosonde Data Analysis (UDIDA) software for scaling the ionospheric features automatically [5]. UDIDA reduces the manual work on ionograms by producing time and frequency versus height information from each ionogram for further processing [5]. Whereas the processed results of UDIDA require fine-tuning, Autoscala results show reduced error during specific periods and events [6]. Lynn (2017) proposed a method for ionogram display and auto-scaling of the F layer [7]; the method relies on forming frequency and height histograms in each ionogram. Jiang et al. (2017) developed the Ionogram Scaler software to carry out manual and automatic scaling of vertical incidence ionograms [8]. The software performed well in terms of ionogram scaling and Ionosonde Total Electron Content (ITEC) estimation. Chen et al. (2018) proposed an algorithm for automatic scaling of ionograms with separated O and X waves [9]. Their auto-scaling method is built on image recognition, mathematical morphology, and graph theory, and it identifies the crucial parameters using ionospheric properties. Fagre et al. (2020) proposed an algorithm for automatic scaling of the F layer from ionograms [10], based on an image processing methodology for the extraction of curvilinear structures.
The accuracy and consistency of automatically scaled data remain challenging even though significant auto-scaling techniques and models have been proposed in recent years. The major problem in automatic ionogram scaling is correctly distinguishing multiple-hop Es reflections (at the virtual heights of the F region) from the right F traces. In addition, correct identification of the ordinary part of spread traces is a significant issue. The ARTIST auto-scale software uses neural networks and hyperbolic trace fitting techniques to identify the trace and scale the ionogram features. An image processing technique is used in the Autoscala software, and a fuzzy logic technique is used in the UDIDA software. The physical significance of ionospheric irregularities is identified by classifying the ionogram using the CNN method, and feature extraction from the ionogram increased the accuracy of prediction models [11]. De La Jara and Olivares (2019) proposed ionospheric echo detection in digital ionograms using a Convolutional Neural Network (CNN), a type of Deep Neural Network (DNN). The CNN model can capture ionospheric features through the filtering process applied to ionograms [12].
In this paper, an ionogram auto-scaling procedure on an open-source platform is proposed to extract the fmin and h'min features of the E, F1, and F2 layers. Developing an open-source visual tool for ionosonde data analysis will benefit ionospheric researchers and would be helpful for other CADI ionosonde receivers across the world.
The associate editor coordinating the review of this manuscript and approving it for publication was Manuel Rosa-Zurera.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. METHODOLOGY
The architecture of the proposed scaling software is described in Fig. 1. It has a de-noising filter, a CNN-based (Visual Geometry Group, VGG-16) trace detection block (classifier), a segmentation block, and an auto-scaling block. A pre-trained CNN (VGG-16) net is modified with the required number of classes, and TensorFlow and Keras are used for training, validating, and testing the net for ionogram classification. An offline plotting program for CADI data files, developed in Python, is available to developers and can be downloaded from GitHub (https://github.com/pitgo/cadi24h/blob/master/cadi24h.py). The auto-scaling module is developed in Python 3.6. Matplotlib and NumPy are used for displaying the ionogram and listing the frequency versus height value bins.

A. RAW IONOGRAM FILES READING AND PLOTTING
The proposed software can handle both *.md2 and *.md4 file formats. It reads header information such as the station ID, time, and integration (individual or hourly) details from the raw ionogram file. The program encodes the reading information as flags 1, 2, 3, 4, and 6 for time intervals of 1, 0.5, 0.33, 0.25, and 0.16 hours, respectively, in *.md2 and *.md4 files. Then, based on the flag information, the program sequentially reads the frequency and height bins from the given file location. Finally, the program plots the frequency versus height points on the x- and y-axes, respectively.
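The flag values listed above follow a simple pattern: each interval equals one hour divided by the flag. The following is a minimal sketch of that relation, assuming the flag can be read as the number of ionograms per hour. This is an interpretation of the listed values, not a documented CADI header convention, and the function names are hypothetical.

```python
from fractions import Fraction

# Flags listed in the text. Reading a flag as "ionograms per hour" is an
# assumption made for this sketch.
KNOWN_FLAGS = (1, 2, 3, 4, 6)


def interval_minutes(flag: int) -> Fraction:
    """Return the time interval in minutes for a given flag."""
    if flag not in KNOWN_FLAGS:
        raise ValueError(f"unsupported flag: {flag}")
    return Fraction(60, flag)


def interval_hours(flag: int) -> float:
    """Return the same interval expressed in hours."""
    return float(interval_minutes(flag) / 60)


for flag in KNOWN_FLAGS:
    print(flag, int(interval_minutes(flag)), round(interval_hours(flag), 2))
```

Under this reading the intervals are 60, 30, 20, 15, and 10 minutes. The last one is 1/6 hour, which the text lists truncated as 0.16 hours.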

B. FILTERING THE NOISE
An adaptive sliding frequency window technique is implemented at each height to de-noise the ionogram [13].

C. CNN BASED TRACE DETECTION
A VGG net is considered in the proposed scaling software to detect or identify traces in the ionogram images. The number following VGG specifies the network depth, that is, the number of layers that hold trainable parameters [14]. The number of convolution layers in a net, or the depth of the network, significantly affects model accuracy. Generally, more convolution layers give better performance, but deeper neural networks are harder to converge and their accuracy may saturate [15]. Also, the receptive field or kernel size should be as small as possible to minimize the training time. So, there is a tradeoff in selecting the network depth and kernel size.
The mean absolute error of the VGG-16 net is lower than that of AlexNet and ResNet. VGG-16 is a network with 16 layers that hold trainable parameters [14]. It has two sets of two convolution layers with 64 and 128 filters, respectively; three sets of three convolution layers with 256, 512, and 512 filters, respectively; and finally three dense (fully connected) layers, with 512 units in each of the first two dense layers and as many units as training classes in the final dense layer. The details of the CNN (VGG-16) net architecture are tabulated in Table 1. All convolution layers use rectified linear units (ReLU) as their activation function, and the sets are interconnected with max-pooling layers. A pre-trained VGG-16 net is used, with its fully connected layer block modified for the user-defined classification.
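The layer arrangement described above can be checked with a short tally. The sketch below counts the weighted layers and the trainable parameters, assuming 3 × 3 kernels and one 2 × 2 max-pooling step after each convolution set (as in the standard VGG-16 design), a 224 × 224 × 3 input, and 8 output classes (the number of labeled ionogram classes in Section III). It is a bookkeeping aid under those assumptions, not the authors' Keras model, and Table 1 remains the authoritative description.

```python
# (layers in set, filters per layer) for the five convolution sets in the text.
CONV_SETS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
DENSE_UNITS = [512, 512, 8]    # last entry: 8 classes (assumed from Section III)
KERNEL = 3                     # assumption: 3x3 kernels, as in standard VGG-16
INPUT_SIZE, INPUT_CHANNELS = 224, 3


def count_parameters():
    """Return (weighted layer count, conv parameters, dense parameters)."""
    layers, conv_params = 0, 0
    channels, size = INPUT_CHANNELS, INPUT_SIZE
    for n_layers, filters in CONV_SETS:
        for _ in range(n_layers):
            # Weights (k * k * in * out) plus one bias per filter.
            conv_params += (KERNEL * KERNEL * channels + 1) * filters
            channels = filters
            layers += 1
        size //= 2  # assumption: one 2x2 max-pooling step after each set
    dense_params, features = 0, size * size * channels
    for units in DENSE_UNITS:
        # Weights (in * out) plus one bias per unit.
        dense_params += (features + 1) * units
        features = units
        layers += 1
    return layers, conv_params, dense_params


layers, conv_params, dense_params = count_parameters()
print(layers, conv_params, dense_params)
```

The tally gives 13 convolution layers plus 3 dense layers, which is where the 16 in VGG-16 comes from.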

D. IONOSPHERIC LAYER-WISE SEGMENTATION
Chen et al. (2013) scaled the F layer parameters by separating the E and F layer trace pixels, which extend over 90 km to 150 km and 150 km to over 500 km, respectively, using bounding box estimation to locate the E and F layer traces in the ionograms [16]. Scotto and Pezzopane (2007) examined ionograms to scale the sporadic E layer observed in the height range of 90 km to 120 km or more [17]. Yusupov and Bakhmetieva (2021) explored the sporadic E layer with a double-cusp structure in vertical sounding ionograms in the range of 90 km to 130 km and reported that it can be distinguished from other D, E, and F layer traces [18]. Enell et al. (2016) compared manually scaled and Autoscala-scaled parameters of the E, F1, and F2 layers and reported that Es traces were observed in the range of 100 km to 170 km, which are not part of the normal E and F layer traces, and that the F1 layer critical frequency was observed above 150 km [19].
It is necessary to consider the geographical location and the diurnal, seasonal, and solar cycle variations while segmenting the frequency versus height points layer-wise. In our work, the frequency versus height points are segmented using general height settings: 90 km to 150 km for the E layer, 160 km to 290 km for the F1 layer, and 300 km to 600 km for the F2 layer. For ionograms classified as spread F or sporadic E event traces, the F layer window is set to 160 km to 600 km. The layer height settings can be adjusted to account for latitudinal and seasonal ionospheric changes.
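The height windows listed above can be applied with a simple lookup. The following is a minimal sketch, assuming inclusive window boundaries and a hypothetical `spread_event` switch that stands in for the class information from the trace detection block. Keeping the E window unchanged in the spread case is also an assumption.

```python
# Height windows in km taken from the text; boundaries are treated as
# inclusive (an assumption). Points that fall between windows, for example
# at 155 km, are left unassigned.
DEFAULT_WINDOWS = {"E": (90, 150), "F1": (160, 290), "F2": (300, 600)}
SPREAD_WINDOWS = {"E": (90, 150), "F": (160, 600)}


def segment(points, spread_event=False):
    """Group (frequency_MHz, height_km) points into per-layer lists."""
    windows = SPREAD_WINDOWS if spread_event else DEFAULT_WINDOWS
    layers = {name: [] for name in windows}
    for freq, height in points:
        for name, (low, high) in windows.items():
            if low <= height <= high:
                layers[name].append((freq, height))
                break
    return layers


points = [(2.1, 102.0), (3.4, 210.0), (5.8, 330.0), (4.0, 155.0)]
print(segment(points))
print(segment(points, spread_event=True))
```

Keeping the windows in plain dictionaries makes it straightforward to substitute station-specific or seasonal settings.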

E. AUTO-SCALING
The layer-wise segmented height versus frequency bins are passed to the scaling block. The scaling block lists the frequency bin values with their indexed height values in ascending order. The first indexed values of frequency and height are the minimum components of that particular layer trace. To improve accuracy and minimize error, the first 5 minimum indexed frequency and height values are averaged to obtain the minimum frequency and minimum height components of each layer trace.
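The averaging step described above can be sketched as follows. The ordering is an assumption: points are sorted by frequency first and by height second, and the first five entries are averaged. The function name and the sample values are hypothetical.

```python
from statistics import mean


def scale_layer(points, n_min=5):
    """Return (fmin_MHz, hmin_km) for one layer, or (nan, nan) if empty.

    Assumption: points are ordered by frequency, then height, and the first
    n_min entries of that ordering are averaged.
    """
    if not points:
        return float("nan"), float("nan")
    lowest = sorted(points)[:n_min]
    fmin = mean(f for f, _ in lowest)
    hmin = mean(h for _, h in lowest)
    return fmin, hmin


# Made-up (frequency_MHz, height_km) points for one layer.
layer = [(3.0, 210.0), (2.6, 204.0), (2.8, 206.0), (2.5, 204.0),
         (2.7, 205.0), (3.5, 230.0), (2.9, 208.0)]
fmin, hmin = scale_layer(layer)
print(round(fmin, 2), round(hmin, 1))
```

Averaging several of the lowest points, rather than taking the single lowest one, reduces the influence of one stray point that survives filtering.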
There are two program components in the proposed scaling software. The first is a CNN-based ionogram classification program, which is trained on an IBM server (IBM 3400 M3) and uses the stored result of that training to classify the input ionogram. The second is the scaling program, which runs on the same IBM server to scale the features from the ionogram.
Before raw ionogram files are applied to the input of the proposed tool for auto-scaling, the CNN block is trained with the training set and optimized to achieve better accuracy. Then, the location of the ionogram files is given to the proposed auto-scaling module, which reads the files sequentially. Finally, the content of each raw ionogram file is passed serially through the filter block to remove noise, the CNN trace identification block to identify the various traces, the segmentation block to separate the frequency versus height points, and the scaling block to extract the ionospheric features.

III. RESULTS & DISCUSSION
Raw ionogram files (*.md4) generated by the CADI located at Hyderabad, India (Lat: 17.47° N, Long: 78.57° E) between 2014 and 2015 are used in this investigation. Ionograms are manually confirmed, labeled with the class number they belong to, and segregated under the labeling class. Sets of 4000, 6000, 6000, 5000, 6000, 5000, 4000, and 4000 similar ionograms are manually recognized and segregated with the labels (blank/noise), (single ordinary trace), (ordinary and extraordinary trace), (E and F layer trace), (E, F, and secondary/multiple traces), (spread F event trace), (spread F and sporadic E trace), and (sporadic E, F layer and multiple traces), respectively. Using the temporal split procedure, 80% of the images from each class are used as training data, and 10% each are used for validation and testing.
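The per-class split described above can be illustrated with a short sketch. It assumes that "temporal split" means ordering each class chronologically and assigning the oldest 80% to training, the next 10% to validation, and the newest 10% to testing. The function name and the integer timestamps are stand-ins, not the authors' data handling.

```python
# Number of labeled ionograms per class, as listed in the text.
CLASS_COUNTS = [4000, 6000, 6000, 5000, 6000, 5000, 4000, 4000]


def temporal_split(items, train_pct=80, val_pct=10):
    """Split (timestamp, payload) items chronologically into three lists."""
    ordered = sorted(items, key=lambda item: item[0])
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(ordered) * train_pct // 100
    n_val = len(ordered) * val_pct // 100
    return (ordered[:n_train],
            ordered[n_train:n_train + n_val],
            ordered[n_train + n_val:])


# Stand-in data: integer timestamps instead of real ionogram times.
sizes = []
for count in CLASS_COUNTS:
    items = [(t, f"ionogram_{t}") for t in range(count)]
    train, val, test = temporal_split(items)
    sizes.append((len(train), len(val), len(test)))
print(sizes)
print([sum(column) for column in zip(*sizes)])
```

With the listed class sizes, this yields 32,000 training, 4,000 validation, and 4,000 test images out of 40,000.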

A. CNN BASED TRACE DETECTION (CLASSIFICATION) BLOCK PERFORMANCE EVALUATION
The performance of the CNN-based trace detection module is evaluated in comparison with a traditional Artificial Neural Network (ANN). For the CNN, each image in the training, validation, and testing sets is fixed to a size of 224 × 224 × 3 to match the pre-trained VGG-16 net. For the ANN, a Pattern Recognition Tool (nprtool) is used for ionogram image classification. The ionogram images are converted to black-and-white images with a square matrix size of 224 × 224. Then, all images are combined into a single large matrix to train the ANN. Initially, both the CNN and ANN nets are trained on the IBM server (IBM 3400 M3) with the training image set (80% of the total image set from each class). The settings of the CNN and ANN nets are tuned and optimized during the training process.
The CNN trace detection efficiency is evaluated with the 10% test image set from each class and compared with the ANN classification results on the same test image set. The classification accuracy, F-Score, and False Omission Rate (FOR) evaluation metrics [20] chosen for the analysis and comparison of the proposed CNN and the traditional ANN classifier are presented in Table 2.
In these metrics, TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative. The CNN classifier's overall accuracy is about 97%, its F-Score is 89.34, and its FOR is significantly lower at 1.6. In contrast, the traditional ANN has an overall accuracy of about 78.4%, an F-Score of 78.8, and a higher FOR of 3.78. The overall accuracy is increased by 14% and 18.66% compared with the module presented in [11] and the ANN, respectively.
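The three metrics named above have standard definitions in terms of TP, TN, FP, and FN. Since the formulas are not reproduced in the text, the sketch below assumes those standard forms; the counts in the example are made up for illustration and are not the paper's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, F-score, false omission rate) from confusion counts.

    Standard definitions, assumed here:
      accuracy = (TP + TN) / (TP + TN + FP + FN)
      F-score  = 2 TP / (2 TP + FP + FN)
      FOR      = FN / (FN + TN)
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = 2 * tp / (2 * tp + fp + fn)
    false_omission_rate = fn / (fn + tn)
    return accuracy, f_score, false_omission_rate


# Made-up counts for one class treated as "positive" against all others.
acc, f1, fo = classification_metrics(tp=90, tn=880, fp=10, fn=20)
print(f"accuracy={acc:.3f} F-score={f1:.3f} FOR={fo:.3f}")
```

For a multi-class classifier such as the one described here, these quantities would typically be computed per class in a one-versus-rest fashion and then averaged; whether the reported values were obtained that way is not stated in the text.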
The proposed auto-scaling software can be run in offline mode, and the computation time is about 1 to 2 seconds for each raw ionogram file, depending on the complexity of the ionogram (for example, ordinary and extraordinary traces, sporadic E events, and spread F events). Compared with the ANN, the CNN classifier improves the computation time by a few seconds for the same input ionogram file.

B. PERFORMANCE EVALUATION WITH VARIOUS CLASSES OF IONOGRAMS
The performance of the de-noising filter, CNN trace detection, segmentation, and auto-scale blocks is evaluated with different classes of ionograms (Fig. 2). Fig. 2 (a), (d), (g), and (j) show, respectively, the classes of F1 and F2 layer traces; sporadic E, F1, and F2 layer traces; sporadic E and spread F traces; and spread F trace. Applying the adaptive sliding frequency window technique with a 2-nonzero-point elimination criterion at each height (6 km resolution) preserved the valid and spread traces in the filtered ionograms, as shown in Fig. 2 (b), (e), (h), and (k), respectively.
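The filtering idea described above can be illustrated with a deliberately simplified sketch. It assumes a fixed-width window (the adaptive width selection of [13] is not reproduced) and interprets the 2-nonzero-point criterion as follows: an echo is kept only if the window centered on it along the frequency axis contains at least 2 nonzero points. Both the interpretation and the function name are assumptions.

```python
def denoise(grid, width=3, min_points=2):
    """Zero out isolated echoes in a height-by-frequency amplitude grid.

    grid: list of rows, one row per height bin (for example 6 km apart),
    each row holding amplitudes per frequency bin (0 means no echo).
    """
    half = width // 2
    cleaned = []
    for row in grid:                      # one row per height bin
        new_row = []
        for i, value in enumerate(row):   # slide the window along frequency
            window = row[max(0, i - half): i + half + 1]
            nonzero = sum(1 for v in window if v != 0)
            keep = value != 0 and nonzero >= min_points
            new_row.append(value if keep else 0)
        cleaned.append(new_row)
    return cleaned


grid = [
    [0, 5, 0, 0, 0, 0],   # isolated echo: removed
    [0, 0, 4, 6, 5, 0],   # contiguous trace: kept
    [3, 0, 0, 0, 0, 2],   # two isolated echoes: removed
]
for row in denoise(grid):
    print(row)
```

Because the test is applied along frequency at a fixed height, horizontally extended structures such as spread traces survive, while single-bin noise does not.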
The CNN-based trace detection block outputs a number that depends on the class of ionogram it detected. The number indicated in Fig. 2 (c), (f), (i), and (l) shows the accurate detection of the various layers by the CNN-based trace detection block. The layer-wise segmentation block adjusts the height range settings according to the number received from the CNN-based trace detection block and separates the frequency-indexed height values into the respective layer bins. The images in Fig. 2 (c), (f), (i), and (l) show that no residual parts of other layers remain after the segmentation process.
For the different classes of ionograms shown in Fig. 2 (a), (d), (g), and (j), the auto-scaled results are compared with the manually scaled values: the minimum frequency (fmin) of the various layer traces is presented in Fig. 3 (a), (c), (e), and (g), and the virtual height (h'f) in Fig. 3 (b), (d), (f), and (h), respectively. In Fig. 3, the pink bars indicate the auto-scaled values and the orange bars indicate the manually scaled values. Es layer details are presented with a blue edge color, F1 layer details with a green edge color, and F2 layer details with a black edge color. For the input ionogram shown in Fig. 2 (a), the auto-scaling block produced F1 and F2 layer fmin values with errors of 0.03 MHz and 0 MHz (Fig. 3 (a)) and h'f values with errors of −0.4 km and 3.0 km (Fig. 3 (b)), respectively, relative to the manually scaled values.
For the ionogram shown in Fig. 2 (d), the auto-scaling block outputs Es, F1, and F2 layer fmin values with errors of 0.02 MHz, 0.07 MHz, and −0.05 MHz (Fig. 3 (c)) and h'f values with errors of −2.0 km, 1.0 km, and 2.0 km (Fig. 3 (d)), respectively.
Similarly, for the sporadic E and spread F event traces in the ionogram of Fig. 2 (g), the auto-scaling block extracted fmin and h'f values from the Es trace with errors of 0.12 MHz (Fig. 3 (e)) and 1.8 km (Fig. 3 (f)), and from the spread F (SF) trace with errors of −0.03 MHz (Fig. 3 (e)) and −0.8 km (Fig. 3 (f)). Finally, for the spread F trace class ionogram (Fig. 2 (j)), the auto-scaling block outputs fmin and h'f values with errors of 0.02 MHz (Fig. 3 (g)) and 2.2 km (Fig. 3 (h)), respectively. The better scaling accuracy in fmin and h'f is attributed to averaging the first 5 minimum points during the scaling process.

C. MODEL PERFORMANCE EVALUATION DURING ST. PATRICK'S DAY STORM
The performance of the complete auto-scaling module is evaluated with about 432 raw ionogram files recorded during one of the major storms, from 16th to 18th March 2015. The F layer fmin and h'f results of the proposed auto-scaling model are compared with the UDIDA manual scaling values. Fig. 4 shows the comparison, and the corresponding statistical results are presented in Table 3. It is clear from Fig. 4 (a) and (b) that the fmin and h'f results of the proposed auto-scaling software closely follow the manual scaling values during the dawn and dusk periods, with a slight overestimation (error) around midday on the pre-storm, storm, and post-storm days.
The overestimation (values higher than the manually scaled ones) could arise because the noise filter eliminates the first points of a trace by treating them as noise, or because weak signal indications in the ionogram are misinterpreted during manual scaling (which results in lower values). It is also evident from Table 3 that the proposed auto-scaling software extracted fmin and h'f values well from all valid ionogram files. The proposed auto-scaling software tool also assigned NaN values to blank or unusable ionograms (Fig. 4). The overall RMSE of 0.7 MHz and 22.36 km in fmin and h'f, respectively, for the large ionogram data set is attributed to the filtering process adopted, the accurate identification of traces by the CNN trace detection block, the layer-wise segmentation process, and the averaging of the first 5 minimum points during the scaling process. The RMSE values obtained are very close to the acceptable range mentioned in Jiang et al. (2017) [8], which is within ±0.5 MHz of the manual value for frequency and within ±25 km of the manual value for height.
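The error statistics discussed above can be computed as in the following sketch. It assumes that ionograms scaled as NaN are excluded from the statistics and that the error is defined as the auto-scaled value minus the manually scaled value. The function names and the sample values are hypothetical and are not the paper's data.

```python
import math


def paired_errors(auto, manual):
    """Return auto-minus-manual errors, skipping pairs that contain NaN."""
    return [a - m for a, m in zip(auto, manual)
            if not (math.isnan(a) or math.isnan(m))]


def mae_rmse(auto, manual):
    """Return (MAE, RMSE) over the valid pairs, or (nan, nan) if none."""
    errors = paired_errors(auto, manual)
    if not errors:
        return float("nan"), float("nan")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


def share_within(auto, manual, tolerance):
    """Return the fraction of valid pairs with |error| <= tolerance."""
    errors = paired_errors(auto, manual)
    return sum(abs(e) <= tolerance for e in errors) / len(errors)


# Made-up fmin values in MHz; the NaN marks a blank ionogram.
auto_fmin = [2.0, 2.5, float("nan"), 3.0, 4.0]
manual_fmin = [2.0, 2.0, 2.2, 3.5, 3.0]
mae, rmse = mae_rmse(auto_fmin, manual_fmin)
print(round(mae, 3), round(rmse, 3), share_within(auto_fmin, manual_fmin, 0.5))
```

RMSE weights large deviations more heavily than MAE, so a few strongly overestimated values can raise the RMSE well above the MAE, which is consistent with the gap between the two reported statistics.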

IV. CONCLUSION
In this paper, a CADI ionogram processing and auto-scaling software tool is presented on an open-source (Python) platform. The complete module is implemented with a noise filter, CNN-based trace detection, segmentation, and scaling modules. A VGG-16 net, a deep learning CNN architecture, is used to detect traces in a wide variety of ionograms.
Initially, the CNN-based trace detection module is trained, validated, and tested with more than 50% (40,000) of the images recorded from 2014 to 2015 at the Hyderabad, India station. The trace detection accuracy of the CNN module is optimized, and the results are compared with a traditional ANN. Then, the individual block performance of the proposed auto-scaling software tool is evaluated with various classes of ionograms, and the performance of the complete auto-scaling model is evaluated using the ionogram data set recorded from 16th to 18th March 2015. Finally, the results of the proposed auto-scaling software tool are compared with UDIDA manually scaled values. The auto-scaled results of the proposed model are very close to the manually scaled values. The MAE (0.36 MHz, 11.72 km) and RMSE (0.7 MHz, 22.36 km) values show the model's fair performance. The better accuracy is attributed to the noise filter, CNN-based trace detection, layer-wise segmentation, and averaging of the first 5 minimum points in the proposed auto-scaling software tool.