Chest X-ray Image Analysis with Combining 2D and 1D Convolutional Neural Network based Classifier for Rapid Cardiomegaly Screening

Cardiomegaly is often asymptomatic. Symptoms such as palpitations, chest tightness, and shortness of breath may be early indications of cardiac enlargement, which can be divided into cardiac hypertrophy and ventricular enlargement; their causes and treatment strategies differ. Early detection of cardiomegaly can support decisions on drug administration and surgical treatment. In addition, because manual inspection is time consuming and depends on human interpretation and experience, an assistive tool is required to automatically distinguish normal hearts from enlarged hearts. Therefore, this study proposes a classifier that combines 2D (two-dimensional) and 1D (one-dimensional) convolutional neural networks for rapid cardiomegaly screening in clinical applications, based on chest X-ray (CXR) examinations in the frontal posteroanterior view. The 2D and 1D convolutional processes and the multilayer connected classification network are used to enhance the original CXR image and remove unwanted noise, increasing accuracy in feature extraction and pattern recognition tasks. The training and testing datasets are drawn from the National Institutes of Health CXR image database and are used to train the classifier and validate its performance with K-fold cross-validation. Experimental results indicate the potential of the method for rapid cardiomegaly screening in terms of recall (%), precision (%), accuracy (%), and F1 score.


I. Introduction
An enlarged heart, known as cardiomegaly, is not a serious disease; it may produce no signs or symptoms in some people and symptoms such as shortness of breath, abnormal heartbeat (arrhythmia), and edema in others. Cardiomegaly causes the heart to pump harder than usual or gradually damages the heart muscle. Congenital heart diseases or abnormal heartbeats can cause heart enlargement, resulting in high blood pressure, heart valve disease, and cardiomyopathy. The risks of complications from cardiomegaly include heart failure, blood clots, heart murmur, and cardiac arrest. Therefore, the first-line chest X-ray (CXR) image [1][2] is an easy inspection method for directly detecting the presence or absence of an abnormality in cardiopulmonary disease detection. This imaging inspection is cost effective and uses a low radiation dose, which makes it suitable for rapid cardiomegaly screening. In first-line examination, as seen in Figure 1, radiologists can use the cardiac silhouette to estimate the cardiothoracic ratio (CTR) index, with a threshold value of 0.50 separating the normal condition (no finding) from cardiomegaly [3][4][5][6][7][8][9]. However, computing the CTR index requires selecting the maximal horizontal cardiac diameter (MHCD) and the maximal horizontal thoracic diameter (MHTD) manually or by using segmentation-based methods (as shown in Figure 1). A CTR index greater than 0.50 indicates an enlarged heart. However, manual screening is limited by insufficient human resources and is time consuming in medical diagnosis. Given the gray-scale gradient changes at the edges between the lung and heart, segmentation-based methods, such as active shape models, pixel classification, active appearance models, and the Harris operator, are used to extrapolate the boundaries of the right/left lung and heart regions in a CXR image. Hence, the heart contour can be identified to locate the desired object.
Then, the feature map of a cardiopulmonary disease can be easily searched within a specific bounding box. However, CXR images may contain noise, and the traditional nonlinear mapping, intensity-based, and gradient-based methods are sensitive to noise. They also require the MHCD and MHTD to be determined manually to estimate the CTR. In addition, digital noise, such as Gaussian noise, Poisson noise, or speckle noise [10][11][12], usually degrades the details and edges of medical images, thereby reducing the efficiency of image segmentation, image classification, and pattern recognition tasks. Hence, an image denoising method is necessary to improve the quality of digital medical images.
The noise [10][11] in CXR images is usually associated with the low dose of ionizing radiation, and it affects image quality for cardiopulmonary-related disease diagnoses. Digital filters, wavelet analysis, principal component analysis, and machine learning methods [10] have been proposed to remove such noise and thus improve CXR image quality. However, these methods cannot remove Gaussian and Poisson noise. Hence, to address the abovementioned problems, this study aims to design a multilayer classifier capable of noise filtering, image enhancement, feature extraction, and classification. Deep-learning-based methods, such as DenseNet (Dense Convolutional Network) [13], ResNet (Residual Network) / FC-ResNets (Fully Convolutional Residual Network) [14][15], and UNet (Fully Convolutional Network) [16][17], can be used for feature enhancement, feature extraction, and classification to automatically screen for the presence of cardiomegaly on posteroanterior (PA) CXR images.
These multiple convolutional-pooling layers and fully connected networks can be trained end to end for pixel-to-pixel image segmentation, and they have shown promising results. These multilayer and classification networks can also be used for automated segmentation of the liver or tumors in computed tomography images [6,18]. They achieve high performance in multilabel classification applications using the NIH (National Institutes of Health Clinical Center) CXR dataset [2,13]. These 2D fully convolutional neural networks (CNNs) usually have more than 10 convolutional-pooling layers. Thus, they can perform image preprocessing and postprocessing tasks to filter noise, enhance the feature map, and then increase the identification accuracy. Through a series of convolution and pooling processes, a multilayer CNN with high capacity for visual object detection can enhance and extract the desired features at different scales and levels, from low-level features (objects' edges or curves) to high-level information (objects' shapes), for detecting nonlinear features. Hence, by adding image preprocessing stages, the network can increase nonlinearity and obtain a feature representation. Then, a max-pooling process is used to reduce the sizes of the feature maps, yielding abstract features and mitigating overfitting in the learning stage [19][20]. Those feature maps can be combined to classify the input CXR images into the possible class. The deep learning (DP)-based CNNs have gradually reduced error rates in classification applications and can also work with noisy data to improve image resolution [21].
However, these processes increase complexity and have some limitations, such as the need to determine the number of convolutional-pooling layers (multilayers) and the sizes of the convolutional masks (3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, …), the requirement for large-scale training datasets, and the high computational complexity of training the classifier. In addition, DP-based methods require a graphics processing unit (GPU) to accelerate training with a large amount of training data.
Therefore, to simplify the image processing and classification tasks, this study establishes a suitable convolutional-pooling layer and a fully connected network to achieve good accuracy for image classification in cardiomegaly detection. We use a combination of a 2D and one-dimensional (1D) multilayer CNN [14-17, 19-20, 22], consisting of a 2D convolutional layer, flattening layer, 1D convolutional layer, pooling layer, and multilayer classifier in the classification layer. In the first 2D convolutional layer, we use 2D fractional-order-based convolutional processes with different scale fractional-order parameters (v ∈ [0, 1]) [23][24][25] to detect the heart's edge and contour in the specific region. Along the horizontal and vertical directions, two fractional-order-based convolutional windows (with a sliding stride = 1) perform spatial convolutional processes that enhance the heart's features and remove noise from the original CXR image. With suitable fractional-order parameters (v = 0.2 to 0.4) [23][24][25], the heart feature map can be discriminated as the region of interest (ROI) by the spatial convolutional processes for feature extraction. Then, within the ROI, the heart feature map can be easily selected from CXR images with a specific bounding box. In the flattening layer, the 2D image is converted from a matrix representation to a vector representation, yielding a 1D feature signal.
In the second 1D convolutional layer, a 1D kernel convolutional window (with a sliding stride = 1) is subsequently applied to the 1D feature signals and can preliminarily quantify the difference levels in the feature signals [26]; hence, these feature signals can be used to separate normal conditions from cardiomegaly.
In the classification layer, a fully connected network, consisting of an input layer, pattern layer, summation layer, and output layer [23][24][25], is established as a multilayer classifier to identify cardiomegaly by mapping the relationship between input feature patterns and the normal condition (CTR < 0.50) or cardiomegaly (CTR ≥ 0.50). Optimization algorithms, such as the forward and back propagation (FBP) algorithm [27][28] and the gradient descent algorithm, can be performed in parallel to adjust the network parameters of the preceding layers, from the output layer back to the hidden layers. The FBP algorithm has been applied in feed-forward multilayer perceptrons. However, the FBP algorithm incrementally optimizes all of the network's parameters, including the network-connected weights and the neurons' biases. The magnitude of change in the parameter adjustment with error backpropagation is updated iteratively, resulting in slow convergence and great volatility, which makes it easy to fall into a local optimum in the training stage [30]. In this study, the gradient descent algorithm [23][24][25] uses the gradient values to refine the optimal network parameter iteratively to increase the classification accuracy. Selecting an appropriate learning rate speeds up the classifier's training, and the convergence speed also increases. In the experimental validations, CXR images are enrolled from the NIH Clinical Center's CXR database [2,13]. As shown in Figure 1, the images labeled "No Finding" and "Cardiomegaly" are divided into training and testing datasets to train the proposed classifier in the training stage and validate the classifier's feasibility in the recalling stage. Using cross-validation, the experimental results indicate the classification efficiency for automatic cardiomegaly screening on 2D PA CXR images.
The remainder of this study is organized as follows:

A. Experimental Setup
The NIH CXR dataset comprises 112,120 PA X-ray images with disease labels from 30,805 patients, collected from 1992 to 2015 [2,13]. The labels were extracted from the text radiological reports using natural language processing, and the images are stored in hospitals' picture archiving and communication systems. This medical image database covers common thoracic diseases, which can be detected and located with multilabels and validated using artificial intelligence methods. The disease labels are expected to be > 90% accurate for supervised learning classification in thorax disease screening. The dataset contains 14 disease labels, such as pneumonia, effusion, infiltration, nodule, mass, and cardiomegaly. We select 200 images from this medical image database, including 100 images with cardiomegaly (positive label) and 100 images with the normal condition (no-finding label). The enlargement of the cardiac silhouette may be due to cardiomegaly, pericardial effusion, or anterior mediastinal mass. The cardiothoracic ratio (CTR) is an aided index for assessing the enlargement of the cardiac silhouette, and it can be represented as follows [3][4][5][6][7][8][9]:

$$\mathrm{CTR} = \frac{\mathrm{MHCD}}{\mathrm{MHTD}}$$

where CTR is measured on the PA CXR view (as seen in Figure 1) as the ratio of MHCD to MHTD (inner edge of ribs / edge of pleura). The mean index for the normal condition is 0.42 to 0.50; a value < 0.42 is considered pathologic, and a value > 0.50 is usually used to identify abnormal conditions, with > 0.55 indicating cardiomegaly and 0.50 to 0.55 indicating mild cardiomegaly [7][30][31]. Hence, CTR can be used as a threshold value for cardiomegaly evaluation. Then, the labeled CXR images can be used to train the deep-learning-based CNN as a classifier for separating the normal condition from cardiomegaly.
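As a small worked illustration of the CTR index, the following Python sketch computes the ratio of MHCD to MHTD and applies the 0.50 screening threshold. The function names and the example diameters are hypothetical and are not taken from the study; the text is not fully consistent about a value of exactly 0.50, so the sketch assumes that only CTR > 0.50 is flagged.

```python
def cardiothoracic_ratio(mhcd: float, mhtd: float) -> float:
    """Return CTR = MHCD / MHTD for two diameters measured in the same unit."""
    if mhtd <= 0:
        raise ValueError("MHTD must be positive")
    return mhcd / mhtd


def screen_by_ctr(ctr: float, threshold: float = 0.50) -> str:
    """Label a case using the 0.50 threshold (assumption: exactly 0.50 is normal)."""
    return "Cardiomegaly" if ctr > threshold else "No Finding"


# Hypothetical diameters in pixels, for illustration only.
normal_case = cardiothoracic_ratio(130.0, 300.0)    # about 0.43
enlarged_case = cardiothoracic_ratio(190.0, 300.0)  # about 0.63
print(screen_by_ctr(normal_case), screen_by_ctr(enlarged_case))
```

Because both diameters are measured on the same image, the ratio is independent of the pixel spacing, which is why the index can be computed directly on rescaled images.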

B. Chest X-ray Image Preprocessing
The CXR images can be converted from the Digital Imaging and Communications in Medicine (DICOM) format to a tagged image file (TIF) format. TIF is a lossless image format, which can lower the computation time for automatic CXR image examinations. Each CXR image is 1,024 (width) × 1,024 (length) pixels at 8 bits/pixel, with gray-scale values from 0 to 255. In addition, to speed up the pattern recognition task, we need to reduce the sizes of the X-ray images before feeding them into the multilayer CNN-based classifier. Hence, we rescale the images from 1,024 × 1,024 pixels to 420 × 420 pixels while maintaining sufficient visual detail to indicate heart contours, as shown for the normal condition (no finding) and cardiomegaly in Figure 2. Then, a 100 × 200 bounding box (BB) is used to extract the region of interest (ROI) containing the pathologic information of cardiomegaly, as shown in the gray-scale feature maps in Figure 2.
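The rescaling and ROI extraction steps can be sketched as follows in plain Python. This is only an illustration under stated assumptions: the text does not name the interpolation method, so nearest-neighbor sampling is assumed; the BB is assumed to be 100 rows by 200 columns; and the BB position (`top`, `left`) is a hypothetical placeholder, since the actual location is given only in Figure 2.

```python
def downscale_nearest(image, out_h, out_w):
    """Resize a 2D list of pixels by nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def crop(image, top, left, height, width):
    """Extract a height x width bounding box whose upper-left corner is (top, left)."""
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("bounding box lies outside the image")
    return [row[left:left + width] for row in image[top:top + height]]


# Synthetic 1,024 x 1,024 8-bit image standing in for a CXR image.
image = [[(r + c) % 256 for c in range(1024)] for r in range(1024)]
small = downscale_nearest(image, 420, 420)
# Hypothetical BB position; 100 rows x 200 columns is an assumed orientation.
roi = crop(small, top=160, left=110, height=100, width=200)
print(len(small), len(small[0]), len(roi), len(roi[0]))
```

In practice an image library with area or bilinear interpolation would normally be preferred for downscaling, because nearest-neighbor sampling can alias fine detail.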
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3171811, IEEE Access

C. Combining 2D and 1D Convolutional Processes for Image Enhancement and Feature Extraction
To process the 2D CXR image and increase the classification accuracy, we use 2D and 1D convolutional-pooling processes for image enhancement and feature extraction. In the 2D spatial convolutional processes (as seen in Figure 3), two fractional-order convolution (FOC) masks are used to process 2D CXR images and extract low-level features, such as the heart's edges and corners. Each fractional-order mask is a 3 × 3 sliding window with a fractional-order parameter v ∈ [0, 1]; it is moved with a stride of 1 and zero padding in the horizontal and vertical directions to apply the convolutional weights. The general form of the FOC can be presented as follows [23][24][25]: where FOC(·) is the fractional-order operator; $I_{xy}$ ∈ [0, 255] is the pixel value at location (x, y) in a 2D CXR image of dimension n × n, with x = 1, 2, 3, …, n and y = 1, 2, 3, …, n; $FOCI_{xy}$ is the mapping value at location (x, y); and v is the fractional-order parameter. Moreover, based on the Grünwald-Letnikov theory [25][31][32], M is the 3 × 3 fractional-order mask, and the mask matrix can be represented as follows: where $M_x$ and $M_y$ are the FOC masks in the horizontal and vertical directions, respectively. Each FOC mask multiplies each element by the corresponding input pixel value, $I_{xy}$, and then obtains an enhanced feature pattern containing spatial features. The 2D convolution can be performed by convolving with a FOC mask in the x direction and then with another FOC mask in the y direction; the two masks act as low-pass filters [10] and remove the high-spatial-frequency components from a CXR image. This serves as a smoothing filter for edge detection [23][24][25][32]. The results of the convolution processes are combined and normalized as follows: where $FOCI_{xy,x}$ and $FOCI_{xy,y}$ are the convolution results in the horizontal and vertical directions, respectively.
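To make the fractional-order masking step more concrete, the sketch below builds two 3 × 3 masks from the first three Grünwald-Letnikov weights, (-1)^k C(v, k) for k = 0, 1, 2, and slides them over an image with stride 1 and zero padding. It should be read as an assumption-laden illustration rather than the masks used in this study: the placement of the weights inside the window, the combination by Euclidean magnitude, and the rescaling to [0, 255] are all choices made here for the example, and all function names are hypothetical.

```python
import math


def gl_weights(v):
    """First three Grunwald-Letnikov weights (-1)^k * C(v, k) for k = 0, 1, 2."""
    return (1.0, -v, v * (v - 1.0) / 2.0)


def foc_masks(v):
    """Build illustrative 3 x 3 horizontal and vertical masks (assumed layout)."""
    w0, w1, w2 = gl_weights(v)
    m_x = [[0.0, 0.0, 0.0], [w2, w1, w0], [0.0, 0.0, 0.0]]  # weights along a row
    m_y = [[0.0, w2, 0.0], [0.0, w1, 0.0], [0.0, w0, 0.0]]  # weights along a column
    return m_x, m_y


def slide_mask(image, mask):
    """Weighted sum under a 3 x 3 window, stride 1, zero padding (same-size output)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += mask[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


def foc_enhance(image, v=0.3):
    """Combine both directions by magnitude and rescale to [0, 255] (assumed rule)."""
    m_x, m_y = foc_masks(v)
    gx, gy = slide_mask(image, m_x), slide_mask(image, m_y)
    mag = [[math.hypot(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]
    peak = max(max(row) for row in mag)
    if peak == 0.0:
        return mag
    return [[255.0 * p / peak for p in row] for row in mag]


# Tiny synthetic image with a vertical edge, for illustration only.
img = [[0, 0, 0, 200, 200, 200] for _ in range(6)]
enhanced = foc_enhance(img, v=0.3)
```

With v = 0.3 the three weights are 1, -0.3, and -0.105, so the mask keeps part of the local mean while still responding to edges; this is one way to read the combined smoothing and edge-detection behavior described in the text.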
After the first image enhancement, the feature map of the heart is extracted from the enhanced CXR image using the 100 × 200 BB at the specific region and is then flattened (FLAT) from a matrix representation, FOCI (image: 100 × 200), to a vector representation, $FLATI_x$ (signal: 1 × 20,000). Then, a 1D convolutional operator performs the second image enhancement process, which can be presented in a discrete-time convolutional form [26]: (9) where x[i] is the subsampled feature signal obtained with a sliding stride = 40. Hence, the vector dimension of the feature signal can be reduced from N + M − 2 to n′ without zero-padding (n′ ≤ 500). As seen in Figure 4, for the feature maps of the normal condition (CTR ≤ 0.50) and cardiomegaly (CTR > 0.50), the number of feature parameters is reduced to ≤ 25% of the total number in a feature map while retaining the key features, which reduces the computational complexity. In addition, as shown in Figure 4, the feature maps in vector form (as 1D feature signals) are used to preliminarily quantify the different levels (red and green feature maps) for further classification applications.
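The flattening and strided 1D convolution can be illustrated with the short Python sketch below, which reduces a 100 × 200 feature map (20,000 values) to a 500-point feature signal using a stride of 40 and no zero padding. The kernel is not recoverable from the text, so a length-40 averaging kernel is assumed purely for illustration; the function names are hypothetical.

```python
def flatten(image):
    """Convert a 2D feature map (list of rows) into a 1D feature signal, row by row."""
    return [p for row in image for p in row]


def conv1d_strided(signal, kernel, stride):
    """Discrete convolution evaluated every `stride` samples, without zero padding."""
    m = len(kernel)
    if m == 0 or stride < 1 or m > len(signal):
        raise ValueError("invalid kernel length or stride")
    flipped = kernel[::-1]  # convolution flips the kernel before the weighted sum
    return [
        sum(w * signal[start + i] for i, w in enumerate(flipped))
        for start in range(0, len(signal) - m + 1, stride)
    ]


# Constant 100 x 200 map standing in for an extracted heart feature map.
roi = [[2.0] * 200 for _ in range(100)]
signal = flatten(roi)                      # 20,000 samples
kernel = [1.0 / 40] * 40                   # assumed averaging kernel
features = conv1d_strided(signal, kernel, stride=40)
print(len(signal), len(features))
```

With N = 20,000 samples, a kernel of length M ≤ 40, and a stride of 40, the number of outputs is floor((N − M) / 40) + 1 = 500, which matches the 500 input nodes of the classifier.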

D. Multilayer Classifier in Classification Layer
In the classification layer (as seen in Figure 3), a multilayer connected network, consisting of an input layer, a pattern layer, a summation layer, and an output layer, is used to establish a classifier that takes the feature signal as input for cardiomegaly screening. The pattern layer maps the 1D feature signal into a high-dimensional space by using a linear combiner. Hence, the output of the pattern layer can be represented as follows: where $w_{ki}$ is the network weight between the input layer and the pattern layer, forming the matrix $W = [w_{ki}]_{K \times n'}$ (n′ = 500), which can be set by using the K × 500 input training feature signals. The classifier's output can be normalized as follows: where $t_j$ is the jth desired target output for the input signals in the training dataset; T = [$t_1$, $t_2$] for multiple classes; and ε ∈ (0, 1) is a tolerance error. The gradient descent method is used to adjust the optimal parameter, $\sigma_{opt}$, through iterative computations [25][26][32], and the gradient values can be computed as follows: Hence, the optimal parameter, $\sigma_{opt}$, can be refined by the iterative computation: where η is the learning rate, 0 < η ≤ 1; p is the iteration index, p = 0, 1, 2, 3, …, $p_{max}$; and $p_{max}$ is the maximum iteration number. The optimal parameter, $\sigma_{opt}$, minimizes the mean squared error (MSE) function.
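The following Python sketch shows one possible reading of this four-layer classifier: pattern nodes that store the training signals, summation nodes that accumulate kernel-weighted targets and a normalizing sum, and a single smoothing parameter σ tuned by gradient descent on the training MSE. Several elements are assumptions made for the example and are not confirmed by the text: the Gaussian form of the pattern-node activation, the central-difference estimate of the gradient, the lower bound on σ, and the toy data. All function names are hypothetical.

```python
import math


def network_output(x, patterns, targets, sigma):
    """Normalized, kernel-weighted average of the stored targets for one input."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, w)) for w in patterns]
    d_min = min(d2)  # shifting the exponents leaves the ratio unchanged, avoids 0/0
    g = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in d2]  # pattern layer
    total = sum(g)                                                  # normalizing sum
    n_out = len(targets[0])
    return [sum(gk * t[j] for gk, t in zip(g, targets)) / total for j in range(n_out)]


def mse(patterns, targets, sigma):
    """Mean squared error over all training patterns and output nodes."""
    errs = [
        (t_j - y_j) ** 2
        for x, t in zip(patterns, targets)
        for t_j, y_j in zip(t, network_output(x, patterns, targets, sigma))
    ]
    return sum(errs) / len(errs)


def tune_sigma(patterns, targets, sigma0=1.0, eta=0.5, eps=1e-2, p_max=100, h=1e-4):
    """Gradient descent on sigma; the gradient is estimated numerically (assumption)."""
    sigma = sigma0
    for _ in range(p_max):
        if mse(patterns, targets, sigma) <= eps:
            break
        grad = (mse(patterns, targets, sigma + h) - mse(patterns, targets, sigma - h)) / (2 * h)
        sigma = max(sigma - eta * grad, 1e-3)  # keep sigma strictly positive
    return sigma


# Toy 3-feature signals for illustration: two "normal" and two "cardiomegaly" patterns.
patterns = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [1.0, 0.9, 1.0], [0.9, 1.0, 0.9]]
targets = [[1, 0], [1, 0], [0, 1], [0, 1]]
sigma_opt = tune_sigma(patterns, targets)
y = network_output([0.95, 0.95, 0.95], patterns, targets, sigma_opt)
print(sigma_opt, y)
```

Because only one parameter is tuned, each iteration costs a few passes over the training set, which is one plausible reason for the short training times reported for this kind of classifier.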

III. Experimental Results and Discussion
The NIH CXR image dataset [2,13] was used to validate the intended medical purpose and evaluate the proposed combined 2D and 1D CNN-based classifier for two-class classification. Each enrolled image was converted from the Digital Imaging and Communications in Medicine (DICOM) format to a tagged image file (TIF) format and resized from 1,024 × 1,024 pixels to 420 × 420 pixels (96 dpi, with a bit depth of 32 bits). The TIF format is lossless and could lower the computation time [24]. Figure 5 shows the distributions and statistics of the normal condition (CTR ≤ 0.50) and cardiomegaly (CTR > 0.50) for 100 enrolled subjects. Given a specific BB of 100 × 200 pixels, feature maps could be cropped from the 100 CXR images and then used to train the proposed classifier. The automatic cardiomegaly screening design (as seen in Figure 6) comprised four processes: (1) CXR image enhancement using 2D spatial fractional-order convolutional processes, (2) feature map extraction (ROI), (3) 1D convolutional pooling, and (4) cardiomegaly screening with the combined 2D and 1D CNN-based classifier. The proposed digital image processing and classifier algorithms were implemented on a tablet PC using a high-level graphical programming language in LabVIEW and MATLAB software (NI™, Austin, TX, USA), and a GPU (NVIDIA® GeForce® RTX™ 2080 Ti, 1755 MHz, 11 GB GDDR6) was used to reduce the execution time of the digital image processing and pattern recognition tasks. Table 1 summarizes the proposed multilayer classifier, including its layer functions, manners, and feature maps. The feasibility study is described in detail in the subsequent sections.

A. Feasibility Tests for the Proposed Multilayer Classifier
For the 200 enrolled subjects from the NIH CXR image database, we extracted 200 feature maps using the 2D convolutional process and 1D convolutional pooling, including 100 normal maps and 100 abnormal maps for cardiomegaly. In this study, we randomly selected 100 feature maps to train the multilayer classifier in the learning stage. Then, using these 100 pairs of input-output feature maps, we established a fully connected network topology consisting of 500 input nodes, 100 pattern nodes, three summation nodes, and two output nodes (Figure 3 and Table 1). In the literature [23][24][25], the 3 × 3 fractional-order convolutional mask with fractional-order parameters v = 0.20 to 0.40 (v = 0.30 was selected here) yielded promising results for image enhancement and noise removal, as shown in Figure 7. Hence, the heart's edge and contour could be identified and then easily selected from a CXR image. Then, in the feature extraction layer, 1D convolutional pooling was used to extract the feature patterns in vector form and reduce the feature parameters, thereby addressing overfitting in the learning stage. The trained feature maps (as shown in Figure 4) were fed into the classifier, with the convergence condition set as the tolerance value ε ≤ $10^{-2}$ and the initial condition $\sigma_0$ = 1.0000. Furthermore, the gradient descent method was used to adjust the smoothing parameter in the pattern layer to minimize the MSE function through iterative computations. Figures 8(a) and 8(b) show the training history curves for the proposed classifier, as optimal parameters and MSEs versus iteration numbers, respectively. With different learning rates, η = 0.3 to 0.6, the gradient descent method required < 20 iterative computations to reach the specified convergence condition.
Given these optimal solutions (Figure 8(b)), the optimal parameter, $\sigma_{opt}$ ≈ 0.0581, could minimize the MSE function and increase the classification accuracy in the learning stage (100% accuracy). The overall iterative computations took an average CPU time of 4.8430 s to refine the optimal parameter.

B. Cross-Validation Tests for Proposed Multilayer Classifier
For feasibility tests in clinical applications, the collected feature maps were divided into two groups, the trained dataset and the untrained dataset; feature maps were randomly selected to train the classifier, and the remaining untrained feature maps were used to validate it by performing 10-fold cross-validation. As shown in Table 2, four criteria were used to evaluate the proposed classifier model: precision (%), recall (%), accuracy (%), and F1 score [23,32]. Accuracy (%) measured the percentage of correct classifications made by the established classifier in the recalling stage. For screening positive cases, the recall (%) index indicated the proportion of actual positive cases that were correctly predicted. The precision (%) index, also known as the positive predictive value (PPV), indicated the proportion of predicted positive cases that were truly positive. The F1 score was the harmonic mean of precision (%) and recall (%), combining the two into a single screening metric; it gave equal weight to precision and recall and thus accounted for both false positives and false negatives, so a classifier obtained a high F1 score only when both precision (%) and recall (%) were high.
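The four criteria follow from the confusion-matrix counts in the standard way, as the short Python sketch below illustrates. The function name and the example counts are hypothetical; they are not the per-fold results of this study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy, and F1 score from confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # positive predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of positives found
    accuracy = (tp + tn) / total if total else 0.0
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical fold: 50 cardiomegaly and 50 normal test maps, one error of each kind.
m = screening_metrics(tp=49, fp=1, tn=49, fn=1)
print({k: round(100.0 * v, 2) for k, v in m.items()})
```

Because the F1 score is a harmonic mean, a low value of either precision or recall pulls it down more strongly than an arithmetic mean would.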
After training the classifier using 100 trained feature maps, 100 untrained feature maps, including 50 for normal subjects and 50 for cardiomegaly subjects, were randomly selected from the dataset to validate the classifier. Using the same validation process, 10-fold cross-validation was performed; the experimental results of the proposed classifier are shown in Table 3. The average precision (%) was 97.60% and the average recall (%) was 99.20%, for predicted abnormality and correctly identified abnormality (true positive [TP] for CTR > 0.50), respectively; the average accuracy was 98.40% for correctly identified normal and cardiomegaly cases; and the average F1 score was 0.9838, which was greater than 0.9000 and indicated the potential application of the classification model. In addition, precision (%), as an index of PPV, was greater than 80%, supporting the predictive performance of the classifier. Hence, we could recommend the combined 2D and 1D CNN-based classifier to automatically screen for the presence of cardiomegaly on PA CXR images in clinical applications.
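The validation procedure described here, in which 100 feature maps train the classifier and 100 untrained maps (50 per class) test it over ten rounds, can be sketched as repeated stratified random splitting. This is one reading of the text rather than a confirmed protocol: it assumes that the training half is also balanced at 50 maps per class, and the seed and function name are arbitrary choices for the example.

```python
import random


def stratified_half_split(labels, rng):
    """Return (train_idx, test_idx) with half of each class placed in each part."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return sorted(train), sorted(test)


labels = [0] * 100 + [1] * 100    # 0 = normal condition, 1 = cardiomegaly
rng = random.Random(0)            # arbitrary seed, for reproducibility only
rounds = [stratified_half_split(labels, rng) for _ in range(10)]
for train_idx, test_idx in rounds:
    # A classifier would be trained on train_idx and scored on test_idx here.
    assert not set(train_idx) & set(test_idx)
```

Averaging the per-round precision, recall, accuracy, and F1 score over the ten rounds then gives summary figures of the kind reported in Table 3.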

C. Discussion
Experimental tests showed promising results for the proposed classifier in automatic cardiomegaly screening using PA CXR images. In addition, 10-fold cross-validation was performed, as seen in Table 3, and the manual method with CTR estimation (using equations (1) and (2)) on 100 CXR images had good reproducibility and accuracy, with average CTRs of 0.4377 and 0.6307 for the identified normal condition and cardiomegaly, respectively. For comparison, using the classifier data in Table 4, we also established a multilayer 2D CNN-based classifier consisting of a fractional-order convolutional layer with two convolution masks (stride = 1), a kernel convolutional layer with 16 kernel convolution masks (stride = 1), a pooling layer with 16 maximum pooling masks (stride = 2), a flattening layer, and a fully connected classification network (multilayer perceptron). The fully connected network consisted of an input layer (1,250 nodes), two hidden layers (168 and 64 nodes), and an output layer (2 nodes). The 2D CNN-based classifier was implemented using the open-source TensorFlow platform (Version 1.9.0) in Python [33][34]. With the same trained and untrained feature maps and 10-fold cross-validation, it obtained an average precision of 97.80% and an average recall of 98.20% for predicting possible cardiomegaly and correctly identifying TP, an average accuracy of 98.00% for correctly identifying the normal condition and cardiomegaly, and an average F1 score of 0.9799. The proposed multilayer classifier thus performed better than the multilayer 2D CNN-based classifier in recall, accuracy, and F1 score, as shown by the experimental results in Table 4. However, the multilayer 2D CNN required more feature parameters to train the classifier, which increased the computational complexity of the iterative computations. In addition, a GPU was needed to accelerate the training process and parallelize the computations of the multiple convolution and pooling processes.
It took an average CPU time of < 300 s to train the classifier. Hence, under the same multilayer architecture, the performance of the proposed multilayer classifier was superior to that of the 2D CNN-based classifier. Specifically, the proposed multilayer classifier required fewer feature parameters in the 2D convolutional layer, used simpler linear weighted sums in the 1D convolutional process to deal with the 1D feature signals, and had a simpler implementation process than the 2D CNN-based classifier.
In clinical diagnosis, clinicians and radiologists could use visual inspection to identify normality or abnormality on PA CXR images and then compute the CTR indexes. However, manual inspection was time consuming, and the diagnostic results depended on the readers' interpretations and experience. The diagnostic tests of the proposed combined 2D and 1D CNN-based classifier took less than 2 s of CPU time to process 100 CXR images. Hence, automatic screening could address the insufficient human resources for manual screening and allow clinicians and radiologists to focus on follow-up medical strategies. Some advantages of the proposed classifier are as follows:
• The feature maps could be enhanced in the 2D spatial convolutional processes by identifying the heart's edge and contour and removing noise;
• The feature maps in vector form (as feature signals) were used to quantify the different levels for separating the normal condition from cardiomegaly;
• The dimension of the feature patterns could be reduced to mitigate overfitting problems;
• The multilayer classifier could be easily established from the trained dataset with the key input-output paired feature maps and easily implemented using high-level programming languages (C / C++ or MATLAB software).

In addition, the proposed classifier was limited in identifying heart enlargement or myocardial hypertrophy. The determination of the heart size, such as the sizes of the four chambers (ventricles and atria), was an important measurement for evaluating potential cardiomegaly. Cardiac echocardiography (CECHO), cardiac magnetic resonance imaging (CMRI), and cardiac computed tomography (CCT) [35][36][37][38] were superior to chest radiography because they provided good imaging for assessing the heart chamber sizes. CECHO, as the gold standard, showed promising sensitivity and specificity in determining cardiac chamber sizes [36,39], and a high correlation between CTR indexes and heart sizes. Compared with expensive modalities such as CMRI and CCT, CXR imaging was easy and cheap; thus, this study recommended it for directly estimating the CTR index for heart size measurement in preliminary first-line examination. Hence, in verifying the automatic screening tool, the larger the F1 score (> 95%), the better the proposed multilayer classifier performed in separating the normal condition from cardiomegaly and the more reliable it was for an informed decision.

IV. Conclusion
We developed a combined 2D and 1D CNN-based classifier that uses CXR images to identify the normal condition or cardiomegaly during first-line examination, and the performance of the proposed classifier was validated. In the convolutional layers, a sequence of 2D fractional-order and 1D kernel convolutional processes was used to enhance the image and remove unwanted noise, which helped to extract the 2D feature maps with a specific BB and transform them into 1D feature signals for further classification tasks. Flattening and 1D pooling processes reduced the dimension of the feature map, lowering the computational load for real-time digital image processing and pattern recognition tasks. Using 10-fold cross-validation, randomly selected untrained feature maps were fed into the classifier, and its pattern recognition scheme showed promising results in separating the normal condition from cardiomegaly, as the average recall, average precision, average accuracy, and average F1 score were all greater than 95% for screening abnormalities. The experimental results indicated that the trained model, with its computational efficiency and automatic screening, was better than the manual method in clinical application. Training and examining the classifier and CXR images took less than 5.0 s. Using image examinations, such as CECHO, CMRI, and CCT [35][36][37][38], the four chambers of the heart could be accurately estimated to evaluate an enlarged heart or myocardial hypertrophy (left ventricular hypertrophy). Considering the absence of pain or immediate risk, these imaging modalities were a potential way to measure heart size, muscle thickness, and pumping function to discover cardiomegaly, such as left and right ventricular hypertrophy.
Routine chest radiography could also rapidly inspect dilated cardiomyopathy, which might increase the heart size on a CXR image, such as a right/left atrial shadow (atrial enlargement) or right/left ventricular hypertrophy (ventricular enlargement) [40][41]. Therefore, the proposed multilayer classifier could replace the manual manner for tasks requiring specific expertise and experience (clinicians and radiologists) in medical image examinations. In addition, the trained dataset could be divided into four classes, including normal condition (CTR ≤ 0.50), mild cardiomegaly (0.50 < CTR ≤ 0.55), moderate cardiomegaly (0.55 < CTR ≤ 0.60) [7][8], and severe cardiomegaly, to train the proposed classifier, which could maintain its intended medical purpose in real-world application and could also raise its indication in clinical applications as a computer-aided decision-making tool.