Bayesian-Convolutional Neural Network Model Transfer Learning for Image Detection of Concrete Water-Binder Ratio

Concrete remains the most widely used building material, and engineering accidents caused by inaccurate mix proportions are a prominent problem. Traditional testing cannot deliver results quickly, so the gap between the need for timely, effective detection and the slow turnaround of conventional tests has become acute. This paper proposes a new method based on transfer learning with Bayesian-convolutional neural network models to detect the water-binder ratio, the most important parameter in the mix proportion of concrete mixtures. Bayesian optimization was applied to pretrained convolutional neural networks to establish Bayesian-convolutional neural network models, avoiding manual hyperparameter tuning. The authors performed a series of experiments and obtained a large number of images of freshly-mixed concrete mixtures, which were used as datasets for water-binder ratio detection. The models achieved high accuracies on the training, validation, and testing sets. With these models, water-binder ratio detection can be carried out in real time and at a high sampling rate. The authors integrated the models and developed a detection system for concrete mix proportion. Equipped with suitable hardware, this system can effectively monitor concrete quality during production and help prevent engineering accidents. From the training curves of the models, a new parameter was introduced to discuss how mix proportion influences the sensitivity of the apparent state of concrete mixtures to changes in water-binder ratio, which is an important consideration in the preliminary assessment of the service behavior of concrete. Through this parameter, we also found that the image features learned by the models essentially reflect the fluidity of concrete mixtures.


I. INTRODUCTION
Concrete is an artificial composite material composed mainly of cementitious materials and the aggregate particles cemented within them. Owing to its simple production process, good durability, plasticity and fluidity, low price, and readily available raw materials, advantages that other building materials cannot match, modern concrete has been the most widely used building material in industrial, civil, and infrastructure construction for the more than 150 years since its invention [1]-[3]. Mix proportion involves the choice of components and their proportions that will result in definite target properties of concrete. Most mix design methods suggest that strength in the hardened state and slump in the fresh state are the key properties influenced by mix proportion, which are also the most important service behaviors of concrete [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Nuno Garcia.
Research indicates that the water-binder ratio (W-B) is the most important factor affecting the strength and slump of concrete among all mix proportion parameters once the type of cement is determined. To this day, mainstream standards worldwide still take as the basic principle of concrete preparation that, for ordinary concrete whose strength is not too high, compressive strength shows a good linear relationship with the reciprocal of W-B (that is, the binder-water ratio, B-W) [1]. Therefore, W-B can be regarded as the most important parameter in mix proportion.
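As an illustration, the linear relationship between compressive strength and B-W is often written in a Bolomey-type form. The symbols below are assumptions introduced here and do not appear in the original text: $f_{cu}$ is the compressive strength of the concrete, $f_b$ is the strength of the binder, $B/W$ is the binder-water ratio, and $\alpha_a$, $\alpha_b$ are empirical regression coefficients that depend on the aggregate type.

```latex
f_{cu} \approx \alpha_a \, f_b \left( \frac{B}{W} - \alpha_b \right)
```

Under this form, strength grows linearly with B-W, so an error in W-B translates directly into an error in the expected strength.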
China national standards [5] specify that, during construction, on-site sampling tests shall be carried out at least once every 200 m³ when more than 1000 m³ of concrete of the same batch and the same mix proportion is poured continuously. In essence, this is an inspection method with a low sampling rate, and because the test consumes the product itself, most of the concrete produced cannot be tested. The on-site sampling test includes items such as the compressive strength test, slump test, and resilience test. Meanwhile, it is estimated that more than 3.3 × 10^10 t of concrete is used annually worldwide [1]. Energy and resource consumption would be greatly reduced if a detection method that does not consume the product were used together with traditional methods. On the other hand, given proper design and construction, the failure of building materials is the most important factor leading to engineering accidents. Moreover, traditional testing methods, represented by tests after 28 days of curing, are all after-the-fact approaches. Deficiencies in mechanical properties caused by an inaccurate W-B usually cannot be found until the concrete has developed its strength. Errors that are not caught during production therefore cannot be detected in time or remedied effectively. There is thus an urgent need to apply computer science and artificial intelligence methods to detect concrete W-B promptly and at high sampling rates, so as to prevent engineering accidents caused by material failure.
Over the past few decades, researchers have used various models (genetic algorithms, particle swarm optimization, kernel ridge regression, the M5P tree model, support vector machines, etc.), families of artificial neural networks (BP neural networks, RBF neural networks, hybrid artificial neural networks, etc.), and a variety of combined models to predict performance indicators of concrete (compressive strength, slump, modulus of elasticity, etc.) from mix proportion data. The types of concrete involved are also diverse (high-performance concrete, recycled concrete, environmentally friendly concrete, self-compacting concrete, fiber reinforced concrete, etc.), and abundant results have been achieved [3], [4], [6]-[29]. However, these studies processed numerical data directly and therefore lack intuitiveness. Moreover, quality defects caused by mistakes in the production process, such as equipment failure, cannot be detected when the mix proportion itself is correctly calculated.
In terms of image-based approaches, studies have mainly processed cross-section images of hardened concrete to obtain information such as aggregate particle grading and distribution and the color and luster of cementitious materials, in order to evaluate performance. Han et al. [30] developed a two-dimensional image analysis method based on concrete cross-section images to evaluate coarse aggregate characteristics and distribution in concrete. Başyiğit et al. [31] applied an image processing technique to assess the compressive strength of concrete and further suggested that this technique can serve as an auxiliary tool to destructive and non-destructive testing methods. Dogan et al. [32] combined artificial neural networks with image processing and concluded that the described method is a good alternative to traditional methods for identifying the mechanical properties of concrete. Wang et al. [33] used digital image processing to evaluate binary cross-sectional images of self-consolidating concrete and developed statistical models to predict the flowability of self-consolidating concrete mixtures. These image-based studies used cross-section images of hardened concrete, which also requires 28 days of curing and therefore loses timeliness.
In terms of monitoring or identifying mix proportion, Jung et al. [34] proposed a new fingerprinting method for identifying the concrete mix proportion by the acid neutralisation capacity of concrete, adopting an empirical equation for the determination. Chung et al. [35] presented mathematical models for monitoring the mix proportions of concrete based on dielectric constant measurement, and proposed that, by applying these models, new diagnostic methods could be developed for monitoring accurate mix proportions by measuring the microwave permittivity of fresh concrete at early ages. These approaches also require some time for detection. Therefore, all of the above achievements have certain limitations, and no study was found that applies images of concrete mixtures to real-time detection of mix proportion or W-B.
This study aims to overcome the limitations of existing detection methods. To this end, 67 sets of images of freshly-mixed concrete mixtures with different mix proportions were collected from experiments and divided into 15 datasets according to mix proportion, each containing four or five sets of images (see section II for details). Four pretrained convolutional neural networks (CNNs), AlexNet, VGG16, GoogLeNet, and ResNet101, were selected and fine-tuned. Bayesian optimization was applied to the fine-tuned CNNs to establish Bayesian-Convolutional Neural Network (B-CNN) models. For each model, we found the optimal network hyperparameters and training options for transfer learning, and for each dataset we selected the best-performing model after comparing accuracies and elapsed times. In this way, 15 B-CNN models were obtained to detect the W-B of concrete mixtures. By integrating these models, the authors developed a detection system for concrete mix proportion that can be applied in practical engineering. When attached to suitable hardware, this system can detect W-B in real time and at a high sampling rate during production and effectively prevent engineering accidents. Table 1 compares the traditional testing method with the B-CNN method, showing that the latter overcomes the limitations of the former mentioned above. Finally, a new parameter was introduced to describe the sensitivity of the apparent state of concrete mixtures to changes in W-B, through which we also explored the essence of the image features learned by the models.

A. DESIGN OF MIX PROPORTION AND EXPERIMENTS
Cement concrete is generally composed of cement, coarse aggregate, fine aggregate, and water. Mineral admixtures (such as fly ash) and chemical admixtures (such as superplasticizer) can be added when necessary to improve specific properties. In the present study, experiments were conducted using ordinary concrete composed only of cement, water, sand as fine aggregate, and crushed stone as coarse aggregate, since these are the most basic components of concrete and the basis of research on other modified concretes.
VOLUME 8, 2020
In practice, when a mix proportion is designed and calculated, the quantities of the components other than water are usually not used directly but are converted into two dimensionless quantities, namely the water-binder ratio (W-B) and the fine aggregate ratio (FAR), which are defined as:

$$\text{W-B} = \frac{w}{c}$$

$$\text{FAR} = \frac{a_f}{a_f + a_c} \times 100\%$$

where $w$ and $c$ represent the water and cement dosage (kg/m³) respectively, and $a_f$ and $a_c$ represent the fine aggregate and coarse aggregate dosage (kg/m³) respectively. Given that the density of ordinary concrete is 2350-2450 kg/m³ [36], the content of the above four components can be uniquely determined by the water dosage, W-B, and FAR for a given mix proportion. In addition, the nominal maximum particle size of coarse aggregate (NMSCA) is an important parameter affecting fluidity and the selection of FAR and water dosage. Therefore, W-B, FAR, water dosage, and NMSCA were selected as the considerations for mix proportion design.
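The statement that water dosage, W-B, and FAR determine all four component quantities can be illustrated with a short calculation. This is only a sketch: the function name `mix_quantities` is hypothetical, and the density of 2400 kg/m³ is an assumed value chosen from within the 2350-2450 kg/m³ range quoted in the text.

```python
def mix_quantities(water, wb, far, density=2400.0):
    """Return (cement, fine_aggregate, coarse_aggregate) in kg/m^3.

    water   -- water dosage w (kg/m^3)
    wb      -- water-binder ratio, W-B = w / c
    far     -- fine aggregate ratio a_f / (a_f + a_c), given as a fraction
    density -- ASSUMED density of the concrete (kg/m^3); 2400 is an
               illustrative value inside the 2350-2450 range
    """
    cement = water / wb                      # from W-B = w / c
    aggregate = density - water - cement     # total aggregate a_f + a_c
    if aggregate <= 0:
        raise ValueError("water and cement alone exceed the assumed density")
    fine = far * aggregate                   # from FAR = a_f / (a_f + a_c)
    coarse = aggregate - fine
    return cement, fine, coarse


# Example: water dosage 200 kg/m^3, W-B = 0.5, FAR = 35 %
cement, fine, coarse = mix_quantities(water=200.0, wb=0.5, far=0.35)
print(f"cement={cement:.1f}, fine={fine:.1f}, coarse={coarse:.1f} kg/m^3")
```

A different assumed density shifts only the aggregate quantities; the cement dosage depends on the water dosage and W-B alone.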
In order to cover all possible mix proportions in construction, the ranges of the mix proportion parameters were considered as follows:

1) W-B
Hydration occurs when cement meets water; this is the basic mechanism by which cementitious materials take effect in concrete. It is widely accepted in civil engineering that the minimum W-B for complete hydration is about 0.25-0.3. When W-B is below this value, hydration may be incomplete, wasting cement. Meanwhile, W-B is negatively correlated with compressive strength, and a large W-B leads to bleeding of the concrete mixture. The recommended W-B range in engineering is about 0.3-0.7 or 0.8.

2) FAR
FAR is the main factor affecting concrete fluidity. When FAR is small, the fine aggregate cannot fully fill the gaps between coarse aggregate particles, whereas a larger FAR increases the specific surface area of the aggregates; both affect the service performance of concrete mixtures. The recommended range of FAR in civil engineering is about 25%-45%.

3) NMSCA
If NMSCA is too small, large quantities of cement will be consumed and production cost will increase greatly. Conversely, overly large coarse aggregate particles increase the number of weak links and affect the mechanical properties of concrete.
Considering proportion design in practical engineering, water dosage was kept constant at 200 kg/m³. W-B, FAR, and NMSCA were selected as the variables for mixing concrete. Based on the above three factors and relevant materials [36]-[38], each variable was classified into three or five grades; see Table 2 for the specific grades. The present study combined the five grades of FAR and the three grades of NMSCA listed in Table 2 to form 15 groups of experiments. The authors excluded some extreme mix proportions that would not be adopted in practical engineering and conducted a total of 67 experiments. W-B detection was carried out on the basis of these experiments with FAR and NMSCA fixed. The specific mix proportions of the 67 experiments are shown in Table 3, where "−" means that the corresponding mix proportion was excluded and the experiment was not conducted.

B. MATERIALS
Suitable materials were selected to perform the experiments once the plan was confirmed.
The cement selected in experiments was P.O. 42.5 ordinary Portland cement produced by Tangshan Branch of JINYU JIDONG Cement Co., Ltd. Its basic physical and mechanical properties are shown in Table 4. After inspection, the content of its chemical components conforms to China national standards [39].
River sand with moderate fineness was selected as fine aggregate. Its physical properties conform to China national standards, as shown in Table 5 [40].
Crushed stone was selected as coarse aggregate. Table 6 lists its physical properties. When NMSCA > 10 mm, continuous size grading should be adopted to reduce the porosity of concrete and improve its strength. According to China national standards and relevant data [41], the continuous particle size grading of coarse aggregate was determined as shown in Table 7.

C. EXPERIMENT PROCESS
All experiments in this study were performed in Beijing from May to July 2019. An HJW60 concrete mixer, produced in Wuxi, was used for concrete mixing, as shown in Fig.1. Coarse aggregate, fine aggregate, and cement were added in turn and the solid mixture was mixed for 15-20 seconds; water was then added slowly to the well-mixed solids and mixing continued for 50-60 seconds; finally, the well-mixed concrete mixture was poured into a tray measuring 800 mm × 800 mm, as shown in Fig.2.

D. IMAGE COLLECTION
To reduce the effect of lighting on image quality, a 1 m high metal plate was fixed on each side of the tray to block uneven illumination. Hand-held image acquisition equipment took photos at a fixed height of 40 cm above the bottom of the tray to reduce accidental errors. China national standard [42] points out that the whole slump test should be completed within 150 seconds, because taking too long leads to slump loss. In addition, a well-mixed mixture enters the transport process after only a short time. Therefore, image collection was completed within 120 seconds after the mixture was poured, to ensure that the images captured the fresh apparent state of the concrete.
We collected a total of 8340 images of freshly-mixed concrete mixtures from 67 experiments, and divided them into 15 datasets. Seven of these datasets contained images obtained from five experiments, and the remaining eight of them contained four sets of images. The specific numbers of each set of images are shown in Table 3. Relevant parameters of hand-held image acquisition equipment are shown in Table 8.

A. CNN AND TRANSFER LEARNING
As a typical deep learning neural network, a CNN starts from the raw input data, and each of its constituent modules transforms the representation at one level into a representation at a higher, more abstract level. With the composition of enough such transformations, complex functions or natural features can be learned [43]. Training a CNN is an end-to-end learning process [44] in which features are learned implicitly from the data. It is therefore unnecessary to extract data features manually, or to pre-process or reconstruct the original data [16], [45]. The first few units of a CNN architecture share very similar basic components: convolution layers and pooling layers are arranged in series to extract data features layer by layer, which is also how the CNN got its name [16], [46]. The last unit consists of a few fully-connected layers and a traditional classification model [16].
In many practical applications, it is expensive or even impossible for most traditional machine learning methodologies to recollect data or rebuild models [47], [48]. Transfer learning has emerged as a new learning framework to address this problem. Its basic idea is to transfer knowledge from existing models and data to the task to be learned. Under this definition, rather than learning the target task and all source tasks at the same time, transfer learning is concerned mainly with the target task. Current CNN models demand substantial computing resources and involve complex computation. They are also prone to becoming trapped in local optima or to overfitting, which makes transfer learning an ideal choice [47]-[49]. Another advantage of transfer learning is that it does not require a large dataset; it can achieve better accuracy on a small one.

B. PRETRAINED CNN
CNNs trained on the ImageNet dataset have emerged continuously in recent years. In this study, transfer learning was carried out using four well-performing ones: AlexNet, GoogLeNet, VGG16, and ResNet101.
The great success of AlexNet marked the beginning of the extensive attention paid to deep learning with CNNs. AlexNet makes the following improvements over previous networks: 1) it applies the rectified linear unit (ReLU) as a new activation function; 2) it employs dropout to avoid overfitting; 3) it uses overlapping max pooling in pooling layers; 4) it proposes local response normalization (LRN) to aid generalization [50]-[52].
VGG16 has a lot in common with AlexNet. The two are broadly similar in their convolution and pooling operations, and VGG16 likewise ends with three fully-connected layers. Compared with AlexNet, VGG16 is much deeper, and the total number of parameters of the original VGG16 is very large [53].
GoogLeNet introduced nine inception modules to overcome the drawback that high accuracy was usually achieved by deepening the network, with computational complexity increasing exponentially as the layers become deeper. Inception structures allow multiple convolutions with different kernels and max pooling to take place simultaneously within a single layer, so that the network selects more useful features and trains with optimal weights [54], [55].
ResNet addressed the problem that, as network depth increases, accuracy becomes saturated and then degrades rapidly. The credit for its strong performance goes to its core idea of shortcut connections. Shortcut connections are connections that skip one or more layers; they simply perform identity mapping, and their outputs are added to the outputs of the stacked layers [56], [57]. The ResNet family provides networks of various depths; ResNet101 was selected for this study. Table 9 compares the relevant parameters of the four CNNs. Other configurations of the four networks are available in the references cited in this section.
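The idea of a shortcut connection can be sketched in a few lines. The example below is a deliberately simplified illustration, not the ResNet101 architecture: it assumes two small fully-connected layers in place of convolutions, omits batch normalization, and uses hypothetical helper names.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = w2 . relu(w1 . x).

    The '+ x' term is the shortcut connection: the input skips the two
    stacked layers and is added to their output unchanged.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights the stacked layers contribute nothing, so the block
# reduces to the identity mapping on this non-negative input.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]
```

Because the stacked layers only need to learn a correction to the identity, adding more blocks does not force the network to relearn what earlier layers already represent.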

C. BAYESIAN OPTIMIZATION
Most hyperparameters are continuous variables with extremely loose constraints on their numerical ranges, and they usually have coupling effects, so selecting them is a basic challenge in deep learning research. Trial-and-error optimization approaches are slow. Bayesian optimization is a powerful tool for joint optimization. Fundamentally, it is a sequential model-based approach with two key ingredients. The first is a probabilistic surrogate model, which consists of a prior distribution and an observation model that describes the data generation mechanism. The second is a loss function that describes how optimal a sequence of queries is. Ideally, Bayesian optimization minimizes the expected loss to select an optimal sequence of queries. After the output of each query is observed, the prior over the space of objective functions is updated to produce a more informative posterior distribution [58], [59].
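The sequential loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the optimizer used in the study: the surrogate is assumed to be a Gaussian process with a squared-exponential kernel, the acquisition rule is assumed to be expected improvement, the search space is a one-dimensional list of candidate values, and `toy_validation_error` is a made-up objective standing in for a real training run.

```python
import math
import random


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    mean_y = sum(ys) / len(ys)
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(gram, [y - mean_y for y in ys])
    v = solve(gram, k)
    mean = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over `best` when the objective is minimized."""
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, candidates, n_init=2, n_iter=3, seed=0):
    """Query `objective` sequentially; return the best (x, y) observed."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, n_init)      # initial random queries
    ys = [objective(x) for x in tried]
    for _ in range(n_iter):
        remaining = [x for x in candidates if x not in tried]
        if not remaining:
            break
        best = min(ys)
        # Update the posterior and pick the most promising untried point.
        nxt = max(remaining, key=lambda x: expected_improvement(
            *gp_posterior(tried, ys, x), best))
        tried.append(nxt)
        ys.append(objective(nxt))
    i = min(range(len(ys)), key=ys.__getitem__)
    return tried[i], ys[i]


def toy_validation_error(log10_lr):
    """Made-up objective: error is lowest at a learning rate of 1e-4."""
    return 0.1 + 0.05 * (log10_lr + 4.0) ** 2


candidates = [-5.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.0]
print(bayes_opt(toy_validation_error, candidates))
```

Each iteration corresponds to one "query" in the description above: the surrogate is refitted to all observations so far, and the next hyperparameter value is chosen where the expected reduction in validation error is largest.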

D. ESTABLISHMENT OF B-CNN MODELS
The four pretrained CNNs mentioned above (AlexNet, VGG16, GoogLeNet, and ResNet101) were all trained for image classification on a subset of ImageNet. This subset has been used by the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) since 2010. The last fully-connected layer and the softmax layer divide the training set into 1000 classes. This number is much larger than that required in this research, in which W-B is divided into four or five classes when FAR and NMSCA are fixed. It is therefore necessary to fine-tune the networks to fit our datasets in order to conduct transfer learning. The last fully-connected layer was removed and replaced with a new fully-connected layer whose Output Size was set to four or five according to the mix proportions in Table 3. The remaining relevant layers were reconfigured to match the new fully-connected layer.
Bayesian optimization was applied to the four fine-tuned CNNs to establish B-CNN models. Since pretrained CNNs were used, it is unnecessary to optimize the network structure; only the hyperparameters of the training options need to be optimized. In each model, the following training options remain fixed:
(1) Solver Name: stochastic gradient descent with momentum (SGDM) optimizer.
(2) Learn Rate Schedule, Learn Rate Drop Period and Learn Rate Drop Factor: drop the learning rate to 50% of the initial learning rate in the last 5 epochs of training, to help the network parameters settle closer to a minimum of the loss function and to reduce the noise of the parameter updates.
(3) L2Regularization: 0.0001.
(4) Gradient Threshold: Inf.
(5) Gradient Threshold Method: L2norm. When the L2 norm of the gradient of a learnable parameter is larger than Gradient Threshold, the gradient is scaled so that its L2 norm equals Gradient Threshold.
(6) ValidationFrequency: 1, that is, the validation set is used after each iteration to evaluate the weights learned by the networks.
Meanwhile, Bayesian optimization was applied to optimize Mini-Batch Size, Epoch, Initial Learning Rate, and Momentum. We first trained the networks in a preliminary run to determine the approximate search range of each hyperparameter. To ensure rapid convergence of the models, we did not set continuous closed intervals as the search ranges. Instead, several discrete values within the approximate range were chosen for each hyperparameter, forming four sets of search values. The sets of search values of the training option hyperparameters are shown in Table 10.
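Two of the fixed training options listed above, the learning rate drop in item (2) and the L2-norm gradient threshold in item (5), can be sketched as plain functions. The function names are hypothetical, epochs are assumed to be numbered from 1, and the total of 10 epochs in the example is illustrative only, since the number of epochs is one of the optimized hyperparameters.

```python
import math


def learning_rate(epoch, n_epochs, initial_lr, drop_factor=0.5, last_epochs=5):
    """Piecewise schedule: the initial rate, then drop_factor * initial rate
    during the final `last_epochs` epochs (epochs numbered from 1)."""
    if epoch > n_epochs - last_epochs:
        return initial_lr * drop_factor
    return initial_lr


def clip_by_l2_norm(gradient, threshold):
    """Rescale `gradient` so its L2 norm equals `threshold` when the norm
    exceeds it; otherwise return the gradient unchanged."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in gradient]
    return list(gradient)


# Learning rate at epochs 1, 5, 6 and 10 of a 10-epoch run.
print([learning_rate(e, 10, 1e-4) for e in (1, 5, 6, 10)])

# With the threshold set to Inf, as in item (4), clipping never triggers.
print(clip_by_l2_norm([3.0, 4.0], math.inf))

# A finite threshold of 1.0 rescales a gradient of norm 5 to norm 1.
print(clip_by_l2_norm([3.0, 4.0], 1.0))
```

With Gradient Threshold fixed at Inf, item (5) has no effect during training; the clipping rule only matters if a finite threshold is chosen.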
The B-CNN models combined the candidate values of the four hyperparameters to be optimized into a variety of combinations. Running the models with different combinations yields different validation accuracies. The B-CNN models performed Bayesian optimization by minimizing the classification error on the validation set; that is, Bayesian optimization used validation accuracy to select the best model. In this way, 60 B-CNN models (four networks for each of the 15 datasets) can be established. According to the CNN applied, the models were named B-AlexNet, B-VGG16, B-GoogLeNet, or B-ResNet101. The process of establishing the B-CNN models is shown as a flow diagram in Fig.3.

IV. RESULTS
All programs in this study were run in MATLAB R2018b. The hardware was a desktop computer with a 3.2 GHz Intel i5-6500 CPU (x64-based processor), 8 GB RAM, and an NVIDIA GeForce GT730 GPU.

A. DATASET PROCESSING
A few blurred images and images with too much noise were deleted. Meanwhile, to enhance the robustness of the models, data augmentation was applied to each set to bring the number of images back to the original number, as shown in Table 3. To prevent the models from overfitting and memorizing exact details of the training images, we did not add more images, since the existing ones already cover all of the mixture poured in the tray and more images would only repeat details. The data augmentation methods were cropping at an arbitrary size, flipping about the vertical axis, and random rotation by 90°, 180°, or 270°.
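The three augmentation operations can be illustrated on a small image stored as a nested list of pixel values. This is a sketch only: the function names are hypothetical, the rotation direction (clockwise) and the 50% flip probability are assumptions, and a real pipeline would operate on image arrays rather than lists.

```python
import random


def flip_about_vertical_axis(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def rotate(img, quarter_turns):
    """Rotate by quarter_turns * 90 degrees clockwise."""
    for _ in range(quarter_turns % 4):
        img = rotate_90_clockwise(img)
    return img


def random_crop(img, rng):
    """Cut out a rectangle of random size at a random position."""
    h, w = len(img), len(img[0])
    ch, cw = rng.randint(1, h), rng.randint(1, w)
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]


def augment(img, rng):
    """Crop, optionally flip, then rotate by 90, 180 or 270 degrees."""
    out = random_crop(img, rng)
    if rng.random() < 0.5:          # assumed flip probability
        out = flip_about_vertical_axis(out)
    return rotate(out, rng.choice([1, 2, 3]))


image = [[1, 2],
         [3, 4]]
print(flip_about_vertical_axis(image))  # [[2, 1], [4, 3]]
print(rotate_90_clockwise(image))       # [[3, 1], [4, 2]]
print(augment(image, random.Random(0)))
```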
In each set of images, one image was randomly selected as testing image to form a testing set containing 67 images, and the rest were randomly divided into training set and validation set, among which training set accounted for 70% and validation set for 30%.
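The splitting procedure for one set of images can be sketched as below. The function name, the fixed random seed, and the file names are hypothetical and serve only to make the example reproducible.

```python
import random


def split_image_set(images, train_fraction=0.7, seed=0):
    """Hold out one random image for testing, then split the rest into
    training and validation subsets (70% / 30% by default)."""
    rng = random.Random(seed)
    shuffled = list(images)
    rng.shuffle(shuffled)
    test_image = shuffled[0]
    rest = shuffled[1:]
    n_train = round(train_fraction * len(rest))
    return rest[:n_train], rest[n_train:], test_image


# Hypothetical set of 101 image file names.
image_set = [f"img_{i:03d}.jpg" for i in range(101)]
train, val, test = split_image_set(image_set)
print(len(train), len(val), test)
```

Applying this to each of the 67 sets yields the 67-image testing set, and the training and validation subsets of the sets in one dataset are then pooled.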

B. TRAINING OF B-CNN MODELS
The models were run to obtain the optimal training option hyperparameters. Fig.4 and Fig.5 show, respectively, the validation accuracies and elapsed times of the 60 B-CNN models trained with these optimal hyperparameters.
For the four models of each dataset, the authors compared validation accuracies and elapsed times and consulted the training curves to choose the optimal B-CNN model. The B-AlexNet model has the highest accuracy and the shortest training time on most datasets. As shown in Fig.4, the validation accuracy of the B-VGG16 model is slightly higher than that of the B-AlexNet model for dataset II and dataset VII; however, Fig.5 shows that the elapsed time of the B-VGG16 model is much longer. The training curves further show that the B-VGG16 model overfitted, and that for some models the validation accuracy is much lower than the training accuracy, as shown in Fig.6. These facts indicate that, for datasets with a small number of images, simple networks with fewer parameters and shallower depth can achieve high accuracy and efficiency. By contrast, complex networks are prone to overfitting because of insufficient data, which degrades training. For these reasons, we adopted B-AlexNet models to detect the W-B of concrete mixtures for all 15 datasets. The optimal hyperparameters and the training and validation accuracies obtained by the B-AlexNet models for each dataset are shown in Table 11 and Fig.7.
China national standards [60], [61] suggest that the median is taken as the representative compressive strength when the difference between the maximum (or minimum) and the median compressive strength of a set of specimens exceeds 15% of the median, and that when the differences of both the maximum and the minimum exceed 15%, the set of specimens is regarded as invalid. For the same acceptance batch, compressive strength shall meet the following requirement when the sample size is not less than 15 groups:

$$f_{cu,\min} \geq \lambda \, f_{cu,k}$$

where $f_{cu,\min}$ represents the minimum compressive strength of concrete cube specimens in the same acceptance batch, $f_{cu,k}$ is the standard value of compressive strength of concrete cube specimens, and $\lambda$ is the conformity assessment coefficient, valued as 0.85. Additionally, compressive strength and W-B are negatively and linearly correlated; we therefore set the accuracy threshold to 85% to judge whether the accuracies meet the requirements of practical engineering. As shown in Fig.7, the training accuracies of the 15 models were not less than 95%, and the highest reached 100%. The validation accuracies ranged from a lowest of 85.03% to a highest of 98.77%, all higher than 85%. Therefore, all the models performed well and meet the requirements of practical engineering application.
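The two rules quoted from the standards can be expressed as small checks. This is a sketch under assumptions: the function names are hypothetical, the strengths are illustrative numbers in MPa, and the use of the arithmetic mean when neither extreme deviates by more than 15% is an assumption that the text does not state.

```python
from statistics import mean, median


def representative_strength(strengths, tolerance=0.15):
    """Representative compressive strength of one set of specimens.

    Returns the median if exactly one extreme deviates from the median by
    more than `tolerance`, None (invalid set) if both extremes do, and the
    arithmetic mean otherwise (ASSUMED default case, not stated in the text).
    """
    mid = median(strengths)
    high_off = (max(strengths) - mid) > tolerance * mid
    low_off = (mid - min(strengths)) > tolerance * mid
    if high_off and low_off:
        return None
    if high_off or low_off:
        return mid
    return mean(strengths)


def meets_minimum(f_cu_min, f_cu_k, lam=0.85):
    """Acceptance check: minimum strength >= lam * standard value."""
    return f_cu_min >= lam * f_cu_k


print(representative_strength([30.0, 31.0, 32.0]))  # all within 15 %
print(representative_strength([30.0, 31.0, 40.0]))  # maximum deviates
print(representative_strength([20.0, 31.0, 40.0]))  # both deviate
print(meets_minimum(f_cu_min=26.0, f_cu_k=30.0))
```

The 0.85 coefficient in the acceptance check is the value from which the 85% accuracy threshold in the text is derived.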
Since the final validation accuracy values given by the models are accuracies on the last iteration, confusion matrices were drawn for classification results of validation set to visually show classification ability of the models. Fig.8 shows the confusion matrix for validation images corresponding to dataset V as an example. The element a(i, j) in each cell of the matrix represents the number of images belonging to class i and classified to class j. The two columns of cells on the right represent the classification accuracy and error of each W-B class. Table 12 summarizes the average accuracies and the classification accuracies of each W-B class shown in confusion matrices. It can be seen that the highest accuracy is 100%. Although the lowest is less than 70%, the average accuracies are all higher than 85%. Compared with the error threshold, classification accuracies shown by confusion matrices also meet the actual requirements in practical engineering.
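The per-class accuracies read from a confusion matrix follow directly from the definition of a(i, j). The sketch below uses a made-up 2 × 2 matrix for illustration; it does not reproduce any values from Fig.8 or Table 12, and the function names are hypothetical.

```python
def class_accuracies(confusion):
    """Accuracy of each true class i: a(i, i) divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Fraction of all images that lie on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Illustrative counts only: rows are true classes, columns predicted classes.
toy_matrix = [[9, 1],
              [2, 8]]
print(class_accuracies(toy_matrix))   # [0.9, 0.8]
print(overall_accuracy(toy_matrix))   # 0.85
```

A single class can fall well below the overall figure, which is why the text reports both the class-wise and the average accuracies.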

C. TESTING OF B-CNN MODELS
A testing set containing 67 images was used to evaluate the generalization ability of the B-CNN models. Although the B-CNN models are in essence classification models that output the probabilities that a testing image belongs to each class, the classified quantity, W-B, is a quantifiable value. For each testing image, the W-B values of the classes can therefore be averaged, weighted by the corresponding detected probabilities, to obtain the W-B "detected value" output by the models. In the present study, four statistical indicators, namely absolute percentage error (APE), mean absolute percentage error (MAPE), absolute fraction of variance (R²), and root mean square error (RMSE), were applied to measure the errors between detected and actual values. In these indicators, a_i and p_i are the actual and detected values of the ith sample, and n is the sample size.
The probabilities detected by the models that testing images belong to each class, the W-B detected values, and the error analysis are shown in Table 13. All APE values are within ±4%, the MAPE values of the 15 models are not more than 1%, the R² values are very close to 1, and the RMSE values are of the order of 10^-3 to 10^-4. All four statistical indicators show a very small error between detected and actual values. Fig.9 visualizes the error between the actual and detected values of each testing image. The two curves representing detected and actual values almost completely coincide, indicating that the W-B detected values are very close to the actual values. The histogram shows the APE of each testing image, which is far smaller than the error threshold of ±15%.
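The weighted-average "detected value" and the four error indicators can be sketched as follows. The definitions used here are assumptions based on common usage, since the original formulas are not reproduced in this text: APE is taken as a signed percentage (the text reports it with a ± sign), and R² is taken as one minus the ratio of the squared errors to the squared detected values, a form often called the absolute fraction of variance. The numbers in the example are illustrative, not results from Table 13.

```python
import math


def detected_value(class_values, probabilities):
    """Probability-weighted average of the W-B values of the classes."""
    return sum(v * p for v, p in zip(class_values, probabilities))


def ape(actual, detected):
    """Signed percentage error of one sample (ASSUMED sign convention)."""
    return (detected - actual) / actual * 100.0


def mape(actuals, detecteds):
    """Mean of the absolute percentage errors."""
    return sum(abs(ape(a, p)) for a, p in zip(actuals, detecteds)) / len(actuals)


def rmse(actuals, detecteds):
    """Root mean square error."""
    n = len(actuals)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actuals, detecteds)) / n)


def r_squared(actuals, detecteds):
    """Absolute fraction of variance (ASSUMED definition)."""
    num = sum((a - p) ** 2 for a, p in zip(actuals, detecteds))
    den = sum(p ** 2 for p in detecteds)
    return 1.0 - num / den


# Illustrative classifier output for one image with four W-B classes.
wb_classes = [0.3, 0.4, 0.5, 0.6]
probs = [0.0, 0.1, 0.9, 0.0]
print(detected_value(wb_classes, probs))

# Illustrative actual and detected values for two images.
actual = [0.5, 0.4]
detected = [0.49, 0.41]
print(mape(actual, detected), rmse(actual, detected), r_squared(actual, detected))
```

Because the detected value is a weighted average, a small probability assigned to a neighboring class shifts the result only slightly away from the true class value.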
The data and diagrams all indicate that, on the training, validation, and testing sets alike, the 15 B-CNN models show excellent learning and generalization ability. The B-CNN models effectively learned the relationship between images of concrete mixtures and their W-B. In addition, the detection time for each testing image is within one second. We can therefore conclude that the B-CNN models achieve real-time and effective W-B detection of concrete mixtures.

D. COMPARISONS WITH OTHER APPROACHES
The B-CNN method is compared with the approaches proposed in references [34] and [35] in Table 14. Compared with the W-B detection results in these references, the testing sets of the B-CNN models have a larger average R², and the time required for detection is the shortest. In addition, the approaches of references [34], [35] cannot address the waste of resources, since they still require specimens for detection, although one of them can monitor W-B at an early age, on the second day. These comparisons demonstrate the effectiveness of the B-CNN method.

E. DEVELOPMENT OF CONCRETE MIX PROPORTION DETECTION SYSTEM
With the assistance of App Designer, the application construction platform provided by MATLAB R2018b, the authors integrated the 15 models to develop a detection system for concrete mix proportion. After the corresponding FAR and NMSCA are selected, the system can load stored images and can also call cameras to take real-time pictures for W-B detection. Supplemented by suitable hardware, such as HD cameras, the system can detect the W-B of concrete mixtures promptly and at high sampling rates in the production process. The authors laid out the visual components and programmed their behavior in App Designer, then exported the system as a standalone executable program. The user interface of the developed detection system is shown in Fig. 10.

A. A NEW PARAMETER: EVASR85
Validation accuracy reflects, to a certain extent, how strongly image features differ between classes, but it is numerically more complex, and its value depends on the selected training option hyperparameters. To overcome these drawbacks, a new parameter was extracted from the training curves of the models: the epoch at which validation accuracy steadily reached 85% (EVASR85). "Steadily" means that the accuracy reaches the specified level and does not fall below it again until the end of the training process. The accuracy threshold of 85% is again based on the error requirements of practical engineering. Compared with validation accuracy, EVASR85 is simpler in form and numerically more intuitive, and it reflects the learning process and efficiency of the models instead of only the final training result.
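The definition of "steadily reached" can be made concrete with a short sketch. It assumes that one validation accuracy value (in percent) is recorded per epoch and that epochs are counted from 1. The function name and the sample curve are hypothetical and are not taken from the authors' code.

```python
def evasr(val_accuracy_per_epoch, threshold=85.0):
    """Return the first epoch (1-based) from which validation accuracy stays
    at or above `threshold` until the end of training, or None if the curve
    never settles above the threshold."""
    last_below = 0  # 0 means no epoch has fallen below the threshold so far
    for epoch, accuracy in enumerate(val_accuracy_per_epoch, start=1):
        if accuracy < threshold:
            last_below = epoch
    if last_below == len(val_accuracy_per_epoch):
        return None  # the final epoch is still below the threshold (or no data)
    return last_below + 1


# Hypothetical curve: 85% is first crossed at epoch 2, lost again at epoch 3,
# and held from epoch 4 onward, so the steady epoch is 4.
curve = [60.0, 86.0, 84.0, 88.0, 90.0, 91.0]
steady_epoch = evasr(curve)
```

Scanning for the last epoch below the threshold, instead of the first epoch above it, is what separates a steady crossing from a temporary one.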
Furthermore, we noticed that the initial learning rates selected by the 15 B-CNN models are not the same: the models trained on datasets II, VIII, IX, XIV, and XV selected 0.0005 or 0.00005, while the remaining models selected 0.0001. It is well established that different learning rates may lead to different convergence times, which may affect the value of EVASR85. To address this problem, we set the initial learning rate to 0.0001 and retrained the models on datasets II, VIII, IX, XIV, and XV, without changing the other training option hyperparameters selected by the B-CNN models. The training curves obtained with the different learning rates are compared in Fig. 11, and the EVASR85 values are compared in Table 15. On each dataset, the curves trained with different initial learning rates are similar in shape and reach the same final validation accuracy, and there is little difference between the EVASR85 values.
The correlation between EVASR85 and validation accuracy is analyzed in Fig. 12. A distinct negative correlation was found between them. Therefore, we replaced validation accuracy with EVASR85 to analyze the relevant properties of concrete mixtures.
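One way to quantify a negative correlation of this kind is the Pearson coefficient, sketched below. The source does not state which correlation measure underlies Fig. 12, so the choice of Pearson is an assumption, and the paired values are hypothetical placeholders, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical pairs: (EVASR85 in epochs, final validation accuracy in percent).
evasr85_values = [3, 5, 8, 12, 20]
validation_accuracies = [98.0, 96.5, 94.0, 91.0, 87.5]
r = pearson_r(evasr85_values, validation_accuracies)  # negative for these placeholders
```

A coefficient close to -1 would indicate that models which settle above 85% earlier also tend to end with higher validation accuracy, which is the pattern the text describes.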
EVASR85 can be defined as a parameter describing the degree to which image features differ between classes. A small EVASR85 means that the model learns image features quickly and achieves high accuracy early and stably, which in turn indicates that the images in the dataset differ considerably and that the image features are distinct. In addition, we noticed that the image feature learned by the B-CNN models is the apparent state of concrete mixtures. Therefore, EVASR85 can be further defined as a parameter describing how strongly the apparent state of concrete mixtures differs with W-B, that is, the sensitivity of the apparent state to changes in W-B. A small EVASR85 indicates that the concrete mixtures show evident differences in apparent state and are more sensitive to changes in W-B. The difference in the apparent state of mixtures is one of the most important considerations for engineers when preliminarily assessing the service behavior of concrete.

B. THE SENSITIVITY OF CONCRETE MIXTURES APPARENT STATE TO THE CHANGE OF W-B
Using EVASR85, the authors analyzed the influence of mix proportion on the sensitivity of the apparent state of concrete mixtures to changes in W-B. Fig. 13 shows the relationship between the EVASR85 values of the 15 models and the corresponding mix proportions.
From the perspective of NMSCA, Fig. 13(a) shows that, among the three models with the same FAR, EVASR85 is largest when NMSCA=10, whereas EVASR85 is small when NMSCA=20 or 31.5. This shows that when NMSCA is large, the images in a dataset differ significantly and have distinct features; in other words, the apparent state of the concrete mixture is sensitive to changes in W-B. In contrast, the apparent state is insensitive to changes in W-B when NMSCA=10. The reason is that small coarse aggregate particles have a larger specific surface area and are therefore coated with more water and cement paste, so that little cement paste and water float on the surface of the mixture regardless of W-B, which reduces the sensitivity of the apparent state to changes in W-B. Larger coarse aggregate particles, by contrast, have a smaller specific surface area and are coated with less water and cement paste. In addition to the visible difference in the quantity of free water and cement paste, the difference in the consistency of the cement paste is then clearly exposed, which produces a striking difference in apparent state. The approximate trend of the five curves in Fig. 13(b) also indicates that the sensitivity of the apparent state to changes in W-B increases with increasing NMSCA.
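The specific surface area argument can be illustrated with an idealized calculation. The sketch assumes spherical particles whose diameter equals the NMSCA value, taken here to be in millimetres. Both are simplifying assumptions, since real graded aggregates are neither spherical nor of a single size. The function name is hypothetical.

```python
def surface_to_volume_ratio(diameter_mm):
    """Surface area per unit volume (1/mm) of a sphere of the given diameter:
    (pi * d**2) / (pi * d**3 / 6) = 6 / d."""
    return 6.0 / diameter_mm


# The three NMSCA values discussed in the text, treated as sphere diameters in mm.
for size in (10.0, 20.0, 31.5):
    ratio = surface_to_volume_ratio(size)
    print(f"d = {size:4.1f} mm -> surface/volume = {ratio:.3f} per mm")
```

Under these assumptions the ratio is inversely proportional to particle size, so for the same aggregate volume, halving the particle size doubles the surface that must be coated with water and cement paste.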
From the perspective of FAR, as shown in Fig. 13(b), a very large or very small FAR leads to larger EVASR85 values, and smaller EVASR85 values generally correspond to a moderate FAR. An overall comparison shows that the models performed best on EVASR85 when FAR=35. In short, the apparent state of concrete mixtures is most sensitive to changes in W-B when FAR is moderate. The reason is that when FAR is relatively small, a large amount of water and cement paste is required to fill the gaps between coarse aggregate particles that are not fully filled with fine aggregate; on the other hand, when FAR is large, the specific surface area of the aggregates increases, and the mixture reaches the same condition as when NMSCA is small. For a moderate FAR, W-B significantly affects the amount of free water and cement paste, and the consistency of the cement paste is clearly exposed, so the apparent state is sensitive to changes in W-B. The fluctuation of the three curves in Fig. 13(a) also shows this conclusion intuitively.
In Table 12, two classification accuracies below 70% can be found. They belong to datasets IV and XIII, whose FARs are 30 and 45, respectively, and whose NMSCA is 10 in both cases. These two lower accuracies can be explained by the influence of mix proportion on the sensitivity of the apparent state to changes in W-B. The rules discussed above are supported by the images of some concrete mixtures shown in Fig. 15.

C. THE ESSENCE OF IMAGE FEATURES LEARNED BY B-CNN MODELS
Fluidity is one of the most important properties affecting the service behavior of concrete, and slump is one of the most commonly used indicators of the fluidity of concrete mixtures. Slump is the height by which a mixture subsides under its own weight when the mold is lifted after filling and tamping. The slumps of some concrete mixtures measured during the experiments are shown in Fig. 14.
When NMSCA is small, the slump changes only slightly as W-B changes, that is, the difference in fluidity between mixtures is small. A larger NMSCA corresponds to a greater change in slump, that is, a larger difference in fluidity. As for FAR, the slump changes little with W-B when FAR is large or small, whereas with a moderate FAR the slump changes significantly and the fluidity differs greatly.
The influence of mix proportion on the fluidity of concrete mixtures is thus consistent with its influence on EVASR85, a parameter describing the degree of image feature differentiation. We can therefore conclude that the essence of the image features learned by the B-CNN models is the fluidity of concrete mixtures, and that the performance of the models represents, to some extent, the sensitivity of the fluidity of concrete mixtures to changes in W-B. Furthermore, EVASR85 can also be defined as a parameter describing the difference in fluidity of concrete mixtures: the smaller the value of EVASR85, the more evident the difference in fluidity.
For concrete mixtures with the same FAR and coarse aggregate properties, previous research generally accepts [1] that the most important factors affecting fluidity are the quantity of free water and cement paste and the consistency of the latter. These factors were taken as premises in Section V-B as the causes of the differences in the apparent state of concrete mixtures. The consistency discussed above therefore also supports the correctness of that premise. The images of some concrete mixtures shown in Fig. 15 also intuitively confirm the above conclusions.

VI. CONCLUSION
In this paper, a new method based on B-CNN model transfer learning was proposed to detect the W-B of concrete mixtures. Applying pretrained CNNs greatly reduces the quantity of data required and the time needed to train the models. Bayesian optimization was applied to the pretrained CNNs to establish B-CNN models, which avoids manual tuning of training option hyperparameters and allows the networks to reach their optimal performance as quickly as possible. In order to cover all possible mix proportions involved in practical engineering, 67 experiments were conducted, and the same number of image sets of freshly-mixed concrete mixtures collected from the experiments were divided into 15 datasets. For each dataset, B-AlexNet, B-VGG16, B-GoogLeNet, and B-ResNet101 were fine-tuned and trained, and the authors selected the best-performing one, with its optimal hyperparameters, to detect W-B. The 15 models achieved high training accuracies, and their validation accuracies are all above 85%, which meets the requirements of practical engineering applications. The different statistical indicators for error evaluation all show that the models perform well on the testing set. These results show that the proposed method can effectively detect the W-B of concrete mixtures. The 15 models were integrated to develop a detection system for concrete mix proportion. In addition to loading stored images, this system can call cameras to take real-time pictures. If the system is equipped with suitable hardware, concrete mix proportions can be detected promptly and at high sampling rates in the production process, which helps prevent engineering accidents caused by material failure. By referring to the training curves of the models, a new parameter, EVASR85, was introduced to describe the degree of apparent state differentiation of concrete mixtures, which engineers use as one of the most important considerations when preliminarily assessing the service behavior of concrete.
Through this parameter, we analyzed the influence of mix proportion on the sensitivity of the apparent state of concrete mixtures to changes in W-B, and further concluded that the essence of the image features learned by the B-CNN models is the fluidity of concrete mixtures.
In further study, the following aspects could be considered: (1) detecting FAR or NMSCA, which were determined in advance in the present study; (2) applying multi-label classification models to simultaneously detect the three mix proportion parameters W-B, FAR, and NMSCA; (3) detecting the mix proportion of concrete mixtures with other components, such as mineral admixtures and superplasticizers.