Stream of Unbalanced Medical Big Data Using Convolutional Neural Network

Traditional algorithms cannot predict the network link load effectively, which leads to high packet loss and energy loss, long turnaround time, a slow stream rate, and poor anti-attack ability. To address this problem, this paper proposes a stream algorithm for unbalanced medical big data based on a convolutional neural network (CNN). The proposed algorithm has two stages. In the first stage, a decomposition-prediction model is constructed that combines wavelet analysis and neural network analysis to predict the network link load. In the second stage, based on the network link load, the structure of each layer of the convolutional neural network is analyzed, an optimization model for the medical big data stream is constructed, and the ReLU function is introduced into the convolutional neural network to solve the optimization model and complete the stream processing of unbalanced medical big data. The experimental results show that the network link load prediction accuracy of the proposed stream algorithm reaches 93%, the lowest packet loss rate is only 2.0%, the energy loss of the stream process is low, the stream rate is fast, and the anti-attack efficiency is high, which makes the algorithm more conducive to realizing data streaming.


I. INTRODUCTION
Unbalanced data often exists in the medical field; that is, the data are not evenly distributed across categories [1]-[3]. The amount of data in one category is abnormally larger than that in the other categories [4], [5] and occupies a dominant position. This kind of data is called unbalanced data [6].
Intelligent medical treatment is currently a major trend in medical development. It mainly uses technology to integrate medical data into a resource sharing platform, promotes the sharing of medical resources, and addresses the waste of medical resources and the slow medical progress caused by blocked information. During the development of intelligent medical treatment [7]-[9], a large amount of medical data is generated from different data sources and for different uses [10], and there are also large differences in computer communication interfaces and the related communication protocols. The resulting big data has various attributes [11], [12]. Because link lengths differ and load capacity is uneven, data sharing easily suffers from data loss and an excessive average turnaround time [13]. To better store and share medical data resources, suppress packet loss, and ensure data integrity, medical big data needs to be streamed. Unbalanced medical data processing, as a special data learning method, has become a key research issue in the medical field [14].
The associate editor coordinating the review of this manuscript and approving it for publication was Honghao Gao.
The convolutional neural network is a commonly used artificial neural network model. It is widely used in data processing, artificial intelligence, and pattern recognition [15], and it is currently applied in medical fields with significant effect. Compared with other data stream methods, the convolutional neural network has a special structure with partial connections between layers. It can select data features independently, process medical data fully and flexibly, and obtain better data research results.
Yumeng et al. [16] used convolutional neural networks for user interest recognition: the networks fuse diverse user information, and maximum likelihood estimation is then used to implement interest recognition. Wanjun et al. [17] proposed an adaptive enhanced convolutional neural network recognition method, which constructs an adaptive enhanced model and, combined with an analysis of the operation process of the convolutional neural network, optimizes the performance of the network and obtains good recognition results. Yongping et al. [18] used a convolutional neural network to set the window size adaptively, extract the semantic features of short text, and predict the emotional tendency, which significantly improved the short text classification effect.
Based on the above research, this paper proposes a stream algorithm of unbalanced medical big data based on convolutional neural network. The algorithm first performs network link load prediction analysis, and on this basis, uses convolutional neural networks to stream medical data traffic.
Finally, we tested the performance of the algorithm. The results show that, compared with the traditional stream algorithms for unbalanced medical data, the proposed algorithm achieves higher link load prediction accuracy, a lower packet loss rate, a shorter average data turnaround time, higher stream efficiency, lower energy consumption, and stronger anti-attack ability. The algorithm therefore has better stream effects and is conducive to medical data balance and sharing.
The main contributions of this paper are as follows:
(1) A decomposition-prediction model is constructed that combines wavelet analysis and neural network analysis to predict the network link load, which provides the basis for data streaming.
(2) An optimization model for the medical big data stream is constructed.
(3) The ReLU function is introduced into the stream processing of unbalanced medical big data, which substantially increases the data stream rate.
(4) Multiple indexes are used to verify the proposed algorithm, which improves the reliability of the research results.

II. RELATED WORK
At present, medical data stream technology centers on intelligent algorithms, such as the clustering algorithm, the ant colony algorithm, and the tabu search algorithm.
Literature [19] proposed a new attribute weighting method. This method treats each attribute as a single classifier and uses the area under the ROC curve (AUC) to measure its discriminative ability. Each AUC value is then used as a weight. Finally, the attribute weighting method is incorporated into Naive Bayes for unbalanced data classification, but the method has weak attack resistance. Literature [20] analyzed an unbalanced data classification method and experimentally verified its performance with good results, but the method has a high packet loss rate. Literature [21] discussed the use of GPUs to speed up gravity calculations and used a data classification method in the process: the data of the gravity calculation are classified and assigned to GPU threads so that the gravity values are calculated in parallel. The calculation results are good, but the energy loss during the calculation is high. Literature [22] designed a large-scale unbalanced data classification algorithm based on a differential twin neural network, which uses differential convolution to enhance the ability of the convolutional neural network to express deep structure and thereby increases its discriminative ability. On this basis, the class feature maps of each set of unbalanced data are optimized, each set of data is linked with multiple hyperplanes, and the class labels of the data are judged according to the distance between the data and the hyperplanes so as to realize data streaming. However, this method takes a long time to split the data. Literature [23] designed a new distribution prediction algorithm for unbalanced data based on a threshold optimization algorithm. This algorithm adds an information classification process to the traditional KSVM classifier and effectively classifies the information near the hyperplane. Meanwhile, to overcome the defect of a fixed time threshold for unbalanced data, the threshold is adjusted dynamically, and data streaming is realized with the dynamic threshold optimization algorithm. However, the calculation process has a large energy loss. Literature [24] pointed out that most traditional pattern classifiers assume that their input data are well behaved, with similar underlying class distributions, balanced class sizes, and so on. Actual data sets, however, show various forms of irregularity, which are often enough to confuse the classifier and reduce its ability. That work therefore analyzed the data classification problem on the basis of the irregular characteristics of the distribution, found an effective data research method, and achieved certain results, but the computational performance of the method still needs to be improved.
Aiming at the problems of the above research methods, this paper proposes a stream algorithm of unbalanced medical big data based on convolutional neural network. The algorithm is developed in two stages: network link load prediction and data streaming. The experimental results show that the link load prediction accuracy of the proposed algorithm is high, the minimum packet loss rate is only 2.0%, and the average stream turnaround time is only 6.82 s. The energy loss during the stream process is low, and the stream rate is fast. The anti-attack efficiency can reach 90%, and the overall performance is better.
At present, medical big data is gradually breaking out of inter-departmental restrictions and realizing resource interconnection and sharing, which lays an important data foundation for the development of medical undertakings [25], [26]. Faced with a huge amount of medical data that is growing at an amazing speed, however, the smooth and effective operation of the data sharing network is under heavy pressure [27], [28]. The data is likely to exceed the available bandwidth of the network links and cause network congestion. Therefore, it is urgent to find an effective solution. Network operators have put forward two solutions. The first is to upgrade the current medical data sharing network or to establish base stations with smaller coverage [29]. Although this solution is effective, it costs a great deal of time and money, and its overall efficiency is low [30]. The second is data streaming. Data streaming can be explained as distributing part of the data flow of one wireless network to one or several other wireless networks in order to share the load of a single network, reduce data transmission delay, shorten the average turnaround scheduling time, lower the data package loss rate, and ensure data integrity [31]-[33].
VOLUME 8, 2020
Therefore, we designed the stream algorithm of unbalanced medical big data based on convolutional neural network. It mainly includes two parts. The first part establishes a prediction model of the network load degree to predict the network link load. The second part distributes the medical data flow with the stream model of the convolutional neural network according to the prediction results. The algorithm is analyzed in detail below.

III. STREAM ALGORITHM OF UNBALANCED MEDICAL BIG DATA BASED ON CNN

A. NETWORK LINK LOAD PREDICTION
The stream algorithm of unbalanced medical big data has two steps: link load prediction and data flow streaming. When data is transferred from node to node toward the destination node and a node selects the next hop [34], [35], the cache state of the next hop must be tested. Only when the remaining cache space of the next-hop node is larger than the information size can the data hop to that node [36], so that no congestion is caused. This process is called network link load prediction, and it is important for data streaming [37], [38].
As different networks have different topological structures, their load capacities also differ [39]. Therefore, before predicting the load of the medical data sharing network, it is important to make the network topology clear.
The commonly used network link load prediction methods include wavelet analysis and neural network analysis. Wavelet analysis is the most effective method for processing unstable time series. The wavelet can decompose a long medical data flow into multiple independent small data flows, so that time series prediction can be carried out on the historical data flow of the network links. Neural network analysis relies on its structural advantages to input, process, and output data and thus complete the network link load prediction. Although both methods can achieve load forecasting, the accuracy is lower when prediction relies on a single method.
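The cache test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Node` class, its fields, and the functions `can_forward` and `pick_next_hop` are hypothetical names introduced here, and sizes are assumed to be in bytes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical network node with a fixed-size cache."""
    name: str
    cache_capacity: int  # total cache space (assumed unit: bytes)
    cache_used: int      # cache space already occupied

    @property
    def remaining_cache(self) -> int:
        return self.cache_capacity - self.cache_used


def can_forward(next_hop: Node, message_size: int) -> bool:
    """True only if the free cache of the next hop is strictly larger than the message."""
    return next_hop.remaining_cache > message_size


def pick_next_hop(candidates, message_size):
    """Return the first candidate that passes the cache test, or None if none does."""
    for node in candidates:
        if can_forward(node, message_size):
            return node
    return None
```

Returning `None` when every candidate fails the test leaves the congestion-handling policy to the caller, which the text does not specify.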
Therefore, this work combines the two methods to construct a decomposition-prediction model. The model first uses wavelet analysis to decompose the data stream, and then uses the decomposed data as the input data of the neural network model, and runs the neural network to predict the network link load capacity. The steps are as follows:

1) DECOMPOSE DATA STREAM WITH WAVELET ANALYSIS
Randomly select a wavelet basis function, then dilate and translate it to obtain a family of functions. The selected wavelet basis function $A_{a,b}(t)$ is given by formula (1), where a represents the time scale, b represents the time displacement, and t represents time. Applying the dilation and translation process to formula (1) at discrete steps, namely the discrete wavelet transform, gives the transformed wavelet basis function of formula (2), where m represents the specified integer of wavelet scaling, n represents the specified integer of wavelet translation, $a_0$ represents a fixed scale parameter greater than 1 ($a_0 > 1$), and $b_0$ represents the translation parameter.
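For orientation, the textbook forms of the continuous and discrete wavelet families written with the same symbols are sketched below. This is an assumption about notation, not a confirmed transcription of the paper's formulas: $A(\cdot)$ stands for the mother wavelet, and the normalization used by the authors may differ.

```latex
A_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, A\!\left(\frac{t-b}{a}\right), \qquad a \neq 0

A_{m,n}(t) = a_0^{-m/2}\, A\!\left(a_0^{-m} t - n b_0\right), \qquad m, n \in \mathbb{Z}
```

The second line follows from the first by restricting the parameters to the grid $a = a_0^{m}$ and $b = n b_0 a_0^{m}$.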
The similarity coefficient is then calculated by formula (3). The data x(t) can be expressed as the cumulative sequence of the similarity coefficients and the detail coefficients, as in formula (4), where $C_{mn}(t)$ is the detail coefficient given by formula (5). According to the transformed wavelet basis function and the similarity coefficient, the decomposition of the data stream can be completed, and the decomposed data stream D is expressed by formula (6), where O represents the original data stream.
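As a minimal sketch of wavelet decomposition of a link-load series, the following pure-Python code applies the Haar wavelet. The Haar basis, the function names, and the sample values used with it are assumptions made for illustration; the text above selects the basis function at random and does not name it. The approximation list roughly corresponds to the similarity coefficients, and the remaining lists are detail coefficients.

```python
import math


def haar_step(signal):
    """One level of the Haar discrete wavelet transform (even-length input)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level by recombining approximation and detail pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def decompose(signal, levels):
    """Return [approx_L, detail_L, ..., detail_1] after `levels` Haar steps."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def reconstruct(coeffs):
    """Rebuild the original series from the output of `decompose`."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx
```

Each coefficient list can then be treated as a separate, smoother series and passed to the predictor, and `reconstruct` confirms that the decomposition loses no information.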

2) NEURAL NETWORK LINK LOAD PREDICTION
Generally speaking, a neural network includes an input layer, a hidden layer, and an output layer [40], [41], and it involves two processes: learning and training. First, the network structure is designed to determine the input and output layers. Here the historical flow data of the network are taken as the input and the hopping load as the output. The hidden layer is more important, because the number of hidden layer neurons directly influences the prediction results. The number of hidden neurons is therefore generally calculated by formula (7), where q represents the number of neurons in the input layer, p represents the number of neurons in the output layer, and c represents a constant in [1, 10]. The decomposed data stream D is input to the neural network model and processed by the neurons. During this process, the S-type tangent function of formula (8) and the S-type logarithmic function of formula (9) are selected as the excitation functions, where e represents the gradient, x represents the threshold, and θ is the excitation parameter. The predicted value of the neural network link load is then given by formula (10). Based on the above steps, the neural network link load prediction can be completed, the network status becomes clear, and the foundation for the medical big data stream is laid. For prediction, we used the neural network toolbox in MATLAB for network training [42]-[44].
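The sizing rule and the two excitation functions can be illustrated with a short Python sketch. It assumes the widely used empirical rule $\sqrt{q + p} + c$ for the hidden layer and the standard hyperbolic tangent and logistic forms of the S-type functions; the role of the excitation parameter θ in formulas (8) and (9) is not reproduced, and all function names are hypothetical.

```python
import math


def hidden_neurons(q: int, p: int, c: int) -> int:
    """Assumed empirical rule sqrt(q + p) + c, rounded to the nearest integer."""
    if not 1 <= c <= 10:
        raise ValueError("c is expected to lie in [1, 10]")
    return round(math.sqrt(q + p) + c)


def tansig(x: float) -> float:
    """S-type tangent function in its standard form, with outputs in (-1, 1)."""
    return math.tanh(x)


def logsig(x: float) -> float:
    """S-type logarithmic function in its standard form, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```

For example, with 8 input neurons, 1 output neuron, and c = 3, the rule suggests 6 hidden neurons.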

B. USING CONVOLUTIONAL NEURAL NETWORK TO STREAM MEDICAL BIG DATA
The convolutional neural network includes an input layer, a pooling layer, a fully connected layer, and an output layer. The input layer is responsible for managing the original data. The pooling layer is mainly set up to avoid the curse of dimensionality; its main role is to compress large data and avoid overfitting. This is mainly accomplished by maximum value pooling, that is, the maximum flow value of the data is calculated and retained. The fully connected layer and the output layer are located at the bottom of the entire convolutional neural network, and their main role is to output the optimal medical big data stream result.
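Maximum value pooling can be sketched for a one-dimensional flow series as follows. The function name, the window size, and the default stride are illustrative assumptions; the text does not state the pooling parameters.

```python
def max_pool_1d(values, window, stride=None):
    """Keep the maximum of each window; the stride defaults to the window size."""
    if window <= 0:
        raise ValueError("window must be positive")
    stride = window if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, stride)
    ]
```

With non-overlapping windows of size 2, six flow values are compressed to three, each being the largest value in its window.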
The convolutional neural network is used to stream the unbalanced medical big data as follows:
Input: original medical big data and the network link load prediction.
Output: unbalanced medical big data stream results.
(1) First, the frequent dynamic measurement stream of the unbalanced medical big data is calculated by formula (11), where β represents the cross-correlation of the unbalanced medical big data, N represents the data volume, g represents the ambiguity function of the unbalanced medical big data, and Q represents the fuzzy features of the data.
(2) For the limited unbalanced medical big data set $X_i$, the vector set of the three-dimensional reconstruction stream is established by formula (12), where $new_i^*$ can be obtained through the statistical analysis method of formula (13).
(3) The automatic corrective decision function of the unbalanced medical big data is built as in formula (14), where t is time and k is the effective amplitude of the big data.
(4) On the basis of the automatic corrective decision for the unbalanced medical big data, the optimization model of the medical data flow stream is given by formula (15), where z represents the total transmission rate of the medical data flow, g represents the maximum sending rate, $r_i$ represents the ratio of the data sub-flow assigned to the i-th link $N_i$ to the total transmitted data, and r represents the result vector of the flow stream ratios.
(5) The convolutional neural network would normally select the sigmoid function of formula (9) as the network activation function to solve the above optimization model. However, this leads to a growing amount of calculation and a vanishing gradient, so that the entire neural network cannot finish the follow-up training. To this end, we use a new activation function, namely the ReLU function of formula (16), where d represents the stream predictive value.
(6) After the convolutional neural network has learned, it needs to be trained, that is, the values of its parameters are adjusted through the error between the estimated value and the actual value, to increase the rationality of the stream in the medical data sharing network and to obtain the unbalanced medical big data stream results.

(7) End
The above process completes the unbalanced medical big data stream. The overall stream process is shown in Figure 1.
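The reason for replacing the sigmoid function in step (5) can be illustrated numerically. The sketch below is a simplified illustration, not the training procedure of the paper: it multiplies the activation derivative across several layers at a fixed pre-activation value and ignores the weights, and all function names are hypothetical. ReLU is written in its standard form max(0, d).

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(d: float) -> float:
    """Standard ReLU: max(0, d)."""
    return max(0.0, d)


def relu_grad(d: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if d > 0 else 0.0


def chained_gradient(grad_fn, pre_activation: float, depth: int) -> float:
    """Product of the activation derivative over `depth` layers (weights ignored)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(pre_activation)
    return g
```

Because the sigmoid derivative is at most 0.25, ten stacked layers scale the gradient by at most 0.25 to the tenth power, which is below one millionth, whereas the ReLU derivative stays at 1 for positive inputs. ReLU also avoids evaluating an exponential, which is consistent with the lower calculation amount noted in step (5).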

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. EXPERIMENTAL ENVIRONMENT AND PARAMETER
To verify the application performance of the proposed stream algorithm of unbalanced medical big data based on convolutional neural network, we carried out the following simulation tests. The testing environment consists of 25 virtual nodes and is shown in Table 1.
This paper completes the unbalanced medical big data stream based on the convolutional neural network. The parameters of the convolutional neural network are shown in Table 2.

B. EXPERIMENTAL DATASET
The experiment selects data from the ADNI database (http://adni.loni.usc.edu) as training data and experimental test data. The database includes a large amount of clinical data, Magnetic Resonance Imaging, Positron Emission Computed Tomography, Plasma from Blood, single nucleotide polymorphisms, and Subject physical examination. The experimental data is described in Table 3.
According to Table 3, the five data sets used in this experiment are Magnetic Resonance Imaging, Positron Emission Computed Tomography, Plasma from Blood, single nucleotide polymorphisms, and Subject physical examination. Each data set contains 3 million data items, for a total of 15 million, of which 5 million are used for model training and 10 million are used as test data.

C. EXPERIMENTAL INDEXES
In the experimental test, there are six test indexes, namely:

1) NETWORK LOAD PREDICTION ACCURACY
In the process of completing the big data stream, the network load prediction is carried out first, which lays the foundation for streaming. To verify the effectiveness of the proposed algorithm, its network load prediction accuracy is compared with that of the algorithms in the literature. In formula (17), M represents the network load value predicted by the algorithm, and $M_{act}$ represents the actual network load value.

2) DATA PACKAGE LOSS RATE
Data package loss rate refers to the ratio of the number of lost data packages to the number of data packages transmitted in the test. In formula (18), v represents the medical data package loss rate, y represents the number of lost data packages, and Y represents the total number of medical data packages transmitted.
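The ratio defined above can be computed directly. In this minimal sketch the function and parameter names are hypothetical, and expressing the ratio y / Y as a percentage is an assumption based on the percentages reported in the results.

```python
def packet_loss_rate(lost: int, transmitted: int) -> float:
    """Loss rate v = y / Y, returned as a percentage."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    if not 0 <= lost <= transmitted:
        raise ValueError("lost must lie between 0 and transmitted")
    return lost / transmitted * 100.0
```

For instance, 20 lost packages out of 1000 transmitted correspond to a loss rate of 2.0%.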

3) AVERAGE STREAM TURNAROUND TIME OF MEDICAL DATA PACKAGES
The average stream turnaround time of medical data packages refers to the time from the input of the data to the completion of streaming. In formula (19), l represents the average turnaround time, G represents the quantity, $H_1$ represents the data upload time, and $H_2$ represents the completion time of the application.
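A minimal sketch of this index follows. It assumes that the average is taken over G data packages and that the turnaround of each package is the difference between its completion time $H_2$ and its upload time $H_1$; this reading of formula (19) and the function name are assumptions.

```python
def average_turnaround(upload_times, completion_times):
    """Mean of (H2 - H1) over all packages, in the time unit of the inputs."""
    if not upload_times or len(upload_times) != len(completion_times):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(h2 - h1 for h1, h2 in zip(upload_times, completion_times))
    return total / len(upload_times)
```

With upload times of 0, 1, and 2 s and completion times of 6, 8, and 9 s, the per-package turnarounds are 6, 7, and 7 s, so the average is 20/3 s.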

4) ENERGY LOSS DURING STREAM
Energy loss refers to the energy consumption caused by the transmission medium during data streaming. The lower the energy loss, the higher the stream efficiency. In formula (20), $L_s$ represents the basic path loss, and $L_1$ and $L_2$ represent the plane wave gains at the data sending and receiving terminals.

5) DATA STREAM RATE
The stream rate can reflect the work efficiency of the stream process.
$$\text{Stream rate} = \frac{1}{v_e} \times \log_2 T \qquad (21)$$
where $v_e$ represents the symbol transmission rate and T represents the valid value state of the stream process.
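Formula (21) can be evaluated with a one-line function. The sketch reads the formula as the reciprocal of the symbol transmission rate multiplied by the base-2 logarithm of T; this reading and the function name are assumptions.

```python
import math


def stream_rate(symbol_rate: float, valid_states: float) -> float:
    """Evaluate (1 / v_e) * log2(T) for positive v_e and T."""
    if symbol_rate <= 0 or valid_states <= 0:
        raise ValueError("symbol_rate and valid_states must be positive")
    return (1.0 / symbol_rate) * math.log2(valid_states)
```

For example, with $v_e = 0.5$ and T = 16, the function returns 2 × 4 = 8.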

6) ANTI-ATTACK CAPABILITY OF THE STREAM PROCESS
The anti-attack capability of the stream process can be reflected through the effective anti-attack efficiency, which captures the stability of the stream process. It is calculated as follows:
$$\text{Effective anti-attack efficiency} = \frac{\text{Number of successful defenses}}{\text{Total attacks}} \times 100\% \qquad (22)$$

1) COMPARISON OF NETWORK LOAD PREDICTION ACCURACY
Using formula (17), the network load prediction accuracy of the proposed algorithm and of the algorithms in Literature [20], Literature [21], Literature [22], Literature [23], and Literature [24] is calculated, and the comparison results are shown in Figure 2.
According to Figure 2, as the amount of data increases, the prediction accuracy curve of the proposed algorithm always shows an upward trend, while the accuracy curves of the other algorithms fluctuate up and down. In particular, the prediction accuracy curve of the algorithm in Literature [20] first rises and then decreases, with a large fluctuation range. The accuracy of Literature [24] is relatively high. The maximum prediction accuracy of Literature [21] and Literature [23] is 60%. Literature [22] is relatively stable at about 42%. Figure 2 clearly shows that the prediction accuracy of the proposed algorithm is always the highest and can reach about 93%.
According to the above analysis, the proposed algorithm has a significant advantage in prediction accuracy. This is because the decomposition-prediction model proposed in this paper combines wavelet analysis and neural network analysis to predict the network load and obtains good results.

2) COMPARISON OF DATA PACKAGE LOSS RATE
Using formula (18), the data packet loss rate of each algorithm is calculated, and the results are shown in Figure 3.
Compared with the five traditional data stream algorithms, the stream algorithm of unbalanced medical big data based on convolutional neural network greatly reduces the packet loss rate of medical data, which ranges from 2.0% to 3.4%.
Among the five traditional algorithms, the packet loss rates of the algorithms in Literature [21] and Literature [22] are high, both above 8%. The packet loss rate of the algorithm in Literature [20] is about 6%. The packet loss rate of the algorithm in Literature [23] changes greatly as the number of iterations increases and reaches 8% at its highest.
The algorithm in Literature [24] has a relatively low packet loss rate, but its lowest value is 4%, which is still higher than the highest packet loss rate of the proposed algorithm. The packet loss rate of the proposed algorithm is therefore clearly lower than that of the other five data stream algorithms, which shows that the unbalanced medical big data stream algorithm based on convolutional neural network can effectively guarantee the integrity of the streamed packets.

3) COMPARISON OF AVERAGE STREAM TURNAROUND TIME OF MEDICAL DATA PACKAGES
Using formula (19), the turnaround time of each algorithm during the streaming of medical data packets is measured, and the average value is calculated.
According to Table 4, compared with the five traditional data stream algorithms, the stream algorithm of unbalanced medical big data based on convolutional neural network greatly shortens the average stream turnaround time of medical data packages. The average stream turnaround time with the proposed algorithm is 6.82 s, whereas that of each of the five traditional data stream algorithms is above 10 s.
The algorithm in Literature [21] has the highest average stream turnaround time, 15.0 s. The average stream turnaround time of medical data packets in Literature [20] is relatively low at 10.48 s, but it is still higher than the 6.82 s average of the proposed algorithm. Thus the processing performance of the stream algorithm based on convolutional neural network is superior, achieves the goal of this study, and is more conducive to the development of medical and health undertakings.

4) COMPARISON OF ENERGY LOSS DURING STREAM
We tested the energy loss of the stream process of the different algorithms, and the results are shown in Figure 4. According to Figure 4(a)-Figure 4(f), the energy loss of the stream process of each algorithm changes as the number of iterations increases. The energy loss of the algorithm in Literature [20] is on the rise. For the algorithm in Literature [21], the loss curve declines continuously before 400 iterations and does not change between 400 and 500 iterations. For the algorithm in Literature [22], the energy loss of the stream process declines continuously after 200 iterations.
The energy loss of the algorithms in Literature [23] and Literature [24] and of the proposed algorithm shows no obvious pattern of change. However, the energy loss of the proposed algorithm is always below 20 dB, which is lower than that of the other five stream methods. It can be seen that the stream efficiency of the stream algorithm of unbalanced medical big data based on convolutional neural network is the highest. This is because the ReLU function is used as the new excitation function during the operation of the convolutional neural network in this paper, which reduces the computational complexity and the energy consumption of the algorithm.

5) COMPARISON OF DATA STREAM RATE
To further verify the efficiency of the stream algorithm of unbalanced medical big data based on convolutional neural network, we compared the data stream rates of the different algorithms, calculated with formula (21). The results are shown in Table 5.
According to Table 5, the data stream rate of each algorithm changes as the number of iterations increases. The data stream rate of the proposed algorithm is always above 55bts. The data stream rates of the algorithms in Literature [20], Literature [21], and Literature [23] are low and do not exceed 30bts. The stream rates of the algorithms in Literature [22] and Literature [24] are relatively high, but their maximum is no more than 50bts, which is still far lower than that of the proposed algorithm. Therefore, the stream algorithm of unbalanced medical big data based on convolutional neural network has a faster stream rate, takes full advantage of the convolutional neural network, and is highly efficient in the stream process.

6) COMPARISON OF ANTI-ATTACK CAPABILITY OF THE STREAM PROCESS
Formula (22) is used to calculate the effective anti-attack efficiency of the different algorithms and to verify the anti-attack ability of the stream process. The comparison results are shown in Figure 5. According to Figure 5, over multiple iterations, only the effective anti-attack efficiency of the proposed stream algorithm of unbalanced medical big data based on convolutional neural network rises steadily, reaching up to 90%.
The algorithm in Literature [23] has the second highest anti-attack efficiency, which is more than 80% overall. The algorithms in Literature [21], Literature [22], and Literature [24] fluctuate obviously, varying between 72% and 80% overall. The algorithm in Literature [20] has the lowest effective anti-attack efficiency, with an average of about 71%. It can be seen that the stream algorithm of unbalanced medical big data based on convolutional neural network has stronger anti-attack performance and application stability.

V. CONCLUSION
Medical undertakings have a direct bearing on the survival and development of humans. Every day, medical institutions receive numerous patients and generate large amounts of medical data. However, the medical data are stored in the individual medical institutions and are not shared with others. Such isolated data are unfavourable to the long-term development of medical undertakings. Therefore, the state has rolled out many policies to promote resource sharing. In the process of medical data sharing, however, the shared network cannot carry the full load, which causes data package loss and a long average turnaround time. To this end, this paper has proposed a stream algorithm of unbalanced medical big data based on convolutional neural network. The network link load is analyzed first. Once the network load is known, the convolutional neural network and the ReLU function are used to perform data calculation and analysis and to complete the research on the unbalanced medical big data stream.
The experimental results show that the network link load prediction accuracy of the proposed algorithm reaches about 93%, the data packet loss rate is low, the average stream turnaround time is short, and the energy loss during the stream process is always kept below 20 dB. The stream rate is fast and the anti-attack efficiency is high. This shows that the proposed algorithm has a strong streaming ability, is conducive to achieving data balance, and can provide effective support for the realization of medical resource sharing.
In the research, we also found that when the data flow stream model is solved with the convolutional neural network, using the sigmoid function as the network activation function gives rise to an increasing calculation amount and a vanishing gradient, which not only makes it hard for the neural network to complete the subsequent training but also increases the energy loss of the stream process and decreases the stream rate. Using the ReLU function as the activation function avoids this phenomenon.
In the future, we will consider different application environments, collect more comprehensive unbalanced medical big data, further analyze the comprehensive characteristics of the data, find more detailed stream paths, optimize the data stream steps, and obtain more effective ways to analyze medical data.
WEIWEI GAO was born in 1976. She received the bachelor's degree from the Harbin Institute of Technology, China, and the master's degree from Northeastern University. She is currently working as a Professor with the Wenzhou Business College. Her research interests include artificial intelligence, information security, and big data analysis.
LI CHEN received the Ph.D. degree. She is currently an Associate Professor. Her main research interests include machine learning and mobile communications technology. She is a member of the China Computer Federation.
TAO SHANG was born in 1968. He received the Ph.D. degree. He is currently a Professor. His research interest is mainly in the area of mechatronic engineering, robotics and intelligent systems, and data mining. He has published dozens of research articles in scholarly journals in the above research areas.