Research on Music Visualization Based on Graphic Images and Mathematical Statistics

Music visualization presents music information visually, through graphic images, which helps improve the accuracy and effectiveness of music information communication. To address shortcomings in the current music visualization field, this paper builds on music graphic images and combines them with mathematical statistical methods such as K-means clustering and a fusion decision tree to construct a music visualization model based on graphic images and mathematical statistics. First, the principles by which Schlieren imaging and laser Doppler imaging are applied to the visualization of music graphic images are described. Second, on the basis of music graphic images, the K-means clustering method is used to perform cluster analysis on music visualization information. Finally, the classification of music visualization information is studied with the fusion decision tree method. Case analysis and performance test results show the superiority of the music visualization method based on graphic images and mathematical statistics. This method can provide a scientific reference model and basis for the modern music industry to build new visualization systems using graphics, images and mathematical statistics.


I. INTRODUCTION
Music information visualization presents music information visually, as graphic images, according to the principles of music acoustics. The visual form compensates for the limitations of sound alone in music education and music research, and improves the accuracy and effectiveness with which music information is transmitted [1], [2]. Music information visualization is grounded in music acoustics, integrates basic sciences such as psychology, mathematics and physics, and draws on modern engineering technologies such as human-computer interaction, computer information technology, and sensing and control technology [3]. Many kinds of music information are abstracted, extracted, statistically analyzed and mathematically modeled, then displayed visually, and the results can be used in music creation, music performance, music education, music research and other fields [4].
In some fields of science and engineering (such as data science and information science), visualization is regarded as a tool for displaying data: abstract data are represented with specific graphics to help describe and explore the world. The rapid development of electronic and computer technology has made techniques such as the analysis and recording of sound waveforms and spectra readily available [5]. These techniques turn music into data that can be analyzed, designed and computed, and then converted into computer-controlled visual images. In modern music acoustics, waveform plots, spectra, and spectrograms, which unfold the spectrum over time, have become the most common forms of music information visualization [6], [7]. Sonic Visualiser, developed by the Centre for Digital Music at Queen Mary University of London, provides a sound-signal visualization platform that lets music researchers, archivists, audio signal researchers and enthusiasts study audio by viewing it [8]. The platform can run, as plug-ins, programs developed by higher education institutions and research units in various countries, including the Music Technology Group of Pompeu Fabra University in Spain.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhihan Lv.
Visualizing music information as graphic images is both a trend in music interaction design and a research field of music technology [9]. Music information visualization has taken different forms under different historical conditions. Most current research approaches it from the perspective of a single discipline or a single visualization form, and no study has yet combined music graphics with mathematical statistics [10]-[12]. This paper therefore builds on music graphics and images and combines them with mathematical statistical algorithms such as K-means clustering and fusion decision trees to construct a music visualization model based on graphic images and mathematical statistics. This method can provide a scientific reference and basis for the development of modern music visualization.

II. MUSIC GRAPHIC IMAGE TECHNOLOGY
The visual graphics techniques discussed in this section are mainly used to study the physical mechanisms of wind instruments. They are important for understanding the physical nature of instrument sounds, for physical modeling in timbre synthesis, and for guiding the design and manufacture of instruments [13]. This section introduces the basic principles and applications of two optical imaging techniques: Schlieren imaging and laser Doppler imaging. Of the two, Schlieren imaging is the older method. Besides the methods covered in this article, other techniques include hot-wire imaging, TV holography and digital speckle photography [14].

A. SCHLIEREN IMAGING
The Schlieren imaging method, also known as Schlieren photography or simply the Schlieren method, can be used to observe the airflow of wind instruments. "Schlieren" is a German word meaning "streaks" or "striations". The method was invented by the German physicist August Toepler in 1864 while studying supersonic motion. It is mainly used to observe density changes in transparent media, can also indirectly reveal changes in temperature and refractive index, and is now widely used for airflow observation in aerospace engineering [15]. In musical performance, Schlieren imaging has mainly been applied to wind instruments. In 1996, researchers in the Netherlands and France observed shock waves radiating from the bell of a trombone played at high dynamic levels. A shock wave is a propagating disturbance that travels faster than the speed of sound in the medium and, like ordinary waves, carries energy through the medium [16]. The optical path of the Schlieren imaging method is shown in Figure 1. The system consists of a light source with a pinhole, a pair of parabolic mirrors, a camera, and an obstacle (usually a knife edge). The parabolic mirror near the light source is called the objective mirror, and the one near the camera is called the image mirror.
The pair of parabolic mirrors in the figure can be replaced with a pair of concave spherical mirrors. In that case, the focal length in the figure corresponds to half the radius of curvature of each concave mirror. To improve the clarity and contrast of the observed image, an obstacle (generally a knife edge) is placed at the focal point of the image mirror so that it blocks about half of the focal spot. Light deflected by density changes in the airflow shifts relative to the knife edge, so blocking part of the focal spot turns those deflections into visible brightness variations. Positioning the knife edge takes some skill and must be done slowly and carefully. If it sits in front of or behind the focus, the brightness of the image may be uneven. When the field of view is dark on the left and bright on the right (or the reverse), or dark at the top and bright at the bottom (or the reverse), adjust the position of the knife edge along the optical axis until it sits exactly at the focus, then fix it in place [17].
To observe airflow with the Schlieren method, the object under study is placed between the two parabolic mirrors. Light leaves the objective mirror as a parallel beam; where the airflow density changes, some of the rays are bent. Because these rays deviate from their original paths, changes in the density of the medium within the observation region become visible.
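The deflection mechanism described above can be made quantitative. The sketch below is a minimal illustration, not the setup used in [15]-[17]. It assumes a two-dimensional disturbance of extent L along the optical axis, the Gladstone-Dale relation n - 1 = K·ρ with an approximate value K ≈ 2.26 × 10^-4 m^3/kg for air, and the textbook knife-edge contrast estimate ΔI/I = f2·ε/a, where f2 is the focal length of the image mirror and a the unobstructed height of the source image at the knife edge. All numbers and function names are illustrative assumptions.

```python
# Minimal sketch of schlieren sensitivity (illustrative assumptions throughout).
GLADSTONE_DALE_AIR = 2.26e-4  # m^3/kg, approximate Gladstone-Dale constant of air (assumed)
N0_AIR = 1.000293             # approximate refractive index of ambient air (assumed)


def deflection_angle(drho_dy, path_length, k=GLADSTONE_DALE_AIR, n0=N0_AIR):
    """Ray deflection in radians for a uniform density gradient drho_dy (kg/m^4)
    extending over path_length (m) along the optical axis."""
    dn_dy = k * drho_dy              # Gladstone-Dale: n - 1 = K * rho
    return path_length / n0 * dn_dy  # epsilon = (L / n0) * dn/dy


def focal_length_from_radius(radius_of_curvature):
    """Focal length of a concave spherical mirror: f = R / 2."""
    return radius_of_curvature / 2.0


def knife_edge_contrast(epsilon, f_image_mirror, unobstructed_height):
    """Relative brightness change dI/I: the source image shifts by f2 * epsilon
    at the knife edge, compared with its unobstructed height a."""
    return f_image_mirror * epsilon / unobstructed_height


if __name__ == "__main__":
    eps = deflection_angle(drho_dy=1.0, path_length=0.05)
    f2 = focal_length_from_radius(2.0)  # 2 m radius of curvature -> 1 m focal length
    contrast = knife_edge_contrast(eps, f2, unobstructed_height=0.5e-3)
    print(f"deflection = {eps:.3e} rad, contrast = {contrast:.4f}")
```

The contrast grows with the image-mirror focal length and shrinks as the knife edge cuts off less of the source image, which is why the blade position matters so much in practice.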

B. LASER DOPPLER IMAGING
Laser Doppler imaging is a more advanced flow imaging method. It uses a laser as the light source, and a beam splitter divides the laser beam into a reference beam and a probe beam. The reference beam falls directly on a photodetector, while the probe beam is directed at a moving object. Because the object moves, the probe light it reflects is shifted in frequency and phase. The reflected probe light reaches the photodetector through the optical path together with the reference beam [18]. The frequency and phase changes caused by the moving object therefore modulate the signal formed by the reference and probe beams, and the photodetector records this modulated signal. Processing the modulated signal yields the velocity of the particles point by point, and plotting the results gives an image of the flow velocity while the instrument is played [19]. The principle of laser Doppler imaging is shown in Figure 2.
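The chain from Doppler shift to velocity can be illustrated numerically. This is a minimal sketch under assumptions not stated in the text: the probe light is backscattered by a target moving along the beam axis, so the beat frequency at the photodetector is f_D = 2v/λ; the helium-neon wavelength of 632.8 nm, the beam intensities and the sampling parameters are illustrative. A naive DFT stands in for the signal processing stage, and all function names are hypothetical.

```python
import cmath
import math

HENE_WAVELENGTH = 632.8e-9  # m, helium-neon laser (illustrative choice)


def simulate_detector(velocity, wavelength, fs, n, phase=0.3):
    """Photodetector samples when backscattered probe light from a target moving
    at `velocity` along the beam axis interferes with the reference beam."""
    f_doppler = 2.0 * velocity / wavelength  # Doppler shift for backscatter
    i_ref, i_probe = 1.0, 0.2                # arbitrary beam intensities
    return [i_ref + i_probe + 2.0 * math.sqrt(i_ref * i_probe)
            * math.cos(2.0 * math.pi * f_doppler * i / fs + phase)
            for i in range(n)]


def peak_frequency(samples, fs):
    """Frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):  # skip DC, positive frequencies only
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n


def velocity_from_beat(f_beat, wavelength):
    """Invert f_D = 2 v / lambda; the sign of v is not recoverable here."""
    return f_beat * wavelength / 2.0


if __name__ == "__main__":
    fs, n = 256_000, 256
    signal = simulate_detector(0.006328, HENE_WAVELENGTH, fs, n)
    f_beat = peak_frequency(signal, fs)
    print(f_beat, velocity_from_beat(f_beat, HENE_WAVELENGTH))
```

Because the beat signal is symmetric in frequency, this setup recovers only the speed; practical systems shift the reference frequency so that the direction of motion can also be resolved.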
Researchers at the Free University of Brussels used laser Doppler imaging to study the vibration of the organ and compared it with the more traditional Schlieren and particle image velocimetry (PIV) methods [20]. Laser Doppler imaging is well suited to such studies because of its large imaging range and high accuracy.
VOLUME 8, 2020

III. MUSIC VISUALIZATION BASED ON MATHEMATICAL STATISTICS
Statistical graphs display statistical data using geometric figures or images of concrete things. Beyond the four classic elements of pitch, timbre, duration and intensity, informatics has revealed further features of music information, which characterize its micro- and macro-level structure along different dimensions [21]. Ordered from micro to macro, these features can be divided into low-level, mid-level and high-level music information.

A. LOW-LEVEL MUSIC INFORMATION
Low-level music information is analyzed over spans of a few milliseconds to tens of milliseconds. Its most common visual displays are waveform plots and spectrum plots [22]. The spectrum is obtained by applying the Fourier transform to the waveform within one frame. Further analysis and processing of the waveform and spectrum yield additional low-level information, such as instantaneous energy, the chromagram, the cepstrum, Mel-frequency cepstral coefficients (MFCCs), the spectral centroid and the spectral difference amplitude. In music informatics, low-level music information is usually called "features". The extraction and analysis of these features has arguably done more than anything else in recent times to change the face of musicology. This analysis operates at a finer, more microscopic scale and reflects the latest progress in musical acoustics [23]. Audio thumbnailing is the task of extracting the most representative part of a music recording, which in popular music is usually the chorus. The method is based on finding the maximum of a filtered version of the self-similarity matrix. The result (the extracted thumbnail) is exported to an audio file, and the self-similarity matrix (with the detected region) can be displayed at the same time [24].
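The low-level pipeline described above (framing, Fourier transform, per-frame features, self-similarity) can be sketched in a few functions. This is a minimal illustration that assumes mono samples already cut into frames. The naive DFT is used only to keep the example dependency-free. `thumbnail_start` is a hypothetical, simplified scoring rule that sums the similarity of a segment's frames to the whole recording; it is not the filtered-matrix method of [24].

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|DFT| of one analysis frame (naive O(N^2) DFT, bins 0..N/2)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mag, fs):
    """Magnitude-weighted mean frequency (Hz) of a half spectrum."""
    n_fft = 2 * (len(mag) - 1)
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * fs / n_fft * m for k, m in enumerate(mag)) / total


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def self_similarity(features):
    """S[i][j] = cosine similarity between the feature vectors of frames i and j."""
    return [[cosine(a, b) for b in features] for a in features]


def thumbnail_start(ssm, length):
    """Start frame of the segment whose frames are, in total, most similar to
    the whole recording (simplified stand-in for SSM-based thumbnailing)."""
    row_sums = [sum(row) for row in ssm]
    scores = [sum(row_sums[s:s + length]) for s in range(len(ssm) - length + 1)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Stacking `magnitude_spectrum` over successive frames gives the spectrogram; passing those spectra (or other per-frame features) to `self_similarity` gives the matrix that thumbnailing searches.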

B. MID-LEVEL MUSIC INFORMATION
Compared with low-level information, mid-level music information, including melody, rhythm, tonality and harmony, is more directly meaningful to people. After some musical training, people can recognize this information, and computers can often derive meaningful mid-level information by comprehensively analyzing and processing the low-level information [25]. In recent years, many research topics in music information processing have focused on how to "translate" raw waveform information into symbolic music information that is easier for humans to understand. Although this work may seem only loosely related to music research and practice as such, the visualizations it produces (some final products, some intermediate ones) help the music discipline understand and analyze music. Based on the author's experience in music study and practice, this visual information not only offers the industry new ways to interpret and analyze music and helps learners understand it, but also improves the efficiency of information transmission.
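As an example of deriving mid-level information from low-level features, harmony can be estimated by matching a 12-bin chroma vector against chord templates. The sketch assumes binary major/minor triad templates and cosine similarity, a common textbook baseline rather than a method described in this paper; the names are illustrative.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_templates():
    """Binary 12-bin chroma templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            vec = [0.0] * 12
            for iv in intervals:
                vec[(root + iv) % 12] = 1.0
            templates[f"{PITCH_CLASSES[root]}:{quality}"] = vec
    return templates


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def estimate_chord(chroma, templates=None):
    """Label one chroma vector with the best-matching triad template."""
    templates = templates or triad_templates()
    return max(templates, key=lambda name: cosine(chroma, templates[name]))


if __name__ == "__main__":
    frame = [1.0, 0, 0.1, 0, 0.9, 0, 0, 0.8, 0, 0.1, 0, 0]  # mostly C, E, G energy
    print(estimate_chord(frame))
```

Applied frame by frame to a chromagram, this yields a chord sequence that can be drawn as a harmony timeline.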

C. HIGH-LEVEL MUSIC INFORMATION
The visualization of high-level music information has changed music research in several ways. First, analyses that used to be described in words, with a degree of ambiguity, become more intuitive and easier to communicate. Second, with computer-aided analysis and the growing use of mathematical language to describe music, information that once took researchers months or years of analysis to accumulate can now be obtained directly [26]. As more data are collected and analyzed by computer, statistical macro-level results can be obtained faster, which speeds up research. Finally, the intersection of multiple disciplines in the modern era has given people new perspectives for analyzing music. In music perception and music gesture research, more and more sensors, data acquisition systems and information processing systems are used to collect data whose statistical analysis yields the results. This section analyzes the principles of visualizing high-level music information, taking music structure, musical expression, music composition, gesture and behavior, and subjective auditory attributes as examples.

IV. MUSIC VISUALIZATION BASED ON STATISTICAL GRAPHICS
A mathematical model is a model constructed in mathematical language using mathematical logic. The development of the Internet has led to the accumulation of large amounts of data, and massive collections of digital music have given music information researchers abundant "data food". With growing attention to mining the value of data, numerous approaches to studying and analyzing music information from the perspective of graphic images and mathematical statistics have emerged in recent years, giving rise to fields such as systematic musicology, music informatics and computational musicology.

A. K-MEANS CLUSTERING ALGORITHM
The K-means clustering method is used to cluster the music data. The K-means algorithm randomly selects K objects as the initial cluster centers, computes the distance between each object and each cluster center, and assigns each object to its nearest center [27]. Each center is then recomputed from the objects assigned to it, and the assignment and update steps are repeated, which progressively improves the quality of the clustering. (In the related K-medoids variant, non-representative objects repeatedly replace representative objects as cluster centers whenever this improves the clustering.)
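Let $d$ denote the distance from an ordinary node in a cluster to the cluster head (CH), that is, the cluster center. The most commonly used distance measure is the Minkowski distance (the $p$-norm of the difference vector):

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$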
When $p = 2$, the Minkowski distance becomes the Euclidean distance (2-norm), and the formula simplifies to

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}}$$

In this paper, the Euclidean distance between nodes is computed in three-dimensional space, so the final distance formula is

$$d(x, y) = \sqrt{(x_1 - y_1)^{2} + (x_2 - y_2)^{2} + (x_3 - y_3)^{2}}$$
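A direct transcription of the distance formulas above, assuming points are given as equal-length coordinate sequences. `minkowski` is an illustrative name, and the p = infinity branch (the Chebyshev distance) is the standard limiting case, added for completeness.

```python
import math


def minkowski(x, y, p=2.0):
    """Minkowski distance between equal-length coordinate sequences x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    if p == math.inf:
        # Limiting case p -> infinity: the Chebyshev (maximum) distance.
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b = (0.0, 0.0, 0.0), (1.0, 2.0, 2.0)
print(minkowski(a, b, p=1), minkowski(a, b, p=2), minkowski(a, b, p=math.inf))
# 5.0 3.0 2.0
```

With p = 2 and three coordinates, the function reduces to the three-dimensional Euclidean formula used in this paper.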
The quality of the clustering is evaluated with a cost function, and the center of each cluster is updated so that the value of the squared-error function decreases:

$$E = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^{2}$$

where $C_k$ is the $k$-th cluster and $\mu_k$ is its center. This function measures how far the objects deviate from the centers of the clusters to which they are assigned. The distance used in this paper is a distance metric augmented with additional attributes.
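The assignment/update cycle and the squared-error cost E can be sketched as Lloyd's K-means iteration. This minimal version assumes numeric tuples (for example, three-dimensional node coordinates), plain Euclidean distance, and random initialization with a fixed seed. The attribute-augmented distance mentioned above is not specified in the text and is not modeled; function names are illustrative.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points]


def centroid(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means. Returns (centers, labels, sse)."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(points, k)]  # k objects, no replacement
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of its members, which
        # cannot increase the squared-error cost E under Euclidean distance.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:  # converged: assignments would not change
            break
        centers = new_centers
    else:
        labels = assign(points, centers)  # re-assign to the final centers
    sse = sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, sse
```

For example, `kmeans([(0, 0, 0), (0.2, 0, 0), (10, 10, 10), (10.2, 10, 10)], 2)` separates the two groups. The result depends on the initial centers, so in practice several seeds are tried and the run with the lowest E is kept.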

Gradient Boosting Decision Tree (GBDT) is a machine learning method widely used in regression and classification tasks.
It produces a prediction model in the form of an ensemble of weak learners (usually decision trees), iteratively combining weak learners into a strong learner [28]. Each new decision tree is built to reduce the residual of the previous model: it is fitted along the negative gradient of the loss, so the residual keeps decreasing over successive iterations. In each GBDT iteration, the goal is to find a weak learner, a CART regression tree, that fits the residuals of the previous model, so that the loss between the output of the updated model (the previous model plus the new tree) and the true values is as small as possible. Finally, the models generated in all iterations are summed to obtain the final prediction model.
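For squared loss, the residual y - F(x) is exactly the negative gradient, so the boosting loop described above can be sketched with depth-1 CART regression trees (stumps) and the update F_m(x) = F_{m-1}(x) + ν·h_m(x). This is an illustrative simplification: the shrinkage ν (`learning_rate`) is an added assumption, real GBDT implementations grow deeper trees and support other losses, and `fit_stump`/`gbdt_fit` are hypothetical names.

```python
def fit_stump(X, r):
    """Least-squares depth-1 CART regression tree on rows X; returns predict(row)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for t in values[:-1]:  # thresholds between consecutive distinct values
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no split possible: fall back to the mean residual
        c = sum(r) / len(r)
        return lambda row: c
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def gbdt_fit(X, y, n_rounds=50, learning_rate=0.3):
    """F_m(x) = F_{m-1}(x) + learning_rate * h_m(x), each h_m fitted to residuals."""
    f0 = sum(y) / len(y)  # initial constant model
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        # For squared loss (1/2)(y - F)^2 the negative gradient is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(X, residuals)
        pred = [pi + learning_rate * tree(row) for pi, row in zip(pred, X)]
        trees.append(tree)
    return lambda row: f0 + learning_rate * sum(tree(row) for tree in trees)
```

On a step-shaped target such as `y = [0, 0, 0, 1, 1, 1]` for `X = [[0], ..., [5]]`, every round shrinks the residual by the factor (1 - learning_rate), so the summed model converges to the targets.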
In 2014, He et al. at Facebook found that GBDT can automatically discover a variety of discriminative features and feature combinations, and that the path by which each sample is partitioned in each tree can serve as input features for other models, greatly reducing the manual work of selecting and combining features [29].
Logistic regression (LR) is a binary classification model widely used in industry. It adds a sigmoid mapping on top of linear regression, which limits the predicted value to [0, 1] so that the model can output the probabilities of the classes. Let $p(y = 1 \mid x, \theta)$ denote the probability that $y = 1$ given the feature vector $x$, and let $h_\theta(x) = p(y = 1 \mid x, \theta)$. The logistic regression model is then

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}} = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x^{(1)} + \cdots + \theta_p x^{(p)})}}$$

where $\theta = \{\theta_0, \theta_1, \ldots, \theta_p\}$ are the coefficients corresponding to each feature. $\theta$ can be obtained by maximum likelihood estimation. Assuming that the $N$ samples $(x_i, y_i)$ in the data set are mutually independent, the likelihood function is

$$L(\theta) = \prod_{i=1}^{N} h_\theta(x_i)^{y_i} \left(1 - h_\theta(x_i)\right)^{1 - y_i}$$

A naive Bayes classifier is defined as follows. Let $x$ be the input $n$-dimensional feature vector, let $y \in \{c_1, c_2, \ldots, c_K\}$ be the output class, let $X$ be a random variable on the input space and $Y$ a random variable on the output space, and let $P(X, Y)$ be their joint probability distribution. The training data set

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$

is generated independently and identically from $P(X, Y)$ [30]. It follows that

$$P(X = x, Y = c_k) = P(Y = c_k)\, P(X = x \mid Y = c_k)$$

The conditional probability is constructed under the assumption of conditional independence, namely

$$P(X = x \mid Y = c_k) = \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)$$

Substituting this into Bayes' theorem gives the basic formula, the probability of output class $c_k$ given the instance $x$:

$$P(Y = c_k \mid X = x) = \frac{P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)}{\sum_{k} P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)}$$
In practice, an instance is assigned to the class with the largest probability. Since the denominator of the posterior is the same for every class $c_k$, this can be formalized as

$$y = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)$$
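The argmax rule above can be sketched for categorical features by estimating the prior and the per-feature conditional probabilities from counts. This is a minimal sketch: the additive (Laplace) smoothing parameter `alpha` is an added assumption (`alpha=0` gives the plain maximum-likelihood estimates), and the function names, feature values and class labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(X, y, alpha=1.0):
    """Estimate P(Y=c) and P(X^(j)=v | Y=c) from categorical rows X and labels y.

    alpha is additive (Laplace) smoothing; alpha=0 gives the raw
    maximum-likelihood frequencies."""
    class_counts = Counter(y)
    priors = {c: cnt / len(y) for c, cnt in class_counts.items()}
    n_values = [len(set(row[j] for row in X)) for j in range(len(X[0]))]
    counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[(c, j)][v] += 1

    def likelihood(c, j, v):
        return (counts[(c, j)][v] + alpha) / (class_counts[c] + alpha * n_values[j])

    return priors, likelihood


def predict(priors, likelihood, x):
    """argmax over c of P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c)."""
    def score(c):
        s = priors[c]
        for j, v in enumerate(x):
            s *= likelihood(c, j, v)
        return s
    return max(priors, key=score)
```

For many features the product of small probabilities can underflow; summing log-probabilities instead gives the same argmax.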
For the logistic regression model, the parameter θ is usually obtained by gradient descent. Because the learning ability of the LR model is limited, a large amount of manual feature engineering is usually required to improve it [30], [31]. How to mine effective features and feature combinations automatically has therefore become an urgent problem, and using models to explore the combined relationships among features has become an effective way to solve it [32]. This article applies to music visualization research a fusion model structure in which GBDT transforms the features and an LR model is trained on the transformed features. GBDT performs the feature combination, its output is fed into the LR model for training, and the two are combined into a fusion decision model. The structure of the fusion model is shown in Figure 4.
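The GBDT-to-LR feature transformation can be sketched end to end: boosted trees assign each sample to one leaf per tree, the leaf memberships are one-hot encoded, and logistic regression is trained on those indicators by gradient ascent on the log-likelihood (equivalently, gradient descent on its negative). To stay short, this sketch assumes depth-1 trees boosted on the residuals of 0/1 labels under squared loss, with fixed hyperparameters; it is not the configuration used in this paper, and all names are hypothetical.

```python
import math


def fit_stump(X, r):
    """Least-squares depth-1 regression tree. Returns (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for t in values[:-1]:  # thresholds between consecutive distinct values
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1:]


def boost_stumps(X, y, n_trees=6, shrinkage=0.5):
    """Boost stumps on residuals of 0/1 labels (squared loss: a simplification)."""
    pred = [sum(y) / len(y)] * len(y)
    stumps = []
    for _ in range(n_trees):
        j, t, lm, rm = fit_stump(X, [yi - pi for yi, pi in zip(y, pred)])
        stumps.append((j, t))
        pred = [pi + shrinkage * (lm if row[j] <= t else rm) for pi, row in zip(pred, X)]
    return stumps


def leaf_features(stumps, row):
    """One-hot leaf membership: [in left leaf, in right leaf] for every stump."""
    feats = []
    for j, t in stumps:
        left = 1.0 if row[j] <= t else 0.0
        feats += [left, 1.0 - left]
    return feats


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def train_lr(F, y, step=0.5, epochs=3000):
    """Batch gradient ascent on the mean log-likelihood of logistic regression."""
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        errs = [yi - sigmoid(b + sum(wk * fk for wk, fk in zip(w, f))) for f, yi in zip(F, y)]
        w = [wk + step * sum(e * f[k] for e, f in zip(errs, F)) / len(F) for k, wk in enumerate(w)]
        b += step * sum(errs) / len(F)
    return w, b


def fit_fusion(X, y):
    """GBDT leaf indicators -> LR. Returns a function giving P(y = 1 | row)."""
    stumps = boost_stumps(X, y)
    w, b = train_lr([leaf_features(stumps, row) for row in X], y)
    return lambda row: sigmoid(b + sum(wk * fk for wk, fk in zip(w, leaf_features(stumps, row))))
```

On a "middle band" pattern, where the label is 1 only for intermediate values of a single feature, logistic regression on the raw feature cannot separate the classes because its decision boundary is a single threshold. The leaf indicators make the pattern additive in the logit, which illustrates why the tree-derived features help the linear model.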

V. EMPIRICAL CASE ANALYSIS OF MUSIC VISUALIZATION
The research data in this paper consist mainly of music information collected with a web crawler from a large online music platform; the data set covers 2000 pieces of music. To make the sample statistically meaningful, case samples were selected for comparability and availability: within the data accessible to the author, the selected information was made as rich and as comparable as possible. The collected data were then tallied song by song to obtain the final music data sample.

A. CASE TEST ENVIRONMENT
The preceding sections described music visualization research based on graphic images and mathematical statistics. This section verifies the practicality and efficiency of the method through tests on actual cases. The test environment configured for music visualization based on graphic images and mathematical statistics is shown in Table 1. Both the application server and the database server use Intel L5520 CPUs and the CentOS 6.5 operating system, and the database is MySQL 5.5.28.

B. MUSIC CATEGORY CLUSTERING BASED ON K-MEANS METHOD
First, the K-means method is applied to the crawled music information: the distance from each item to the cluster centers is computed, the items are clustered accordingly, and the results are displayed with visualization techniques. To explore the heterogeneity among music categories, the spatial autocorrelation index Moran's I was used. The spatial weight matrix is built from spatial adjacency: the weight is 1 for adjacent units and 0 otherwise. In addition, local spatial autocorrelation analysis is used to produce high/low cluster maps for the music categories to explore local heterogeneity among them.
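Global Moran's I with a binary contiguity matrix, together with the local statistic used for high/low cluster maps, can be sketched as follows. The formulas are the standard ones: I = (n/W)·Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z_i is the deviation from the mean and W the sum of all weights, and the local version I_i = (z_i/m2)·Σ_j w_ij z_j with m2 = Σ_i z_i²/n. Significance testing (for example by permutation) is omitted, and the function names are illustrative.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(zi * zi for zi in z)


def local_morans_i(values, weights):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum_i z_i^2 / n."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]


def contiguity_weights(n, adjacent_pairs):
    """Binary spatial weights: w_ij = w_ji = 1 for adjacent units, 0 otherwise."""
    w = [[0] * n for _ in range(n)]
    for i, j in adjacent_pairs:
        w[i][j] = w[j][i] = 1
    return w
```

On a chain of four units, values `[1, 1, 0, 0]` (similar values side by side) give a positive I, while alternating values `[1, 0, 1, 0]` give I = -1; positive local values mark units surrounded by similar values, which is what the high/low cluster maps display.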
Factor scores are computed for each piece of music with the factor score formula, and a partition-based clustering algorithm is then used to build the music category model. Table 2 gives statistics for the music after it is subdivided by factor score, listing the number of pieces of each type and their scores on each factor. According to Table 2, the music falls into six types: 001 (clusters 1 and 2), 111 (clusters 3 and 5), 000 (cluster 4), 010 (cluster 6), 011 (cluster 7) and 100 (cluster 8).
The normalized data for each piece of music are entered into the corresponding vector file, and the Moran scatter plot for each record is obtained (see Figure 5 and Figure 6).
The global Moran's I is 0.2113 and is significant at α = 0.01. From the perspective of overall differences, this shows that music visualization categories and the music's own information attributes exhibit positive spatial autocorrelation. The frequency of music visualization is concentrated and differentiated, and the differences in category between regions are clear. The current pattern, with a high concentration of music categories in some local areas and a low concentration in the surrounding areas, reflects the important influence of the level of informatization on regional music categories.

C. MUSIC VISUAL CLASSIFICATION BASED ON DECISION TREE
To address the current music visualization problem, this section combines graphic images and mathematical statistics to construct an algorithm that classifies music visualization categories from online music data. After preprocessing, 1500 rows are extracted from the original data set. To evaluate the model, the preprocessed data are split into a training set containing the first 1000 rows and a test set containing the remaining 500 rows. The model is trained on the training set, the test data are then fed into the trained model to predict music visualization categories, and the model's performance is evaluated with a confusion matrix comparing the true and predicted categories of the test set.
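The sequential hold-out split and the confusion matrix described above can be sketched as follows, assuming the rows are already in the order used in the paper (no shuffling) and that labels are hashable and sortable; the function names are illustrative.

```python
def holdout_split(rows, n_train):
    """Sequential hold-out: the first n_train rows train, the rest test."""
    return rows[:n_train], rows[n_train:]


def confusion_matrix(y_true, y_pred, labels=None):
    """cm[true_label][predicted_label] -> count."""
    labels = labels if labels is not None else sorted(set(y_true) | set(y_pred))
    cm = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm
```

With 1500 preprocessed rows, `holdout_split(rows, 1000)` yields the 1000-row training set and the 500-row test set. Because the split is not shuffled, any ordering in the crawled data (for example by category or date) carries over into the two sets.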
A fusion decision tree model is built on the training set, with the maximum tree depth (Max depth) set to 10, the minimum number of observations that must exist in a node before it can be split (Min split) set to 20, and the number of cross-validation folds (xval) set to 5. Applying the model to the test set yields the confusion matrix of the fusion decision tree algorithm.
From the confusion matrix of the decision tree model on the test set, the model's precision is 0.2097079, its recall is 0.1008315, its F value is 0.1361835, and its prediction accuracy is 0.8762079. A conditional inference tree model is also built on the training set: the training data are fed into the model to estimate all of its parameters, and then all test-set attributes except the category are fed into the trained model to predict the music visualization category of the test set. The classification performance of the ordinary decision tree and the fusion decision tree is compared in Figure 7, Figure 8, Figure 9 and Figure 10. Applying the fusion structure of GBDT combined with the LR model to music visualization research achieved good results. Because the learning ability of the LR model is limited, a large amount of manual feature engineering is usually needed to improve it; having GBDT construct feature combinations and feeding them into the LR model to form a fusion decision model effectively solves this problem.
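The metrics above follow from the confusion-matrix counts. A minimal sketch is given below; the function names are illustrative and any counts passed to it in an example are hypothetical. As a consistency check, the harmonic mean of the reported 0.2097079 and 0.1008315 is about 0.1361834, which matches the reported F value up to rounding and indicates that 0.2097079 is a precision figure.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure(precision, recall), accuracy


# Consistency check against the values reported in the text.
print(round(f_measure(0.2097079, 0.1008315), 6))
```

A high accuracy alongside low precision and recall, as reported here, is typical when the positive category is rare, since correct negatives dominate the accuracy count.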

VI. CONCLUSION
Music information visualization originated from the need to record and disseminate music information and at first played a purely functional role. Later, as a subject of scientific research, it became a means of observing and understanding music-related phenomena of the objective world (mainly acoustic phenomena). It was then used to control sound and the visual effects that sound produces. On the one hand, this gave it artistic functions that people can appreciate and play with; on the other, it became a visual aid for understanding abstract concepts in science education. This research starts from the nature of music information visualization and its relationship with other disciplines, first setting out the background, motivation and current status of the research. After explaining how Schlieren imaging and laser Doppler imaging are applied in music graphic image visualization research, it uses K-means clustering, a fusion decision tree and other methods to cluster and classify music visualization information. Case analysis and performance test results show the superiority of the music visualization method based on graphic images and mathematical statistics. Because graphic imaging and statistical analysis offer a great many methods, those used in this article cover only a limited subset. Applying a richer set of methods to more representative music visualization information is therefore the focus of future research.